On Mon, Apr 26, 2010 at 5:52 PM, Platonides <Platonides@gmail.com> wrote:
> Anthony wrote:
> > What kind of space needs are we talking about?
>
> 100k requests per second.
> Assuming that a URL is 50 bytes on average, that's 432 GB per day (the
> usual Apache log line is about 1.5 times that).

Seems reasonable.  For 3 days of access that's roughly 18.5 GB per server, spread over 70 servers.

And that's without compression, and 50 bytes seems awfully long for a URL.
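
For anyone who wants to check the arithmetic, here is a rough Python sketch. The function names are made up for illustration, the inputs are only the figures quoted in this thread (100k requests per second, 50 bytes per URL, 3 days, 70 servers), and "GB" is taken to mean decimal gigabytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def log_volume_gb_per_day(requests_per_second, bytes_per_url):
    """Uncompressed URL log volume in decimal gigabytes per day."""
    return requests_per_second * bytes_per_url * SECONDS_PER_DAY / 1e9


def per_server_gb(total_gb_per_day, days, servers):
    """Share of the log each server holds if spread evenly (an assumption)."""
    return total_gb_per_day * days / servers


daily = log_volume_gb_per_day(100_000, 50)        # figures from this thread
share = per_server_gb(daily, days=3, servers=70)
print(f"{daily:.0f} GB/day, {share:.1f} GB per server for 3 days")
```

Changing the bytes-per-URL guess or adding a compression ratio only scales these numbers linearly, so the per-server figure should be read as an upper bound.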
 
> >     What if your referer was your facebook personal page leaking your full
> >     real name?
> >
> > And what if you're in the sample?  I find it quite inappropriate that
> > even sampled data like this is being released.
>
> The referer is not stored anywhere.

Well, that's good to hear.  What exactly is contained in the sampled data which is being released?  We've heard what's in the 1/10th sample Mr. Priedhorsky is getting, but what about the rest?