On 8/30/06, Steve Summit scs@eskimo.com wrote:
If it's possible to guarantee it gets kept, it's possible to guarantee it only gets kept for a day.
False (unless you're splitting hairs).
    // If you remove the line after the next or refactor this code, we will
    // flay the living flesh from your bones
    $db->write("$IP visited this page, yay");
    $db->check_if_stuff_is_over_a_day_old_and_deal_with_it();
    // If you remove the above line or refactor this code, we will flay the
    // living flesh from your bones
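For readers who want something runnable, here is a minimal Python sketch of the same write-then-purge coupling. It is not the code from the message (that was PHP-flavoured pseudocode). The `VisitLog` class, its method names, and the in-memory deque are hypothetical stand-ins for `$db`, and the one-day window is taken as exactly 24 hours.

```python
import time
from collections import deque

ONE_DAY = 24 * 60 * 60  # retention window in seconds (assumed: exactly 24 hours)


class VisitLog:
    """Hypothetical in-memory stand-in for the $db object in the pseudocode."""

    def __init__(self):
        # (timestamp, ip) pairs, oldest first, because we only ever append "now"
        self._entries = deque()

    def write(self, ip, now=None):
        now = time.time() if now is None else now
        self._entries.append((now, ip))
        # The two steps are coupled: every write is immediately followed by a
        # purge, so no stored entry survives past the window once a write happens.
        self._purge_older_than_a_day(now)

    def _purge_older_than_a_day(self, now):
        cutoff = now - ONE_DAY
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def ips(self):
        # Return the IPs still held, oldest first
        return [ip for _, ip in self._entries]
```

For example, after `log.write("10.0.0.1", now=0)` and then `log.write("10.0.0.2", now=ONE_DAY + 1)`, `log.ips()` holds only `"10.0.0.2"`. Note that the purge runs only when a write happens, so on a page nobody visits an entry can sit past the 24-hour mark; pairing this with a periodic purge would close that gap.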
Okay, but that's true only as long as (a) the stats factor is in the thousands,
No, it's true as long as it's above one. Even if it's just two, someone making two page views would have a 75% chance of getting one hit through, instead of a 50% chance: a major difference.
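The arithmetic behind the 75% figure, as a short Python sketch. It assumes each hit is kept independently with probability 1/factor (a deterministic "every Nth hit" counter would behave differently); `p_at_least_one_recorded` is a hypothetical helper name.

```python
def p_at_least_one_recorded(views, factor):
    """Chance that at least one of `views` page views survives 1-in-`factor` sampling.

    Assumes each hit is kept independently with probability 1/factor, so the
    chance that all of them are dropped is (1 - 1/factor) ** views.
    """
    return 1 - (1 - 1 / factor) ** views
```

`p_at_least_one_recorded(1, 2)` gives 0.5 and `p_at_least_one_recorded(2, 2)` gives 0.75. At a factor of 1 both are 1, so repeat views make no difference; at large factors the two-view chance approaches twice the one-view chance (2/k - 1/k² versus 1/k), which is why the effect is there at any factor above one.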
(b) nobody's trying to deliberately skew the results.
If anybody is, we're screwed anyway if we're doing sampling.
But also, it only *matters* if you're trying to keep (not discard) the extra hits, i.e. if you do want to say something like "M people viewed it N times" as opposed to "M people viewed it at least once".
Um, this entire discussion is about the latter.
If you're interested in discarding redundant hits, it obviously doesn't matter whether the browser or the server does it.
Except that the server can't do it.
On 8/30/06, Gregory Maxwell gmaxwell@gmail.com wrote:
H(secret + ip) can only be inverted by exhaustive search of both the secret and the IP (or the secret if you happen to have some known H(), IP pairs)... and the secret can be much longer than 32 bits.
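A minimal Python sketch of why the secret matters, with SHA-256 standing in for H (an assumption; the message doesn't name a hash). An unkeyed hash of an IP can be inverted just by hashing candidate addresses; here a /24 stands in for the full 2^32 IPv4 space to keep the demo quick. With a secret mixed in, the attacker has to search the secret too. Alongside the literal H(secret + ip), `h_hmac` swaps in HMAC-SHA256, the standard keyed-hash construction; all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def h_plain(ip: str) -> str:
    # Unkeyed H(ip): anyone can recompute this for every candidate IP.
    return hashlib.sha256(ip.encode()).hexdigest()


def h_secret_concat(secret: bytes, ip: str) -> str:
    # H(secret + ip) as written in the message, with SHA-256 standing in for H.
    return hashlib.sha256(secret + ip.encode()).hexdigest()


def h_hmac(secret: bytes, ip: str) -> str:
    # HMAC-SHA256, swapped in as the usual way to build a keyed hash.
    return hmac.new(secret, ip.encode(), hashlib.sha256).hexdigest()


def brute_force(target: str, network: str, hash_fn):
    """Hash every address in `network`; return the one matching `target`, else None."""
    for addr in ipaddress.ip_network(network):
        if hash_fn(str(addr)) == target:
            return str(addr)
    return None
```

`brute_force(h_plain("192.0.2.77"), "192.0.2.0/24", h_plain)` recovers the address, while the same search against `h_hmac(secret, ip)` with a wrong secret finds nothing. With a 256-bit secret (e.g. from `secrets.token_bytes(32)`), even someone holding a known (H, IP) pair faces a search over the secret alone that is far beyond 2^32.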
Except that presumably anyone with access to the actual encoded IPs will have access to the secret as well, yes? Or are we talking about letting *anyone* see the encoded IP-pageview correlations? In that case, it's something of a privacy violation, in the style of AOL's search-log release.
(You could always change the secret, of course . . . first check whether H(secret(1) + ip) already exists; if it does, use H(secret(2) + ip) instead, provided that one doesn't exist, and so forth . . . but then there's no point in making it public, and we're back to the "anyone who knows the encoded IPs knows the secret anyway" thing.)
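The rotating-secret idea, sketched in Python to show both halves of the argument: repeat visits from one IP get different stored tokens, yet anyone holding the secret list can recompute every token and re-link them. SHA-256 as H, `secret_list`, `seen`, and the function names are assumptions for illustration.

```python
import hashlib


def token(secret: bytes, ip: str) -> str:
    # H(secret(i) + ip), with SHA-256 standing in for H (an assumption).
    return hashlib.sha256(secret + ip.encode()).hexdigest()


def record_visit(ip: str, secret_list: list, seen: set) -> str:
    """Store the first H(secret(i) + ip) not already in `seen` and return it.

    `secret_list` stands in for secret(1), secret(2), ... and `seen` for the
    table of stored encoded IPs; both are hypothetical.
    """
    for secret in secret_list:
        t = token(secret, ip)
        if t not in seen:
            seen.add(t)
            return t
    raise RuntimeError("ran out of secrets for this IP")


def relink(ip: str, secret_list: list, seen: set) -> int:
    """What a secret-holder can do: count how many stored tokens belong to `ip`."""
    return sum(token(s, ip) in seen for s in secret_list)
```

Two visits from `"192.0.2.1"` produce two unrelated-looking tokens, but `relink("192.0.2.1", secret_list, seen)` counts both, which is exactly the "anyone who knows the secret" problem.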