I think it's unlikely to significantly skew the results. A few extra hits are negligible next to the thousands that popular pages receive. I even tried to artificially inflate the count for one page and had no luck whatsoever :) The only concern would be if certain "types" of pages encouraged rapid refreshing: if, say, Pokémon pages were refreshed much more often than other pages, they would be over-reported. But if it's just individual editors inflating the counts of whatever pages they happen to edit, there should be no systematic bias overall.
Steve
On 8/30/06, Steve Summit <scs@eskimo.com> wrote:
One significant potential source of error in Leon's (marvelous!) new hitcount stats is the possibility that one reader is for whatever reason fetching the same page multiple times (perhaps due to nothing more than a prolonged edit).
Obviously it would be best to filter out multiple fetches of the same page from the same IP address over some interval, probably one day. (Yes, this could then undercount hits from behind NAT firewalls and proxies, but I think it'd still be worth it overall.)
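For illustration, here is a minimal sketch of that filter, assuming the log can be read as (Unix timestamp, client identifier, page title) tuples; the function name and log layout are hypothetical, not part of Leon's actual scheme. It counts each (client, page) pair at most once per UTC calendar day, which is simpler than a sliding 24-hour window:

```python
from datetime import datetime, timezone


def count_unique_hits(log):
    """Count hits per page, counting each (client, page) pair at most once per UTC day.

    Hypothetical log format: an iterable of (timestamp, client_id, page) tuples,
    where timestamp is Unix time in seconds and client_id is a raw or hashed IP.
    """
    seen = set()   # (day, client, page) keys already counted
    counts = {}    # page -> number of distinct (client, day) hits
    for ts, client, page in log:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        key = (day, client, page)
        if key in seen:
            continue  # repeat fetch of this page by this client on the same day
        seen.add(key)
        counts[page] = counts.get(page, 0) + 1
    return counts


log = [
    (0, "192.0.2.1", "Pokemon"),
    (60, "192.0.2.1", "Pokemon"),       # same client, same day: ignored
    (120, "198.51.100.7", "Pokemon"),   # different client: counted
    (86400, "192.0.2.1", "Pokemon"),    # next UTC day: counted again
    (10, "192.0.2.1", "Main_Page"),
]
print(count_unique_hits(log))  # {'Pokemon': 3, 'Main_Page': 1}
```

Note that the `seen` set grows with the number of distinct visitors per day; in practice it could be cleared at each day boundary if the log is processed in time order.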
I know that Leon's scheme is currently not logging IP addresses, and given AOL's recent high-profile screwup I have to agree that not logging IP addresses in this context is probably a good idea. But what if we logged a one-way hash of the IP address, that couldn't be correlated with anything else?
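One caveat: a plain unsalted hash of an IPv4 address is weak protection, since there are only about 4.3 billion (2^32) possible addresses and all of their hashes can be computed by brute force. A common substitute (a swapped-in technique, not something proposed above) is a keyed hash such as HMAC-SHA256 with a random secret that is never written to disk and is replaced periodically, e.g. daily to match the one-day dedup window. The class and method names below are hypothetical; this is a sketch, not a vetted privacy design:

```python
import hashlib
import hmac
import secrets


class IpAnonymizer:
    """Hypothetical helper: maps IP addresses to opaque tokens via HMAC-SHA256.

    The key lives only in memory. After rotate(), tokens produced under the
    old key cannot be linked to tokens produced under the new one.
    """

    def __init__(self):
        self._key = secrets.token_bytes(32)

    def rotate(self):
        # Discard the old key, e.g. once per day.
        self._key = secrets.token_bytes(32)

    def token(self, ip: str) -> str:
        digest = hmac.new(self._key, ip.encode("ascii"), hashlib.sha256).hexdigest()
        # 16 hex chars (64 bits) is ample for per-day deduplication.
        return digest[:16]


anon = IpAnonymizer()
t = anon.token("192.0.2.1")
print(t == anon.token("192.0.2.1"))  # True: stable within a key period
```

Because the same address maps to the same token until the key is rotated, the tokens still support the per-day duplicate filtering, while the logs on their own reveal nothing about the original addresses.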
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l