On 8/30/06, Steve Summit scs@eskimo.com wrote:
One significant potential source of error in Leon's (marvelous!) new hitcount stats is the possibility that one reader is for whatever reason fetching the same page multiple times (perhaps due to nothing more than a prolonged edit).
[snip]
No.
The tool measures estimated views rather than unique impressions, so this is not an error. Someone sitting around hitting reload over and over again generates additional views, and those views should be sampled just as often as any other view.
The major source of error is that we are sampling at far too low a rate. We believe that the page view distribution for Wikipedia follows a power law. As such, coarse sampling can only tell us useful things about the relative ranking of the most popular items, so it's good that the web interface only displays the top 1000 at most... However, with only 34,000 samples of enwiki viewing collected over four days, many of the samples are scattered randomly across pages deep in the tail, which we cannot speak about accurately.
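To put rough numbers on the sampling problem, here is a back-of-envelope sketch. It is not the hitcount tool's code. It assumes a pure Zipf distribution with exponent 1.0 over one million pages (both figures are my guesses, not measurements) and treats each sampled count as binomial. Only the 34,000 sample count comes from the actual data.

```python
import math

NUM_SAMPLES = 34_000     # samples collected over four days (from the message)
NUM_PAGES = 1_000_000    # ASSUMPTION: rough number of enwiki pages
EXPONENT = 1.0           # ASSUMPTION: Zipf exponent of the view distribution


def zipf_shares(ranks, num_pages, exponent):
    """Return {rank: share of all views} under a Zipf law over num_pages pages."""
    norm = sum(k ** -exponent for k in range(1, num_pages + 1))
    return {r: r ** -exponent / norm for r in ranks}


def expected_hits(share, num_samples):
    """Expected number of samples that land on a page with this view share."""
    return share * num_samples


def relative_std_error(share, num_samples):
    """Standard deviation of the sampled count divided by its mean (binomial)."""
    return math.sqrt((1.0 - share) / (share * num_samples))


shares = zipf_shares([1, 1000, 100000], NUM_PAGES, EXPONENT)

print("   rank  expected hits  relative error")
for rank, share in shares.items():
    hits = expected_hits(share, NUM_SAMPLES)
    err = relative_std_error(share, NUM_SAMPLES)
    print(f"{rank:>7}  {hits:13.2f}  {err:14.1%}")
```

Under these assumptions the top page collects thousands of hits and its count is good to a few percent, a page near rank 1000 collects only a handful, and a page deep in the tail expects well under one hit, so whether it shows up at all is mostly luck.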
Changes will be made so that we can substantially increase the sampling rate.
I pity the journalist who sees our data and runs an absolutely idiotic story based on it.