Cormac wrote:
For July 13th, under the columns C and D, I have the numbers 6,884 and 798 respectively. Below this then are two much higher figures (11,285 and 1,629) - are these the maximum estimates and the above published figure the conservative estimate, or how does it work exactly?
Cormac, the July 13th figures are actual counts. The much higher July figures below it are forecasts for the complete month. Hence the +/- sign.
These forecasts are based on the previous three months: for each month I calculate the proportion of wikipedians who had fulfilled the criteria by day x, versus the number who had done so by the end of the month.
The resulting multiplication factors for all wikipedias together are combined into a weighted average, to minimize the distorting effects of activity peaks in just one wikipedia in previous months.
This is better than just multiplying the actual counts for the 13th by 31/13 to arrive at full-month forecasts.
That holds especially for columns C and D, as the increase in wikipedians who fulfill the norm for C or D is highly nonlinear over a month.
This is also why for C: 6884/11285 > 0.5, while for D: 798/1629 < 0.5. In words: more than half of the wikipedians who will count as active at the end of the month had already made the grade by the 13th, whereas most wikipedians who eventually qualify as very active only do so later in the month.
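A quick check of that arithmetic in Python, using only the four July figures quoted in this thread. The variable names are made up for illustration; the last two lines show what the plain 31/13 scaling would give instead.

```python
# Figures quoted in this thread: counts on July 13th and the
# forecasts for the complete month, for columns C and D.
actual_c, forecast_c = 6884, 11285
actual_d, forecast_d = 798, 1629

# Share of the forecast month-end total already reached by the 13th.
share_c = actual_c / forecast_c  # a bit above 0.5
share_d = actual_d / forecast_d  # a bit below 0.5

# Multiplication factors implied by the forecasts.
factor_c = forecast_c / actual_c
factor_d = forecast_d / actual_d

# Naive linear extrapolation: scale the day-13 count by 31/13.
naive_c = actual_c * 31 / 13
naive_d = actual_d * 31 / 13

print(f"C: share {share_c:.2f}, factor {factor_c:.2f}, naive {naive_c:.0f}")
print(f"D: share {share_d:.2f}, factor {factor_d:.2f}, naive {naive_d:.0f}")
```

Both implied factors are below 31/13 (about 2.38), so the linear method would come out higher than either forecast, and by a wider margin for C than for D.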
When wikistats is run on the 6th of the month or earlier, no forecasts are given, as the margin of error would be too high.
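For readers who want to see the idea in code, here is a minimal Python sketch of the forecast as described above. It is not the wikistats source. The function names, the sample history and in particular the choice of weights (each factor weighted by its day-x count) are assumptions made for illustration; the description above only says that a weighted average is used.

```python
from typing import Optional, Sequence, Tuple

# One entry per wikipedia per previous month:
# (count on day x, count at end of month). Hypothetical data shape.
History = Sequence[Tuple[int, int]]

FIRST_FORECAST_DAY = 7  # runs on the 6th or earlier give no forecast


def combined_factor(history: History) -> float:
    """Weighted average of the per-entry factors (month_end / day_x).

    Assumption: each factor is weighted by its day-x count, so a peak
    in one small wikipedia barely moves the combined result.
    """
    total_weight = sum(day_count for day_count, _ in history)
    if total_weight == 0:
        raise ValueError("history contains no activity to base a factor on")
    weighted_sum = sum(
        day_count * (month_end / day_count)
        for day_count, month_end in history
        if day_count > 0
    )
    return weighted_sum / total_weight


def forecast(count_today: int, day: int, history: History) -> Optional[float]:
    """Full-month forecast, or None when it is too early in the month."""
    if day < FIRST_FORECAST_DAY:
        return None
    return count_today * combined_factor(history)


# Made-up history: two larger wikipedias and one small one with a late peak.
sample_history = [(600, 1000), (50, 150), (350, 500)]
print(forecast(100, 13, sample_history))
print(forecast(100, 6, sample_history))
```

With this made-up history the combined factor is about 1.65, whereas a plain unweighted mean of the three factors would be about 2.03, pulled up by the one small wikipedia whose count tripled.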
Erik Zachte
On 8/31/05, epzachte@chello.nl wrote:
Cormac wrote:
For July 13th, under the columns C and D, I have the numbers 6,884 and 798 respectively. Below this then are two much higher figures (11,285 and 1,629) - are these the maximum estimates and the above published figure the conservative estimate, or how does it work exactly?
Cormac, the July 13th figures are actual counts. The much higher July figures below it are forecasts for the complete month. Hence the +/- sign.
Thanks, Erik.
What's the margin of error here? Is it significant?
These stats never cease to amaze me, by the way :-)
Cormac
Cormac:
What's the margin of error here? Is it significant?
Cormac, I have no exact figures in terms of alphas and standard deviations. The error greatly depends on the day of the month and the size of a wikipedia.
The >average< forecast will become more accurate very fast, as the month progresses, because 200 wikipedias contribute to the figure. But each individual forecast may be wildly inaccurate early in the month.
TV coverage may double the activity in one wikipedia later in the month. Global media attention (e.g. for Wikimania) will make all estimates too low. Small wikipedias will have large differences in activity per month anyway.
I see it as a best guess only, so that one can compare figures for the current month with earlier months. Without this forecast, people would intuitively use the 31/13 method for Aug 13th, which would be much farther from the truth for columns C and D.
Cheers, Erik
wikimedia-l@lists.wikimedia.org