I got a rough estimate of when the millionth article was added:
If the 109 cur/old dumps from Sep 20 told the whole story, it would have been on Sep 14 at about 21:55 hrs (server time).
Of course, articles were deleted between Sep 14 and Sep 20, so the result would have been different with dumps from an earlier date. In fact, that would bring the time forward.
On the other hand, some articles may not have met the criteria of 'no redirect' and 'at least one internal link' on Sep 14, while they did on Sep 20.
I collected all articles in the 'cur' database that fulfilled the criteria at the time of the dump, then found the time they were added either in 'cur' or 'old'.
Duplicate articles were ignored (they sometimes occur after clicking 'save' twice within seconds). Articles that exist only in 'old' were not counted either: there are a few hundred of them, which may partly be explained by different dump times for cur and old, with deletions coming in between, but most are probably just aborted transactions.
As said, I processed the 109 languages that are currently in the weekly stats job, so I might have missed a few entries in very recently started languages that are still looking forward to their first milestone of 10 articles :)
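The procedure described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the script that was actually run: the record layout (language, title, timestamp, wikitext), the function names, and the simplified redirect and link checks are all hypothetical, and timestamps are assumed to be fixed-width strings such as YYYYMMDDHHMMSS so that plain string comparison orders them correctly.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout for one dump row:
# (language, title, timestamp, wikitext)
Revision = Tuple[str, str, str, str]

# Simplified test for an internal link such as [[Some page]].
INTERNAL_LINK = re.compile(r"\[\[[^\]]+\]\]")


def is_countable(text: str) -> bool:
    """Apply the two criteria: no redirect, at least one internal link."""
    if text.lstrip().upper().startswith("#REDIRECT"):
        return False
    return INTERNAL_LINK.search(text) is not None


def creation_times(cur: Iterable[Revision],
                   old: Iterable[Revision]) -> List[str]:
    """Return the sorted creation times of all countable articles."""
    first_seen: Dict[Tuple[str, str], str] = {}

    # Step 1: articles in 'cur' that meet the criteria at dump time.
    # Keying on (language, title) collapses duplicate rows into one.
    for lang, title, ts, text in cur:
        if is_countable(text):
            key = (lang, title)
            if key not in first_seen or ts < first_seen[key]:
                first_seen[key] = ts

    # Step 2: an earlier revision in 'old' moves the creation time back.
    # Articles that exist only in 'old' are never added.
    for lang, title, ts, _text in old:
        key = (lang, title)
        if key in first_seen and ts < first_seen[key]:
            first_seen[key] = ts

    return sorted(first_seen.values())


def nth_creation_time(times: List[str], n: int) -> Optional[str]:
    """Creation time of the n-th article (1-based), or None if fewer exist."""
    return times[n - 1] if 0 < n <= len(times) else None
```

With the rows of all 109 languages fed in, `nth_creation_time(creation_times(cur, old), 1_000_000)` would give the estimated moment. Note that the criteria are tested only against the text at dump time, which is exactly why articles that changed status between Sep 14 and Sep 20 add noise.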
Given all the above, I think it is not useful to point to one specific article as the millionth, as there is too much noise in the data.
Erik Zachte
Erik Zachte wrote:
Given all the above, I think it is not useful to point to one specific article as the millionth, as there is too much noise in the data.
Of course it's impossible. And when the brass from McDonald's descends on a small-town outlet to suddenly confer upon some unsuspecting customer the award for having bought the umpteen billionth hamburger, do you really think that they made a precise calculation? It's all about publicity. You look at whoever wrote the first new article after 21:55 on Sept. 14, send him a T-shirt and mousepad with the logo, and make a big fuss about it.
Ec
With reference to these emails:
http://mail.wikipedia.org/pipermail/wikitech-l/2004-September/025479.html
http://mail.wikipedia.org/pipermail/wikitech-l/2004-September/025480.html
http://mail.wikipedia.org/pipermail/wikitech-l/2004-September/025483.html
I believe there is no need for embarrassment over the inaccuracy -- or for hiding these unavoidable shortcomings: Let's just go out and clearly admit that it's inaccurate for such-and-such reasons, but that this is our best bet and thus the ordained "winner". That doesn't stop us from making a big fuss about it, and yes, I think it would be nice to give a merchandise reward to, and have a photo-op with, the contributor in question -- and then issue a brief(!) follow-up press release (where we also remain honest about the inherent limitations).
-- ropers [[en:User:Ropers]] www.ropersonline.com
wikitech-l@lists.wikimedia.org