Hi,
since the site stats are conveniently stored in the site_stats table, I suggest subtracting the number of articles created by the Ram-Man bot (US Census city information) from the total number of articles (NOA).
Why? The NOA is primarily interesting as a measure of our collaborative progress. This is important for ourselves and for others. Personally, I've had several discussions about Wikipedia where I was reluctant to cite the NOA because of the high number of machine-generated articles; others probably feel the same.
I therefore believe we should generally exclude autogenerated articles (we can change the wording on Main_Page to reflect this). As it would be a five-minute task for anyone with access to the db, is there any reason not to do it?
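To make the proposal concrete, here is a minimal sketch of the adjusted count. It runs against an in-memory SQLite database with a hypothetical `articles` table and `created_by` column; these names are assumptions for illustration and are not the real Wikipedia schema, which may not record the creator this directly.

```python
import sqlite3


def count_excluding_creator(conn, creator):
    """Count rows in the hypothetical `articles` table not created by `creator`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM articles WHERE created_by <> ?",
        (creator,),
    ).fetchone()
    return row[0]


# Toy data only: table and column names are assumptions, not the live schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (title TEXT PRIMARY KEY, created_by TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("Paris", "SomeHuman"),
        ("Springfield, Illinois", "Ram-Man"),
        ("Photosynthesis", "AnotherHuman"),
        ("Autaugaville, Alabama", "Ram-Man"),
    ],
)

total = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
adjusted = count_excluding_creator(conn, "Ram-Man")
print(total, adjusted)
```

The adjusted figure is simply the total minus the bot-created rows, so the number shown on Main_Page could be replaced by `adjusted` once the wording there is changed.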
Regards,
Erik
Erik Moeller wrote:
> Why? The NOA is primarily interesting as a measure of our collaborative progress. This is important for ourselves and for others. Personally, I've had several discussions about Wikipedia where I was reluctant to cite the NOA because of the high number of machine-generated articles; others probably feel the same.
> I therefore believe we should generally exclude autogenerated articles (we can change the wording on Main_Page to reflect this). As it would be a five-minute task for anyone with access to the db, is there any reason not to do it?
The NOA is a highly unreliable figure for any purpose, and there has been much discussion on reforming it.
If you think collaboration is the key, then a more general solution is in order: only count article-space pages that have been edited at least twice (and thus, have an old revision stored in the 'old' table).
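As a rough sketch of that rule, the query below counts article-space pages that have at least one matching row in 'old'. It uses an in-memory SQLite database with simplified stand-ins for the current-revision and 'old' tables; the `cur` table name, all column names, and the use of namespace 0 for articles are assumptions for illustration, not a statement of the live schema.

```python
import sqlite3

ARTICLE_NAMESPACE = 0  # assumption: namespace 0 is the article space


def count_edited_at_least_twice(db):
    """Count article-space pages that have at least one old revision."""
    row = db.execute(
        """
        SELECT COUNT(*)
        FROM cur
        WHERE cur_namespace = ?
          AND EXISTS (
              SELECT 1 FROM old
              WHERE old_namespace = cur_namespace
                AND old_title = cur_title
          )
        """,
        (ARTICLE_NAMESPACE,),
    ).fetchone()
    return row[0]


# Toy data only: simplified, hypothetical versions of the two tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cur (cur_namespace INTEGER, cur_title TEXT)")
db.execute("CREATE TABLE old (old_namespace INTEGER, old_title TEXT)")
db.executemany(
    "INSERT INTO cur VALUES (?, ?)",
    [(0, "Paris"), (0, "Springfield"), (1, "Paris")],
)
db.executemany(
    "INSERT INTO old VALUES (?, ?)",
    [(0, "Paris"), (0, "Paris"), (1, "Paris")],
)

collaborative = count_edited_at_least_twice(db)
print(collaborative)
```

Using EXISTS rather than a join keeps a page with many old revisions from being counted more than once, and a page that was created once and never touched again (bot-generated or not) drops out of the count automatically.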
-- brion vibber (brion @ pobox.com)
>> Why? The NOA is primarily interesting as a measure of our collaborative progress. This is important for ourselves and for others. Personally, I've had several discussions about Wikipedia where I was reluctant to cite the NOA because of the high number of machine-generated articles; others probably feel the same.
>> I therefore believe we should generally exclude autogenerated articles (we can change the wording on Main_Page to reflect this). As it would be a five-minute task for anyone with access to the db, is there any reason not to do it?
> The NOA is a highly unreliable figure for any purpose, and there has been much discussion on reforming it.
> If you think collaboration is the key, then a more general solution is in order: only count article-space pages that have been edited at least twice (and thus, have an old revision stored in the 'old' table).
I'm in support of this idea, but I don't know how many others are. Just cutting out the bots might be the least controversial option, though, since the current modus operandi is to try to make decisions by consensus (I prefer voting).
Regards,
Erik
wikitech-l@lists.wikimedia.org