On Thu, Jan 15, 2009 at 11:00 AM, Eugenio Tacchini eugenio@favoriti.it wrote:
Hello everybody, I'm looking, for academic research purposes, for the "status" of Wikipedia pages. By "status" I mean:
- stub
- normal
- good article
- featured
Is there any column in the MediaWiki database schema that can give me this information?
Thanks in advance.
Cheers,
Eugenio
None of this information is really stored in the database. A count of real articles vs. stubs is sort of stored in the site_stats table. The content pages that aren't stubs are counted in ss_good_articles. However, ss_total_pages is all pages, content or not, which makes it a bad metric for your purposes. An individual wiki's concept of stub/normal/good/featured is a completely arbitrary system not based in the actual software in any way. The only idea I'd have (for English Wikipedia) would be cross-referencing to see which articles contain the {{stub}} (or similar) template, as that should give you a good idea of stubs. Perhaps similar things could be done with {{featured}}?
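A rough sketch of that cross-referencing idea, in Python against a toy in-memory SQLite copy of two tables. It assumes the classic page and templatelinks layout (tl_from, tl_namespace, tl_title), which newer MediaWiki versions may not match. The sample rows, the helper name articles_using, and the template names 'Stub', 'Physics-stub' and 'Featured' are made up for illustration, so check which templates a given wiki actually uses.

```python
import sqlite3

# Toy stand-in for two MediaWiki tables, with only the columns used here.
# Assumption: classic schema where templatelinks stores the template's
# namespace and title directly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE templatelinks (
    tl_from INTEGER,       -- page_id of the page using the template
    tl_namespace INTEGER,  -- 10 is the Template: namespace
    tl_title TEXT
);
-- Hypothetical sample data.
INSERT INTO page VALUES
    (1, 0, 'Alpha'), (2, 0, 'Beta'), (3, 0, 'Gamma'), (4, 1, 'Alpha');
INSERT INTO templatelinks VALUES
    (1, 10, 'Stub'), (2, 10, 'Featured'),
    (3, 10, 'Physics-stub'), (3, 10, 'Infobox'), (4, 10, 'Stub');
""")


def articles_using(conn, pattern):
    """Titles of main-namespace pages that use a template matching a LIKE pattern."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.page_title
        FROM page AS p
        JOIN templatelinks AS tl ON tl.tl_from = p.page_id
        WHERE p.page_namespace = 0   -- articles only, no talk pages etc.
          AND tl.tl_namespace = 10   -- links to templates only
          AND tl.tl_title LIKE ?
        ORDER BY p.page_title
        """,
        (pattern,),
    ).fetchall()
    return [title for (title,) in rows]


# SQLite's LIKE ignores ASCII case by default, so '%stub' also matches 'Stub'.
stubs = articles_using(conn, "%stub")
featured = articles_using(conn, "Featured")
print(stubs, featured)
```

On a real database the same join would run against the wiki's own tables, and the LIKE pattern may need adjusting since other database engines can compare titles case-sensitively.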
The only other idea I'd have would be to check the FlaggedRevs tables to see how they describe individual articles. However, the English Wikipedia doesn't use the extension yet, and I don't think this information is included in the dumps, iirc.
-Chad