On Thu, Jan 15, 2009 at 11:00 AM, Eugenio Tacchini <eugenio(a)favoriti.it> wrote:
Hello everybody,
I'm looking, for academic research purposes, for the "status" of
Wikipedia pages. By "status" I mean:
- stub
- normal
- good article
- featured
Is there any column in the MediaWiki database schema that can give
me this information?
Thanks in advance.
Cheers,
Eugenio
None of this information is really stored in the database. A rough count
of real articles vs. stubs is kept in the site_stats table: the content
pages that aren't stubs are counted in ss_good_articles. However,
ss_total_pages counts all pages, content or not, which makes this a bad metric
for your purposes. An individual wiki's concept of stub/normal/good/
featured is a completely arbitrary system not based in the actual
software in any way. The only idea I'd have (for English Wikipedia)
would be cross-referencing to see which articles contain the {{stub}}
(or similar) template, as that should give you a good idea of stubs.
Perhaps similar things could be done with {{featured}}?
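
As a rough illustration of the template cross-referencing idea, here is a
minimal Python sketch that classifies a page from its raw wikitext. It is only
a sketch under assumptions: the function name classify is made up for this
example, the template names it checks ("featured", "featured article",
"good article", and anything named "stub" or ending in "-stub") are guesses
at English Wikipedia conventions rather than anything defined by MediaWiki,
and it assumes the status templates appear in the article text itself.

```python
import re

# Matches the name of each template call, e.g. "stub" in {{stub}} or
# "featured article" in {{featured article|date=...}}. The lookahead stops
# at the first "|" or "}}" so nested templates are also found.
TEMPLATE_NAME_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=\||\}\})")

# Assumed template names; adjust these to the wiki being studied.
FEATURED_NAMES = {"featured", "featured article"}
GOOD_NAMES = {"good article"}


def classify(wikitext):
    """Return 'featured', 'good article', 'stub' or 'normal' for a page."""
    names = [m.group(1).lower() for m in TEMPLATE_NAME_RE.finditer(wikitext)]
    if any(n in FEATURED_NAMES for n in names):
        return "featured"
    if any(n in GOOD_NAMES for n in names):
        return "good article"
    # Stub templates are assumed to be {{stub}} or {{<topic>-stub}}.
    if any(n == "stub" or n.endswith("-stub") for n in names):
        return "stub"
    return "normal"
```

Something like classify("Some text {{physics-stub}}") would then report a
stub, while a page with none of these templates falls through to "normal".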
The only other idea I'd have would be to check the FlaggedRevs tables
to see how they describe individual articles. However, the English
Wikipedia doesn't use the extension yet, and iirc this information
isn't included in the dumps.
-Chad