[Foundation-l] Classifying what is on Wikipedia

Peter Damian peter.damian at btinternet.com
Mon Sep 20 19:19:27 UTC 2010


Following on from my previous posts about trying to classify the scope and 
coverage of humanities subjects in Wikipedia, I have a practical question: 
is it possible to query the Wikipedia database so as to get a list of all 
articles (current version)?  Even better would be a second, larger list that 
indexes each article against the categories it belongs to. 
For example:

List 1

Name, ID
Thomas Aquinas, 1
William of Ockham, 2

List 2

ID, category
1, 1225 births
1, 1274 deaths
[...]
2, 1285 births
2, 1347 deaths
2, 13th century philosophers

and so on.  I appreciate the second list may be up to 20 times the size of 
the first; with roughly 3 million articles in the first list, that would be 
some 60 million rows.  Perhaps there is a way to limit the number of 
categories; I don't know.
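One way to picture the two lists is as two queries over the MediaWiki database tables. The sketch below assumes the `page` and `categorylinks` tables with the column names MediaWiki used around 2010 (`page_id`, `page_namespace`, `page_title`, `page_is_redirect`, `cl_from`, `cl_to`); newer MediaWiki releases have changed `categorylinks`, so these names should be checked against the current schema. Purely for illustration, it loads the two example articles above into an in-memory SQLite database and produces List 1 and List 2.

```python
import sqlite3

# Toy copy of two MediaWiki tables. Column names follow the page and
# categorylinks tables as of roughly 2010 (an assumption; check the current schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,    -- 0 = main (article) namespace
    page_title TEXT,           -- titles are stored with underscores for spaces
    page_is_redirect INTEGER   -- 1 for redirects, 0 for real pages
);
CREATE TABLE categorylinks (
    cl_from INTEGER,           -- page_id of the page that carries the category
    cl_to TEXT                 -- category name, without the "Category:" prefix
);
""")

# Illustrative rows copied from the example lists in this post.
conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
    (1, 0, "Thomas_Aquinas", 0),
    (2, 0, "William_of_Ockham", 0),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "1225_births"), (1, "1274_deaths"),
    (2, "1285_births"), (2, "1347_deaths"), (2, "13th_century_philosophers"),
])

# List 1: every current article (main namespace, redirects excluded).
list1 = conn.execute("""
    SELECT page_title, page_id FROM page
    WHERE page_namespace = 0 AND page_is_redirect = 0
    ORDER BY page_id
""").fetchall()

# List 2: one row per (article ID, category) pair, for articles only.
list2 = conn.execute("""
    SELECT cl.cl_from, cl.cl_to
    FROM categorylinks AS cl
    JOIN page AS p ON p.page_id = cl.cl_from
    WHERE p.page_namespace = 0 AND p.page_is_redirect = 0
    ORDER BY cl.cl_from, cl.cl_to
""").fetchall()

for title, pid in list1:
    print(f"{title.replace('_', ' ')}, {pid}")
for pid, cat in list2:
    print(f"{pid}, {cat.replace('_', ' ')}")
```

On real data, the public Wikimedia database dumps include separate SQL files for the `page` and `categorylinks` tables; these are MySQL dumps, so they would need loading into MySQL (or converting) before running queries like these. The namespace and redirect filters keep only articles, dropping redirects, talk pages and category pages. One possible way to limit the number of categories would be an extra condition on List 2, for example `AND cl.cl_to NOT LIKE '%births' AND cl.cl_to NOT LIKE '%deaths'`.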

This would let me see exactly what is there under the humanities.  My 
hunch, from using the random article function, is that most articles in 
Wikipedia are obscure stubs, and that its coverage of humanities subjects 
(and possibly other areas) is actually no different from that of a 
conventional encyclopedia. 
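Once the two lists exist, the stub hunch could be tested by counting, per category, how many articles fall below some size. The sketch below is a hedged illustration only: it assumes the MediaWiki `page` table's `page_len` column (page size in bytes), uses invented titles, categories and lengths rather than real measurements, and picks a hypothetical 2,000-byte cut-off, since "stub" has no fixed byte size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_is_redirect INTEGER,
                   page_len INTEGER);   -- page size in bytes
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
""")

# Made-up rows for illustration; titles, categories and sizes are not real data.
conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "Article_A", 0, 45000),
    (2, 0, "Article_B", 0, 1200),
    (3, 0, "Article_C", 0, 800),
    (4, 0, "Article_D", 0, 9000),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "Philosophers"), (2, "Philosophers"), (4, "Philosophers"),
    (3, "Poets"), (4, "Poets"),
])

STUB_BYTES = 2000  # hypothetical threshold for a "short" article

# For each category: number of member articles, and how many are under the cut-off.
# In SQLite a comparison yields 0 or 1, so SUM counts the short articles.
rows = conn.execute("""
    SELECT cl.cl_to,
           COUNT(*) AS articles,
           SUM(p.page_len < ?) AS short_articles
    FROM categorylinks AS cl
    JOIN page AS p ON p.page_id = cl.cl_from
    WHERE p.page_namespace = 0 AND p.page_is_redirect = 0
    GROUP BY cl.cl_to
    ORDER BY cl.cl_to
""", (STUB_BYTES,)).fetchall()

for cat, n, short in rows:
    print(f"{cat}: {n} articles, {short} under {STUB_BYTES} bytes")
```

This counts only direct members of each category. Wikipedia's humanities categories form a nested tree (subcategory pages also appear in `categorylinks`), so a real survey would need to walk the subcategories as well.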