O. O. wrote:
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 2:44 PM, O. O. <olson_ot@yahoo.com> wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ . There is a file called “all-titles-in-ns0.gz” which is supposed to contain the list of page titles. If I run
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Thanks Ilmari and Aryeh.
I am not sure what “primary meta-pages” are; however, “templates” and “image descriptions” do have titles. You can check this in the online version of the English Wikipedia.
Sure, they have titles. But they are not in "ns0" and thus are not contained in this list, which is ns0 only (that is, the main "article" namespace).
-- daniel
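
To see how the two counts discussed above relate, one could tally the titles in pages-articles.xml.bz2 by namespace. The following is only a rough sketch: the helper name count_titles_by_namespace is hypothetical, it assumes the namespace has to be inferred from the title prefix, and the prefix list is an illustrative guess, not the dump's authoritative namespace list. Titles without a listed prefix are counted as "(main)", so article titles that merely contain a colon are not misclassified.

```shell
# Hypothetical helper (not part of any Wikimedia tooling): reads MediaWiki
# XML dump text on stdin and prints "<count> <namespace>" per namespace.
count_titles_by_namespace() {
  # Extract the text between <title> and </title>.
  grep -o '<title>[^<]*</title>' |
    sed -e 's/^<title>//' -e 's/<\/title>$//' |
    awk '
      {
        ns = "(main)"
        # Assumed prefix list; only a known prefix counts as a namespace,
        # because ordinary article titles may also contain a colon.
        if (match($0, /^(Template|Image|Wikipedia|Category|Portal|Help|MediaWiki):/))
          ns = substr($0, 1, RLENGTH - 1)
        count[ns]++
      }
      END { for (ns in count) print count[ns], ns }
    ' | sort -k2
}
```

It could be run as `bzcat enwiki-20081008-pages-articles.xml.bz2 | count_titles_by_namespace`. Under these assumptions, the "(main)" line should be roughly comparable to the line count of the all-titles-in-ns0 file, and the remaining lines account for the rest of the pages.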