Daniel Kinzler wrote:
O. O. wrote:
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 2:44 PM, O. O. olson_ot@yahoo.com wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
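For anyone who wants to reproduce the count without first unpacking the archive, here is a minimal Python sketch. It assumes the download is a gzip file with one title per line; the function name count_titles and the file name in the usage note are illustrative, not something taken from the dump tooling.

```python
import gzip


def count_titles(path):
    """Count the lines in a gzip-compressed titles file (one title per line).

    The file is read in binary mode, so lines are split on b"\n" only.
    Unlike `wc -l`, a final line without a trailing newline is also counted.
    """
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Calling count_titles("enwiki-20081008-all-titles-in-ns0.gz") (file name assumed from the dump page) should give the same figure as the cat | wc -l pipeline above, provided the file ends with a newline.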
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Thanks Ilmari and Aryeh.
I am not sure what “primary meta-pages” are; however, “templates” and “image descriptions” do have titles. You can check this in the online version of the English Wikipedia.
Sure, they have titles. But they are not in "ns0" and thus are not contained in this list, which is ns0 only (that is, the main "article" namespace).
-- daniel
Thanks, Daniel. I had not understood the meaning of NS0. Anyway, I found the details of NS0 at http://en.wikipedia.org/wiki/Wikipedia:NS0 However, this confuses me even more.
The above link says that “only articles” and no redirects are in the namespace NS0. Also, Talk: pages are not included in NS0. So when the current English Wikipedia advertises 2,791,033 articles, I cannot understand why the list of titles contains 5,716,820 titles. That is a little more than double.
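To put a number on "a little more than double", a quick check using only the two figures quoted in this thread (the variable names are just for illustration):

```python
# Figures quoted in this thread; nothing else is assumed.
titles_in_ns0 = 5716820        # lines in all-titles-in-ns0
advertised_articles = 2791033  # article count advertised by the English Wikipedia

ratio = titles_in_ns0 / advertised_articles
surplus = titles_in_ns0 - advertised_articles

print(f"ratio: {ratio:.3f}")
print(f"titles beyond the advertised article count: {surplus}")
```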
Thanks for helping out, O. O.