O. O. wrote:
I am looking at the dump of the English Wikipedia at
http://download.wikimedia.org/enwiki/20081008/. There is a file called
“all-titles-in-ns0.gz” which is supposed to contain the list of page
titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little further up, under
“pages-articles.xml.bz2”, we have “enwiki 7649051 pages”.
So why are these two numbers different? Are there pages without a title?
The description of pages-articles.xml.bz2 says "Articles, templates,
image descriptions, and primary meta-pages." Presumably the 1932231
non-article pages in it are the "templates, image descriptions, and
primary meta-pages".
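
The arithmetic behind that figure can be checked with a short Python sketch. The two constants are the numbers quoted above; count_lines is a hypothetical helper (not part of any dump tooling) that assumes the titles file is gzip-compressed with one title per line, and it is only defined here, not run against a real file.

```python
import gzip


def count_lines(path):
    """Count lines in a gzip-compressed text file (assumed: one title per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


# Figures quoted in this thread, not recomputed here.
TOTAL_PAGES = 7649051  # "enwiki 7649051 pages" listed for pages-articles.xml.bz2
NS0_TITLES = 5716820   # result of wc -l on the all-titles-in-ns0 file

# Pages in pages-articles.xml.bz2 that are not in namespace 0.
non_article_pages = TOTAL_PAGES - NS0_TITLES
print(non_article_pages)
```

To recount the titles directly from the compressed file, something like count_lines("enwiki-20081008-all-titles-in-ns0.gz") should give the same number as the wc -l command above, assuming that is the local filename.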
--
Ilmari Karonen