Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
So why are these two numbers different? Are there pages without a Title?
Thanks a lot, O. O.
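The line count above can also be reproduced directly from the compressed file. This is a minimal sketch in Python, assuming the download is a gzip file with one title per line; the function name `count_titles` and the local file name are illustrative and not part of any dump tooling.

```python
import gzip


def count_titles(path):
    """Count newline-terminated lines in a gzip-compressed title list."""
    # Binary mode splits on b"\n" only, which mirrors what `wc -l` counts
    # as long as the file ends with a trailing newline.
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

Calling `count_titles("enwiki-20081008-all-titles-in-ns0.gz")` should agree with the `wc -l` figure, provided the last line of the file ends with a newline.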
O. O. wrote:
I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
So why are these two numbers different? Are there pages without a Title?
The description of pages-articles.xml.bz2 says "Articles, templates, image descriptions, and primary meta-pages." Presumably the 1932231 non-article pages in it are the "templates, image descriptions, and primary meta-pages".
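The figure of 1932231 is simply the difference between the two counts quoted in the question. As a quick check in Python (the variable names are illustrative, and the subtraction assumes both numbers describe the same snapshot):

```python
pages_in_articles_dump = 7649051  # "enwiki 7649051 pages" on the dump page
titles_in_ns0 = 5716820           # line count of all-titles-in-ns0

# Pages in pages-articles.xml.bz2 that are not in the ns0 title list.
non_ns0_pages = pages_in_articles_dump - titles_in_ns0
print(non_ns0_pages)  # 1932231
```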
On Fri, Mar 13, 2009 at 2:44 PM, O. O. olson_ot@yahoo.com wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 2:44 PM, O. O. olson_ot@yahoo.com wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Thanks Ilmari and Aryeh.
I am not sure what “primary meta-pages” are – however, “templates” and “image descriptions” do have Titles. You can check this in the online version of the English Wikipedia.
O. O.
O. O. schrieb:
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 2:44 PM, O. O. olson_ot@yahoo.com wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Thanks Ilmari and Aryeh.
I am not sure what “primary meta-pages” are – however, “templates” and “image descriptions” do have Titles. You can check this in the online version of the English Wikipedia.
Sure they have titles. But they are not in "ns0" and thus are not contained in this list, which is ns0 only (that is, the main "article" namespace).
-- daniel
Daniel Kinzler wrote:
O. O. schrieb:
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 2:44 PM, O. O. olson_ot@yahoo.com wrote:
Hi, I am looking at the dump of the English Wikipedia at http://download.wikimedia.org/enwiki/20081008/ There is a file called “all-titles-in-ns0.gz” which is supposed to contain the List of Page Titles. If I do
cat enwiki-20081008-all-titles-in-ns0 | wc -l
I get 5716820. On the same page, a little above in “pages-articles.xml.bz2” we have “enwiki 7649051 pages”.
The description for pages-articles.xml.bz2 says it contains "Articles, templates, image descriptions, and primary meta-pages." all-titles-in-ns0.gz contains (as the name suggests) only the titles in ns0, i.e., the main namespace, articles. It does not contain templates, image descriptions, or "primary meta-pages" (whatever those are).
Thanks Ilmari and Aryeh.
I am not sure what “primary meta-pages” are – however, “templates” and “image descriptions” do have Titles. You can check this in the online version of the English Wikipedia.
Sure they have titles. But they are not in "ns0" and thus are not contained in this list, which is ns0 only (that is, the main "article" namespace).
-- daniel
Thanks Daniel. I had not understood the meaning of NS0. Anyway I found the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0 However this confuses me even more.
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
Thanks for helping out, O. O.
On Sat, Mar 14, 2009 at 9:26 AM, O. O. olson_ot@yahoo.com wrote:
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
The larger number includes redirects, the smaller number doesn't.
Andrew Garrett wrote:
On Sat, Mar 14, 2009 at 9:26 AM, O. O. olson_ot@yahoo.com wrote:
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
The larger number includes redirects, the smaller number doesn't.
Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that “Redirects” are not considered as Articles and hence are not in NS0?
O.O.
On Sat, Mar 14, 2009 at 9:34 AM, O. O. olson_ot@yahoo.com wrote:
Andrew Garrett wrote:
On Sat, Mar 14, 2009 at 9:26 AM, O. O. olson_ot@yahoo.com wrote:
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
The larger number includes redirects, the smaller number doesn't.
Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that “Redirects” are not considered as Articles and hence are not in NS0?
It doesn't say that, it says "Not all pages in the article namespace are considered to be articles", listing redirects as an example.
Andrew Garrett schrieb:
On Sat, Mar 14, 2009 at 9:34 AM, O. O. olson_ot@yahoo.com wrote:
Andrew Garrett wrote:
On Sat, Mar 14, 2009 at 9:26 AM, O. O. olson_ot@yahoo.com wrote:
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
The larger number includes redirects, the smaller number doesn't.
Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that “Redirects” are not considered as Articles and hence are not in NS0?
It doesn't say that, it says "Not all pages in the article namespace are considered to be articles", listing redirects as an example.
The terminology is indeed confusing. ns0 is the "main" namespace, which is used for "articles". But it also contains redirects. For the statistics, the software tries to count "real" or "good" articles, which is defined to be in ns0, not a redirect, and containing at least one link. It may in the future even be redefined not to include disambiguation pages. The title list however contains all pages in ns0.
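The three conditions for a counted article can be written down as a small predicate. This is only a sketch in Python: the `Page` type and its fields are hypothetical, and testing for the substring `[[` is an assumed simplification of "containing at least one link", not a statement of how the software actually implements the check.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record; the field names are illustrative."""
    namespace: int
    is_redirect: bool
    text: str


def counts_as_article(page: Page) -> bool:
    """Return True if the page meets the three conditions described above."""
    # Assumption: an internal link is detected by the presence of "[[".
    has_link = "[[" in page.text
    return page.namespace == 0 and not page.is_redirect and has_link
```

Under this definition a redirect in ns0 still appears in the title list but is not counted, which is consistent with the title list being roughly twice the advertised article count.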
Talk pages are in their own namespace, or rather, namespaces. Namespaces come in pairs: the namespace itself (even id), and the corresponding talk namespace (odd id).
-- daniel
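The even/odd pairing of namespaces can be expressed in a few lines. A minimal sketch in Python; the function names are illustrative, and the handling of negative ids reflects the fact that virtual namespaces such as Special (-1) have no talk counterpart.

```python
def is_talk_namespace(ns_id: int) -> bool:
    """Talk namespaces are the non-negative odd ids."""
    return ns_id >= 0 and ns_id % 2 == 1


def talk_namespace_for(ns_id: int) -> int:
    """Return the talk namespace paired with ns_id."""
    if ns_id < 0:
        raise ValueError("virtual namespaces have no talk namespace")
    # An even id maps to the next odd id; a talk id maps to itself.
    return ns_id if ns_id % 2 == 1 else ns_id + 1


def subject_namespace_for(ns_id: int) -> int:
    """Return the subject (non-talk) namespace paired with ns_id."""
    if ns_id < 0:
        return ns_id
    return ns_id - 1 if ns_id % 2 == 1 else ns_id
```

For example, `talk_namespace_for(0)` gives 1 (Talk:) and `subject_namespace_for(3)` gives 2 (User:).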
On Sat, Mar 14, 2009 at 8:46 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Andrew Garrett schrieb:
On Sat, Mar 14, 2009 at 9:34 AM, O. O. olson_ot@yahoo.com wrote:
Andrew Garrett wrote:
On Sat, Mar 14, 2009 at 9:26 AM, O. O. olson_ot@yahoo.com wrote:
The above link says that “only articles” and no redirects are in the namespace NS0. Also Talk: pages are not included in the NS0. Then, when the current English Wikipedia advertises 2,791,033 Articles, I cannot understand why the list of Titles contains 5716820 Titles? This is a little more than double?
The larger number includes redirects, the smaller number doesn't.
Then why does this http://en.wikipedia.org/wiki/Wikipedia:NS0 say that “Redirects” are not considered as Articles and hence are not in NS0?
It doesn't say that, it says "Not all pages in the article namespace are considered to be articles", listing redirects as an example.
The terminology is indeed confusing. ns0 is the "main" namespace, which is used for "articles". But it also contains redirects. For the statistics, the software tries to count "real" or "good" articles, which is defined to be in ns0, not a redirect, and containing at least one link. It may in the future even be redefined not to include disambiguation pages. The title list however contains all pages in ns0.
Talk pages are in their own namespace, or rather, namespaces. Namespaces come in pairs: the namespace itself (even id), and the corresponding talk namespace (odd id).
Plotting the number of articles could help an observer "see" the growth of a wiki, but it is a poor number for seeing the "death" of a wiki.
But hey, maybe all wikis on the MediaWiki project are just growing, so we don't have this phenomenon yet; maybe in a few years we will see some "wasteland wikis": immense amounts of text that no one can maintain (or is interested in maintaining), left on their own to suffer continuous degradation. Anyway, all our wikis are in their infancy, and I am thinking 5+ years ahead, while there are lots and lots of urgent problems right now.
please ignore this email
On Fri, Mar 13, 2009 at 6:26 PM, O. O. olson_ot@yahoo.com wrote:
Thanks Daniel. I had not understood the meaning of NS0. Anyway I found the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0 However this confuses me even more.
Pages on the English Wikipedia that start with any of the following prefixes are *not* in the main namespace (ns0):
Talk: User: User talk: Wikipedia: Wikipedia talk: File: File talk: MediaWiki: MediaWiki talk: Template: Template talk: Help: Help talk: Category: Category talk: Portal: Portal talk: Special:
All pages that do not start with one of these special prefixes are automatically in namespace 0. To check the namespace number of a page if you're uncertain, you can view the page source and check the body element's classes. namespace 0 pages will have the class "ns-0". Other pages will have some other number; for instance, "Talk:" pages will have "ns-1", because "Talk:" is namespace 1. "User:" is 2, "User talk:" is 3, etc.
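Both checks described above can be sketched in Python. The ids for Talk:, User: and User talk: come from the message itself; the remaining ids are the standard MediaWiki defaults for the English Wikipedia. The function names are illustrative, the prefix match is assumed to be case-insensitive, and aliases such as WP: or Image: are deliberately not handled.

```python
import re

# Namespace prefixes listed above, mapped to their ids.
NAMESPACE_IDS = {
    "talk": 1, "user": 2, "user talk": 3,
    "wikipedia": 4, "wikipedia talk": 5,
    "file": 6, "file talk": 7,
    "mediawiki": 8, "mediawiki talk": 9,
    "template": 10, "template talk": 11,
    "help": 12, "help talk": 13,
    "category": 14, "category talk": 15,
    "portal": 100, "portal talk": 101,
    "special": -1,
}


def namespace_of_title(title: str) -> int:
    """Guess the namespace id from the title's prefix; default is 0."""
    prefix, colon, _rest = title.partition(":")
    if colon:
        key = prefix.strip().replace("_", " ").lower()
        return NAMESPACE_IDS.get(key, 0)
    return 0


def namespace_from_body_class(html: str):
    """Read the ns-N class from the body element, or return None."""
    body = re.search(r'<body[^>]*\bclass="([^"]*)"', html)
    if body is None:
        return None
    for css_class in body.group(1).split():
        match = re.fullmatch(r"ns-(-?\d+)", css_class)
        if match:
            return int(match.group(1))
    return None
```

A title such as "Star Trek: The Next Generation" contains a colon but no known prefix, so `namespace_of_title` places it in namespace 0.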
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 6:26 PM, O. O. olson_ot@yahoo.com wrote:
Thanks Daniel. I had not understood the meaning of NS0. Anyway I found the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0 However this confuses me even more.
Pages on the English Wikipedia that start with any of the following prefixes are *not* in the main namespace (ns0):
Talk: User: User talk: Wikipedia: Wikipedia talk: File: File talk: MediaWiki: MediaWiki talk: Template: Template talk: Help: Help talk: Category: Category talk: Portal: Portal talk: Special:
All pages that do not start with one of these special prefixes are automatically in namespace 0. To check the namespace number of a page if you're uncertain, you can view the page source and check the body element's classes. namespace 0 pages will have the class "ns-0". Other pages will have some other number; for instance, "Talk:" pages will have "ns-1", because "Talk:" is namespace 1. "User:" is 2, "User talk:" is 3, etc.
Note that some prefixes not explicitly on the list above, such as WP: or Image:, are aliases for namespaces that are on it.
On Sun, Mar 15, 2009 at 11:22 AM, Platonides Platonides@gmail.com wrote:
Aryeh Gregor wrote:
On Fri, Mar 13, 2009 at 6:26 PM, O. O. olson_ot@yahoo.com wrote:
Thanks Daniel. I had not understood the meaning of NS0. Anyway I found the details of NS0 from http://en.wikipedia.org/wiki/Wikipedia:NS0 However this confuses me even more.
Pages on the English Wikipedia that start with any of the following prefixes are *not* in the main namespace (ns0):
Talk: User: User talk: Wikipedia: Wikipedia talk: File: File talk: MediaWiki: MediaWiki talk: Template: Template talk: Help: Help talk: Category: Category talk: Portal: Portal talk: Special:
All pages that do not start with one of these special prefixes are automatically in namespace 0. To check the namespace number of a page if you're uncertain, you can view the page source and check the body element's classes. namespace 0 pages will have the class "ns-0". Other pages will have some other number; for instance, "Talk:" pages will have "ns-1", because "Talk:" is namespace 1. "User:" is 2, "User talk:" is 3, etc.
Note that some prefixes not explicitly on the list above, such as WP: or Image:, are aliases for namespaces that are on it.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Media: also comes to mind.
-Chad
On Sun, Mar 15, 2009 at 11:22 AM, Platonides Platonides@gmail.com wrote:
Note that some prefixes not explicitly on the list above, such as WP: or Image:, are aliases for namespaces that are on it.
On Sun, Mar 15, 2009 at 12:00 PM, Chad innocentkiller@gmail.com wrote:
Media: also comes to mind.
Page titles cannot begin with these prefixes, so I deliberately omitted them. What I said is correct.