The namespace prefix appears at the beginning of the page title, which appears as the text contents of the /mediawiki/page/title element, separated by a colon from the remaining title part.
That's what I gathered from reading the Wiki docs before I started trying to parse the XML. But look at this snippet of XML from enwiki-latest-pages-articles.xml.bz2, taken from late August. I don't see any namespace in the <title> elements. Any ideas why, or am I looking in the wrong spot?
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.8alpha</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2">Media</namespace>
      <namespace key="-1">Special</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="2">User</namespace>
      <namespace key="3">User talk</namespace>
      <namespace key="4">Wikipedia</namespace>
      <namespace key="5">Wikipedia talk</namespace>
      <namespace key="6">Image</namespace>
      <namespace key="7">Image talk</namespace>
      <namespace key="8">MediaWiki</namespace>
      <namespace key="9">MediaWiki talk</namespace>
      <namespace key="10">Template</namespace>
      <namespace key="11">Template talk</namespace>
      <namespace key="12">Help</namespace>
      <namespace key="13">Help talk</namespace>
      <namespace key="14">Category</namespace>
      <namespace key="15">Category talk</namespace>
      <namespace key="100">Portal</namespace>
      <namespace key="101">Portal talk</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>AaA</title>
    <id>1</id>
    <revision>
      <id>46448774</id>
      <timestamp>2006-04-01T12:07:25Z</timestamp>
      <contributor>
        <username>Gurch</username>
        <id>241822</id>
      </contributor>
      <minor />
      <comment>{{R from CamelCase}}</comment>
      <text xml:space="preserve">#REDIRECT [[AAA]] {{R from CamelCase}} {{R from other capitalisation}}</text>
    </revision>
  </page>
  <page>
    <title>AlgeriA</title>
    <id>5</id>
    <revision>
      <id>18063769</id>
      <timestamp>2005-07-03T11:13:13Z</timestamp>
      <contributor>
        <username>Docu</username>
        <id>8029</id>
      </contributor>
      <minor />
      <comment>adding cur_id=5: {{R from CamelCase}}</comment>
      <text xml:space="preserve">#REDIRECT [[Algeria]]{{R from CamelCase}}</text>
    </revision>
  </page>
[...]
Mike O
Mike O wrote:
The namespace prefix appears at the beginning of the page title, which appears as the text contents of the /mediawiki/page/title element, separated by a colon from the remaining title part.
That's what I gathered from reading the Wiki docs before I started trying to parse the XML. But look at this snippet of XML from enwiki-latest-pages-articles.xml.bz2, taken from late August. I don't see any namespace in the <title> elements. Any ideas why, or am I looking in the wrong spot?
Those particular page titles are in the main (article) namespace, which has no prefix:
<namespace key="0" />
[snip]
<title>AaA</title>
[snip]
<title>AlgeriA</title>
-- brion vibber (brion @ pobox.com)
Mike wrote:
...look at this snippet of XML from enwiki-latest-pages-articles.xml.bz2 taken from late August. I don't see any namespace in the <title> elements, ... <title>AaA</title> ... <title>AlgeriA</title>
Those are main-namespace (namespace 0) articles, so they don't have a prefix. But all the non-main-namespace pages do. The first such page is <id>724</id><title>Wikipedia:Adding Wikipedia articles to Nupedia</title>.
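To make the rule above concrete, here is a rough Python sketch of how a parser might split a <title> into a namespace key and the remaining title, using the <namespaces> list from <siteinfo>. The helper names (load_namespaces, split_title) and the trimmed-down SAMPLE document are made up for illustration; they are not part of MediaWiki or of any dump tool. It also assumes the export-0.3 XML namespace shown in the snippet, which may differ in other dump versions.

```python
import xml.etree.ElementTree as ET

# XML namespace of the export format, as it appears in the snippet (assumed
# to be export-0.3 here; other dumps may declare a different version).
NS = "{http://www.mediawiki.org/xml/export-0.3/}"

# Cut-down sample built from the elements quoted in this thread.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" version="0.3" xml:lang="en">
  <siteinfo>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="4">Wikipedia</namespace>
    </namespaces>
  </siteinfo>
  <page><title>AaA</title><id>1</id></page>
  <page><title>Wikipedia:Adding Wikipedia articles to Nupedia</title><id>724</id></page>
</mediawiki>"""


def load_namespaces(root):
    """Return {namespace name: integer key}; the main namespace maps "" to 0."""
    return {(ns.text or ""): int(ns.get("key")) for ns in root.iter(NS + "namespace")}


def split_title(title, namespaces):
    """Split a page title into (namespace key, title without prefix)."""
    prefix, sep, rest = title.partition(":")
    # Only treat the text before the first colon as a prefix if it is a
    # namespace name listed in <siteinfo>; otherwise the colon is part of
    # an ordinary main-namespace title.
    if sep and prefix and prefix in namespaces:
        return namespaces[prefix], rest
    return 0, title


root = ET.fromstring(SAMPLE)
namespaces = load_namespaces(root)
for page in root.iter(NS + "page"):
    print(split_title(page.findtext(NS + "title"), namespaces))
# (0, 'AaA')
# (4, 'Adding Wikipedia articles to Nupedia')
```

For a full dump, ET.fromstring would not be practical because it loads everything into memory; a streaming parser such as ET.iterparse is probably the better fit, with the same splitting logic applied to each <title>.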
wikitech-l@lists.wikimedia.org