That's a pretty good solution, although one issue is that the title includes the namespace prefix, which has to be removed to get the actual page title. I feel that the <page> section should be complete in and of itself, without relying on the header section that maps namespace names to ids. Without the ns-to-name mappings present in the header, you cannot interpret the title unambiguously. For example, <title ns="0">Star Trek: The Next Generation</title> can only be interpreted correctly if the parser knows that namespace 0 is not called 'Star Trek'.
How about <title ns="12" ns-title="Help">Contents</title>?
- Mark Clements (HappyDog)
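As a rough illustration of why the self-contained form helps, here is a minimal Python sketch that reads the proposed element with no header lookup at all. The `ns` and `ns-title` attributes are the ones proposed above, not an existing dump schema, and `parse_title` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def parse_title(xml_fragment: str) -> tuple[int, str, str]:
    """Return (namespace id, bare page title, full title) from a proposed <title> element.

    Assumes the hypothetical format <title ns="..." ns-title="...">Page</title>,
    where ns-title is omitted for the main namespace.
    """
    elem = ET.fromstring(xml_fragment)
    ns_id = int(elem.get("ns", "0"))
    ns_title = elem.get("ns-title", "")
    page_title = elem.text or ""
    # The main namespace has no prefix, so the full title is just the page title.
    full_title = f"{ns_title}:{page_title}" if ns_title else page_title
    return ns_id, page_title, full_title

print(parse_title('<title ns="12" ns-title="Help">Contents</title>'))
# (12, 'Contents', 'Help:Contents')
print(parse_title('<title ns="0">Star Trek: The Next Generation</title>'))
# (0, 'Star Trek: The Next Generation', 'Star Trek: The Next Generation')
```

Because the bare title is stored as element text, the colon in "Star Trek: The Next Generation" never has to be disambiguated.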
I think you could assume that any non-zero namespace has a prefix, so you'd only need to split on the first ':' when the namespace number is != 0 (this assumes we will never set up a namespace with ':' in its name).
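A minimal sketch of that heuristic in Python, assuming the existing format where the title carries the prefix and only the namespace number is available. `strip_namespace` is a hypothetical helper, and it relies on the same assumption that no namespace name contains ':'.

```python
def strip_namespace(full_title: str, ns_id: int) -> str:
    """Return the page title with any namespace prefix removed.

    Assumes every non-zero namespace contributes exactly one prefix and that
    no namespace name itself contains ':'.
    """
    if ns_id == 0:
        # Main namespace: any ':' belongs to the page title itself.
        return full_title
    # Split only on the first ':' so colons inside the page title survive.
    _prefix, sep, rest = full_title.partition(":")
    # Defensive fallback: if no ':' is present, return the title unchanged.
    return rest if sep else full_title

print(strip_namespace("Help:Contents", 12))
print(strip_namespace("Talk:Star Trek: The Next Generation", 1))
print(strip_namespace("Star Trek: The Next Generation", 0))
```

Splitting on the first colon only is what keeps "Talk:Star Trek: The Next Generation" from losing the second half of the page title.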
BTW: why are you having so much trouble with this?