OK, to summarize what this thread seems to be saying,
the procedure is:
1. Split the title from the rest of the URL.
2. Percent-decode the title, yielding a UTF-8 byte string.
3. Convert the byte string into a Unicode string.
4. Replace underscores with spaces.
Step 4 yields the article title, which is what appears in the XML dumps.
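A minimal Python sketch of the four steps above. It assumes a Wikipedia-style URL whose path is "/wiki/<title>". That prefix, the function name title_from_url, and the example URL are illustrative assumptions and do not come from the thread.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def title_from_url(url: str) -> str:
    # Step 1: split the title from the rest of the URL.
    # ASSUMPTION: the path has the form "/wiki/<title>".
    path = urlsplit(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"unexpected path layout: {path!r}")
    encoded_title = path[len(prefix):]

    # Step 2: percent-decode, yielding a UTF-8 byte string.
    raw = unquote_to_bytes(encoded_title)

    # Step 3: convert the byte string into a Unicode string.
    title = raw.decode("utf-8")

    # Step 4: replace underscores with spaces.
    return title.replace("_", " ")


print(title_from_url("https://en.wikipedia.org/wiki/Caf%C3%A9_au_lait"))
# prints: Café au lait
```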
Wrong. AFAIK the XML dumps are already encoded in UTF-8, so you don't
need step 3. You'd only need it if you want to present the title to
the user somehow.
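Here is a sketch of what skipping step 3 could look like. The URL title and the dump title are both compared as raw UTF-8 bytes. Replacing underscores at the byte level is safe because in UTF-8 the byte 0x5F ('_') never occurs inside a multi-byte sequence, whose bytes are all >= 0x80. The function names are hypothetical. The code also assumes the dump title is available as UTF-8 bytes that have already been XML-unescaped (e.g. "&amp;" turned back into "&") but not decoded to a Unicode string.

```python
from urllib.parse import unquote_to_bytes


def title_bytes_from_encoded(encoded_title: str) -> bytes:
    # Step 2: percent-decode to the raw UTF-8 bytes.
    raw = unquote_to_bytes(encoded_title)
    # Step 4, applied to bytes instead of a Unicode string (step 3 skipped).
    return raw.replace(b"_", b" ")


def matches_dump_title(encoded_title: str, dump_title_bytes: bytes) -> bool:
    # ASSUMPTION: dump_title_bytes is the dump's title as XML-unescaped UTF-8 bytes.
    return title_bytes_from_encoded(encoded_title) == dump_title_bytes


print(matches_dump_title("Caf%C3%A9_au_lait", "Café au lait".encode("utf-8")))
# prints: True
```

If the dump is instead read with an XML parser that returns Unicode strings (Python's xml.etree.ElementTree does this), then the comparison is between str values, and the URL side still needs step 3.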