Clutch wrote:
For instance, let us take a link: Talk:foo. I can explode this into Talk and foo. Each namespace has an "urlprefix". For Talk, the urlprefix would be http://www.wikipedia.org/Talk.
Now, let's add a (separate) language into the mix: en:Talk:foo. I can explode it into en, Talk, and foo. Now, the urlprefix for Talk is the same. How do I say that it is an English-language page? The normal, standard way to do this would be like so: http://www.wikipedia.org/en/Talk/foo. That feels "right" to me. But doing that would require parsing the urlprefix for the namespace to figure out where to put in the language. I don't want to do that, and don't feel I should have to.
You have to parse the namespace anyway, to see if it's really a namespace. Remember, colons are perfectly acceptable in article titles, and they *don't* indicate namespaces *or* languages -- except in a few special cases. (Consider [[en:E. coli O157:H7]].)
So in order to parse a link correctly, we need to do these steps:
* Decide which language it is:
  * Is there a colon?
    Y * Take the string up to the first colon;
      * Is this string a language code?
        Y * Drop this bit from the text of the link;
          * That string indicates the language.
        N * It's the current language.
    N * It's the current language.
* Decide which namespace it is:
  * Is there a colon now?
    Y * Take the string up to the first colon;
      * Is this string a namespace *in the relevant language*?
        Y * Drop this bit from the text of the link;
          * That string indicates the namespace.
        N * It's the main namespace.
    N * It's the main namespace.
* Decide what the page title is:
  * Take the rest of the string;
  * That's the title.
The algorithm is a bit more complicated than this, because of:
* Special meanings when a colon begins the link.
* The pipe trick (and its use of both namespaces and parentheses).
* Error handling (when the link has forbidden characters).
But the above three steps must all be done, and in that order, or we'll break the functionality of certain links like the E. coli one above.
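To make the ordering concrete, here is a minimal Python sketch of just the three steps (language, then namespace, then title). It is an illustration only, not the actual wiki code: the function name parse_link and the LANGUAGES and NAMESPACES tables are made up for the example, and it ignores leading colons, the pipe trick, case differences, and error handling.

```python
# Hypothetical lookup tables, invented for this sketch; a real wiki
# would load its language codes and namespace names from configuration.
LANGUAGES = {"en", "de", "fr"}
NAMESPACES = {
    "en": {"Talk", "User"},
    "de": {"Diskussion", "Benutzer"},
    "fr": {"Discuter", "Utilisateur"},
}


def parse_link(link, current_language="en"):
    """Split a link into (language, namespace, title).

    An empty namespace string stands for the main namespace.
    """
    # Step 1: decide the language. Only the text before the FIRST colon
    # is considered, and only if it is a known language code.
    language = current_language
    prefix, colon, rest = link.partition(":")
    if colon and prefix in LANGUAGES:
        language, link = prefix, rest

    # Step 2: decide the namespace, checked against the namespaces of
    # the language chosen in step 1.
    namespace = ""
    prefix, colon, rest = link.partition(":")
    if colon and prefix in NAMESPACES.get(language, set()):
        namespace, link = prefix, rest

    # Step 3: whatever is left is the title, colons and all.
    return language, namespace, link


print(parse_link("en:Talk:foo"))
print(parse_link("en:E. coli O157:H7"))
print(parse_link("de:Talk:foo"))
```

With these made-up tables, en:Talk:foo splits into the language en, the namespace Talk, and the title foo, while en:E. coli O157:H7 keeps its second colon inside the title because "E. coli O157" is not a namespace. de:Talk:foo ends up in the main namespace with the title Talk:foo, since Talk is not a namespace in the (hypothetical) German table.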
-- Toby