I noticed that the other-language links (links in the form [[fr:Japon]] [[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but listed by language name in the header bar, pointing to the article on the current subject in the other-language wikis) are vanishing on cached pages, because they're scanned and listed during the wiki->html link parsing which of course doesn't occur when loading a cached page.
I've a hackish fix for that which explicitly seeks out the other-language links for cached pages, but I don't like it very much. It's inelegant, and two sets of code have to be maintained to do the same thing in different contexts.
What I'd like to do is add a column to the cur table, something like cur_links_languages, which would be analogous to cur_links_linked and cur_links_unlinked. The list of inter-language links for a page would be stored when the page is saved, then easily loaded up again along with the cache. This would also make it easy to provide statistics on the degree of linkage between language wikis. (No change in current user-visible behavior except in fixing the obvious bug of vanishing links, and potentially providing more information in special:Statistics etc.)
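To make the save-time idea concrete, here is a rough sketch in Python rather than the wiki's own code. Everything in it is an assumption for illustration: the prefix set, the regular expression, the function names, and the newline-separated storage format for the proposed cur_links_languages column are all hypothetical.

```python
import re

# Hypothetical set of language prefixes; the real list would come from
# the wiki's configuration.
LANGUAGE_PREFIXES = {"en", "fr", "eo", "es", "de"}

# Matches [[xx:Title]] where xx is a lowercase two- or three-letter prefix.
INTERLANG_RE = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")


def extract_language_links(wikitext):
    """Return the (lang, title) pairs found in wikitext, in order, without duplicates."""
    links = []
    for match in INTERLANG_RE.finditer(wikitext):
        lang, title = match.group(1), match.group(2).strip()
        if lang in LANGUAGE_PREFIXES and (lang, title) not in links:
            links.append((lang, title))
    return links


def serialize_language_links(links):
    """Pack the links into one string, as might be stored in the new column on save."""
    return "\n".join(f"{lang}:{title}" for lang, title in links)


def deserialize_language_links(stored):
    """Unpack the stored string when the page is served from the cache."""
    return [tuple(line.split(":", 1)) for line in stored.split("\n") if line]
```

The point of the sketch is that the scan happens once, at save time, and the cached-page path only has to unpack a stored string instead of re-parsing the article.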
Alternatively, we might have a separate database which contains nothing but lists of connected articles. This could facilitate keeping the other-language links consistent; if somebody adds an article "Japón" to the Spanish wikipedia, it shouldn't be necessary to separately add [[es:Jap%f3n]] to the English, French, Esperanto, etc. articles. Keeping a central repository would mean that it only needs to be linked in with the others once, and all linked articles will immediately benefit by being able to list it without manual editing. Upside: added simplicity for article writers, who don't have to maintain as many links. Downside: added complexity for site maintainers, who have to run a second database or not get all the other-language links. Also might be more difficult to remove incorrectly linked articles.
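As a rough illustration of what such a central repository might hold, here is an in-memory Python sketch. The class name, its methods, and the grouping scheme are all hypothetical; a real version would sit in a shared database rather than in dictionaries.

```python
class LinkRepository:
    """Groups articles on the same subject across language wikis (illustrative only)."""

    def __init__(self):
        self._group_of = {}   # (lang, title) -> group id
        self._members = {}    # group id -> set of (lang, title)
        self._next_id = 0

    def connect(self, a, b):
        """Record that articles a and b cover the same subject."""
        ga, gb = self._group_of.get(a), self._group_of.get(b)
        if ga is None and gb is None:
            gid = self._next_id
            self._next_id += 1
            self._members[gid] = {a, b}
            self._group_of[a] = self._group_of[b] = gid
        elif ga is None:
            self._members[gb].add(a)
            self._group_of[a] = gb
        elif gb is None:
            self._members[ga].add(b)
            self._group_of[b] = ga
        elif ga != gb:
            # Merge the two groups into one.
            for article in self._members.pop(gb):
                self._members[ga].add(article)
                self._group_of[article] = ga

    def links_for(self, article):
        """List every other article in the same group, sorted by language code."""
        gid = self._group_of.get(article)
        if gid is None:
            return []
        return sorted(self._members[gid] - {article})

    def remove(self, article):
        """Take an incorrectly linked article out of its group."""
        gid = self._group_of.pop(article, None)
        if gid is not None:
            self._members[gid].discard(article)
```

Linking ("es", "Japón") to any one member of the group once is enough for it to show up in links_for() of every other member, which is the upside described above. The remove() method covers only the simplest case of an incorrect link; undoing a bad merge of two whole groups would need more bookkeeping.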
An alternative to the separate link database might be a robot/automatic process that occasionally looks through all the wikipedias checking for consistency in the other-language links and automatically adding (or alerting a human that one ought to add) new other-language links where needed.
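The core check such a robot would run might look something like the following Python sketch. It assumes the robot has already gathered, for each article, the set of other-language links on that page; the function name and the data layout are hypothetical, and only direct back-links are checked.

```python
def find_missing_backlinks(links):
    """Report (page, link_to_add) pairs where a link is not reciprocated.

    `links` maps each (lang, title) article to the set of (lang, title)
    articles it links to. A target absent from `links` is treated as
    having no links at all, so it is reported as well.
    """
    missing = []
    for source, targets in links.items():
        for target in targets:
            if source not in links.get(target, set()):
                missing.append((target, source))
    return sorted(missing)
```

The result could either be applied automatically or handed to a human as a list of suggested additions.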
So what do people think? Should we try one of these, or should I just check in my hackish fix for the meantime?
-- brion vibber (brion @ pobox.com)
From: "Brion Vibber" brion@pobox.com
I noticed that the other-language links (links in the form [[fr:Japon]] [[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but listed by language name in the header bar, pointing to the article on the current subject in the other-language wikis) are vanishing on cached pages, because they're scanned and listed during the wiki->html link parsing which of course doesn't occur when loading a cached page.
Can I suggest we simply stop with the whole caching thing? It complicates things unnecessarily. Keeping the code simple should be one of our top priorities. Jimbo doesn't have it turned on at the moment anyway, and Wikipedia seems to be fine on non-generated pages. I also expect that we can do quite a lot of optimization on the generated pages, comparable with Recent Changes (which is also not cached at the moment), since there is a whole bunch of very inefficient (esp. in terms of memory use) programming going on in the current parser.
Alternatively, we might have a separate database which contains nothing but lists of connected articles. This could facilitate keeping the other-language links consistent; [...]
*sigh* It's a very nice idea, but currently I don't believe that phpwiki is really out of the woods yet. First the current functionality has to be correct and efficient, and the code has to be well-organized and documented. Only then can we start thinking about such fancy extensions of functionality.
-- Jan Hidders
wikitech-l@lists.wikimedia.org