I think the current handling of interlanguage links is problematic and does not scale. If we have n copies of an article, we need n*(n-1) interlanguage links. For 10 languages, that would be 90 links! All of these links have to be added to separate pages, by people speaking different languages, who often don't even have an account on the Wikipedia in question.
As should be obvious, we are already missing interlanguage links for many, if not most, of the translations we have.
The scalable solution requires us to have a meta-table for interlanguage links that can be accessed by all Wikipedias. This table could look like this:
language1  article1   language2  article2
------------------------------------------------------------
en         Main Page  de         Hauptseite
fr         Accueil    en         Main Page
fr         Accueil    es         Portada
...
Let's call it shared.ilinks for the moment.
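To make the table layout concrete, here is a minimal sketch using Python's sqlite3 module. The choice of SQLite, the plain table name "ilinks" (standing in for shared.ilinks), the TEXT column types, and the uniqueness constraint are all assumptions made for illustration; only the four column names come from the table above.

```python
import sqlite3

def create_ilinks(conn):
    """Create a hypothetical stand-in for the shared.ilinks table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ilinks (
            language1 TEXT NOT NULL,
            article1  TEXT NOT NULL,
            language2 TEXT NOT NULL,
            article2  TEXT NOT NULL,
            -- assumption: the same pair should not be stored twice
            UNIQUE (language1, article1, language2, article2)
        )
        """
    )

conn = sqlite3.connect(":memory:")
create_ilinks(conn)

# The three example rows from the table above.
rows = [
    ("en", "Main Page", "de", "Hauptseite"),
    ("fr", "Accueil", "en", "Main Page"),
    ("fr", "Accueil", "es", "Portada"),
]
conn.executemany("INSERT INTO ilinks VALUES (?, ?, ?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM ilinks").fetchone()[0]
```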
Instead of adding interlanguage links on top of articles, we would have a separate text line below article bodies:
Interlanguage links (syntax: [[<code>:<article name>]])
The syntax would remain the same so that the link line can be cut and pasted from the body. But this line would not be stored in that form in the database.
Display of interlanguage links
------------------------------

Say I visit [[Main Page]] on en.wikipedia.org. Now, in order to show the list of links, the shared.ilinks table is queried:
SELECT * FROM shared.ilinks WHERE (language1="en" AND article1="Main Page") OR (language2="en" AND article2="Main Page")
That is, a single SELECT finds all translations of the page "Main Page". But do we really save much time, given that we still have to tell *every* Wikipedia which of its pages corresponds to "Main Page" in English? Yes, because we can now leave this to the code.
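As an illustration of the display lookup, the sketch below runs the single SELECT and returns whichever side of each row is not the page being viewed. The function name translations(), the table name "ilinks" (standing in for shared.ilinks), and the use of SQLite are assumptions made for the example.

```python
import sqlite3

def translations(conn, lang, title):
    """Return sorted (language, article) pairs linked to the given page."""
    rows = conn.execute(
        "SELECT language1, article1, language2, article2 FROM ilinks "
        "WHERE (language1 = ? AND article1 = ?) "
        "   OR (language2 = ? AND article2 = ?)",
        (lang, title, lang, title),
    )
    found = set()
    for l1, a1, l2, a2 in rows:
        # The page may sit in either column; keep the opposite side.
        if (l1, a1) == (lang, title):
            found.add((l2, a2))
        else:
            found.add((l1, a1))
    return sorted(found)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)

print(translations(conn, "en", "Main Page"))
# [('de', 'Hauptseite'), ('fr', 'Accueil')]
```

Note that the Spanish page is not returned here: no row links it directly to the English page yet, which is exactly the gap the save-time discovery is meant to close.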
When a user edits a page, the same list of links is generated, but this time in the wiki syntax ([[fr:Accueil]] [[de:Hauptseite]] and so on). This can be edited by anyone. When the list has been edited, and the page is saved, the following is done:
1) The same SELECT as above is run: SELECT * FROM shared.ilinks WHERE (language1="en" AND article1="Main Page") OR (language2="en" AND article2="Main Page")
2) Now, for each translation we get, another similar SELECT is run, so that we find further translations into other languages.
3) Every new translation we discover is stored in a new English (in our example)/<new translation> table row, so that we can do the quick, one-time SELECT to display the interlanguage links.
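The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table is modelled as an in-memory set of page pairs rather than a database, the names neighbours() and discover_on_save() are made up, and the search is repeated until no new pages turn up (the steps above could also be read as a single extra round of SELECTs).

```python
from collections import deque

def neighbours(links, page):
    """Pages directly linked to `page` in either column of the table."""
    out = set()
    for a, b in links:
        if a == page:
            out.add(b)
        elif b == page:
            out.add(a)
    return out

def discover_on_save(links, page):
    """Follow links outward from `page` and store a direct row for
    every translation that was not directly linked before."""
    seen = {page}
    queue = deque([page])
    while queue:  # steps 1 and 2: repeated lookups
        current = queue.popleft()
        for other in neighbours(links, current):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    direct = neighbours(links, page)
    added = set()
    for other in seen - {page} - direct:
        if other[0] != page[0]:  # only translations into other languages
            links.add((page, other))  # step 3: store the new row
            added.add(other)
    return added

# Each page is a (language, article) tuple; each link is a pair of pages.
links = {
    (("fr", "Phil Collins"), ("en", "Phil Collins")),
    (("de", "Phil Collins"), ("en", "Phil Collins")),
}
added = discover_on_save(links, ("de", "Phil Collins"))
```

After the call, `added` holds the French page, and a direct German/French row is in `links`, so the display lookup for the German page needs only one query.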
The result: If a page exists in 10 translations, the minimum effort is one link per additional language, just enough to connect all the pages. That is, a minimum of 9 as opposed to 90 links! The other translations are automatically discovered.
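The arithmetic behind those numbers, as a tiny sketch (the function names are made up for the example): with n language versions, linking every page to every other page by hand takes n*(n-1) links, while a shared table needs only n-1 links to connect all pages.

```python
def links_needed_manual(n):
    """Each of the n pages links to the other n - 1 pages."""
    return n * (n - 1)

def links_needed_shared(n):
    """Minimum number of links that connects n pages."""
    return n - 1

print(links_needed_manual(10), links_needed_shared(10))
# 90 9
```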
Example: Someone creates a new page about Phil Collins on fr.wikipedia.org. This person knows that there's already an English page about him on en.wikipedia.org, so they type [[en:]] (suggested short syntax for "same name as here"). "fr:Phil Collins->en:Phil Collins" is inserted into the shared.ilinks table. This already means that the link is also shown on en.wikipedia.org.

But it gets better: now someone on de.wikipedia.org creates a Phil Collins page as well and links to the English page with [[en:]]. Zap! After the entry is saved, the French translation is automatically discovered. Now the French page has a link to the German page and vice versa.
Editing links
-------------

What happens if the folks on fr.wikipedia.org move one of their pages? The "Move this page" command now needs to automatically change every instance of the page to something else (e.g. Accueil->Homepage) in the shared.ilinks table.
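A minimal sketch of what the move would do to the table, assuming SQLite, a table named "ilinks" standing in for shared.ilinks, and a made-up helper move_page(). Because a page can appear in either column, both columns are updated.

```python
import sqlite3

def move_page(conn, lang, old_title, new_title):
    """Rename a page wherever it appears in the link table."""
    with conn:  # commit both updates together, or roll back on error
        conn.execute(
            "UPDATE ilinks SET article1 = ? WHERE language1 = ? AND article1 = ?",
            (new_title, lang, old_title),
        )
        conn.execute(
            "UPDATE ilinks SET article2 = ? WHERE language2 = ? AND article2 = ?",
            (new_title, lang, old_title),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)

move_page(conn, "fr", "Accueil", "Homepage")
old_rows = conn.execute(
    "SELECT COUNT(*) FROM ilinks WHERE article1 = 'Accueil' OR article2 = 'Accueil'"
).fetchone()[0]
new_rows = conn.execute(
    "SELECT COUNT(*) FROM ilinks WHERE language1 = 'fr' AND article1 = 'Homepage'"
).fetchone()[0]
```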
What happens if someone on en.wikipedia.org decides that they do not want to link to a page on nl.wikipedia.org because it contains obsolete information, or because of "link-vandalism"? To unilaterally remove a link to one translation, there would have to be a special interlanguage link, like [[nl::]]. When saved, the link would be cleared and not "rediscovered" until someone removed the [[nl::]] link. Such empty links would not be copied.
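To show how the three link forms could be told apart when the link line is saved, here is a rough parsing sketch. The regular expression, the function name parse_link_line(), and the return shape are assumptions; the forms themselves ([[fr:Accueil]], the short [[en:]], and the suppressing [[nl::]]) are the ones proposed in this mail.

```python
import re

# Language code, a colon, then everything up to the closing brackets.
LINK_RE = re.compile(r"\[\[([a-z-]+):(.*?)\]\]")

def parse_link_line(line, current_title):
    """Split a link line into explicit links and suppressed languages."""
    links = {}          # language code -> list of article titles
    suppressed = set()  # languages blocked with the [[xx::]] form
    for code, target in LINK_RE.findall(line):
        if target == ":":
            suppressed.add(code)
        elif target == "":
            # [[xx:]] means "same name as here"
            links.setdefault(code, []).append(current_title)
        else:
            links.setdefault(code, []).append(target)
    return links, suppressed

links, suppressed = parse_link_line(
    "[[fr:Accueil]] [[en:]] [[nl::]]", "Phil Collins"
)
```

A save routine could then skip any discovered translation whose language is in `suppressed`, so the link is not "rediscovered" until the [[nl::]] entry is removed.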
If [[nl:Hoofdpagina]] is deleted, all instances of it in the shared.ilinks table are removed as well.
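Deletion could be handled with a single statement, sketched here under the same assumptions as before (SQLite, a table named "ilinks" standing in for shared.ilinks, and a made-up helper delete_page()). The sample rows are illustrative only.

```python
import sqlite3

def delete_page(conn, lang, title):
    """Remove every row that mentions the deleted page in either column."""
    with conn:
        conn.execute(
            "DELETE FROM ilinks "
            "WHERE (language1 = ? AND article1 = ?) "
            "   OR (language2 = ? AND article2 = ?)",
            (lang, title, lang, title),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "nl", "Hoofdpagina"),
        ("nl", "Hoofdpagina", "de", "Hauptseite"),
        ("en", "Main Page", "de", "Hauptseite"),
    ],
)

delete_page(conn, "nl", "Hoofdpagina")
remaining = conn.execute("SELECT * FROM ilinks").fetchall()
```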
What about links where there is no 1:1 relationship? Say I have two pages, "Evolution" and "Theory of Evolution", on one wiki (English) and only one corresponding page on another (French). So I add the following to both pages on en.wikipedia.org:
[[fr:Théorie de l'évolution]]
In the shared.ilinks table, I therefore get entries:

Evolution            Théorie de l'évolution
Theory of Evolution  Théorie de l'évolution
When I visit the "Evolution" page, I get a clear match: Théorie de l'évolution. But when I visit the "Théorie de l'évolution" page, I get two matches. In this case, we could actually show both links on the French page:
English: [1],[2]
Or in edit mode:
[[en:Evolution]][[en:Theory of Evolution]]
It may not be desirable to autocopy these duplicate links. So, if we cannot discover an exact match, we may want to wait until someone specifies a precise translation.
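One possible way to separate exact matches from ambiguous ones, as a rough sketch: group the matches by language and only treat a language as safe to autocopy when it has exactly one match. The function names and the rule itself are assumptions made for illustration, not a settled part of the proposal.

```python
from collections import defaultdict

def group_by_language(matches):
    """Collect matched article titles per language code."""
    grouped = defaultdict(list)
    for lang, title in matches:
        grouped[lang].append(title)
    return {lang: sorted(titles) for lang, titles in grouped.items()}

def split_exact_and_ambiguous(matches):
    """Return (exact, ambiguous): one-match languages vs. the rest."""
    grouped = group_by_language(matches)
    exact = {lang: t[0] for lang, t in grouped.items() if len(t) == 1}
    ambiguous = {lang: t for lang, t in grouped.items() if len(t) > 1}
    return exact, ambiguous

# Matches seen from the French page in the example above.
from_french = [("en", "Evolution"), ("en", "Theory of Evolution")]
# Matches seen from the English "Evolution" page.
from_english = [("fr", "Théorie de l'évolution")]

french_exact, french_ambiguous = split_exact_and_ambiguous(from_french)
english_exact, english_ambiguous = split_exact_and_ambiguous(from_english)
```

Here `french_ambiguous` holds both English titles (shown as two links, but not autocopied), while `english_exact` holds the single French match.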
Discussion
----------

The process described above is complex from a technical perspective, because it has to be respected during all changes to articles (move, delete, edit, etc.). It also requires us to run a separate database server specifically for this shared information. There may be scenarios that I have not yet covered in the above proposal, although I am sure solutions can be found for every problem.
There are numerous advantages to this approach. Compared with the current handling, we should quickly get an accurate representation of interlanguage links on all wikis. We do not have to pick a single language as "key" language, which would require a key entry in that language to exist for all pages. [1]
There may be simpler solutions that I cannot see - if so, I would love to hear about them. But I really think we should consider redesigning the interlanguage links before the problem grows out of control.
Regards,
Erik
[1] Although that would expose us to charges of anglocentrism, I am open to discussing this alternative.