I think the current handling of interlanguage links is problematic and not very scalable. If we have n copies of an article, we need n*(n-1) interlanguage links. For 10 languages, that would be 90 links! All of these links have to be added to separate pages, by people speaking different languages, who often don't even have an account on the Wikipedia in question.
As should be obvious, we are already missing interlanguage links for many, if not most, of the translations we have.
The scalable solution requires us to have a meta-table for interlanguage links that can be accessed by all Wikipedias. This table could look like this:
language1  article1   language2  article2
------------------------------------------------------------
en         Main Page  de         Hauptseite
fr         Accueil    en         Main Page
fr         Accueil    es         Portada
...
Let's call it shared.ilinks for the moment.
Instead of adding interlanguage links on top of articles, we would have a separate text line below article bodies:
Interlanguage links (syntax: [[<code>:<article name>]])
The syntax would remain the same so that the link line can be cut and pasted from the body. But this line would not be stored in that form in the database.
Display of interlanguage links
------------------------------

Say I visit [[Main Page]] on en.wikipedia.org. Now, in order to show the list of links, the shared.ilinks table is queried:
SELECT * FROM shared.ilinks WHERE (language1='en' AND article1='Main Page') OR (language2='en' AND article2='Main Page')
That is, a single SELECT allows us to find all translations of the word "Main Page". But don't we only save relatively little time, as we still have to tell *every* Wikipedia that homepage means "Main Page" in English? No, because we can now leave this to the code.
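As a minimal sketch of the lookup just described, here is the query run against an in-memory SQLite table. The table name `ilinks`, the helper `translations`, and the use of SQLite are assumptions made for illustration; only the four column names come from the proposal.

```python
import sqlite3

# Hypothetical stand-in for the shared.ilinks table; the column names follow
# the language1/article1/language2/article2 layout sketched above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)


def translations(conn, lang, title):
    """Return the (language, article) pairs directly linked to a page."""
    rows = conn.execute(
        "SELECT language1, article1, language2, article2 FROM ilinks "
        "WHERE (language1 = ? AND article1 = ?) OR (language2 = ? AND article2 = ?)",
        (lang, title, lang, title),
    ).fetchall()
    found = set()
    for l1, a1, l2, a2 in rows:
        # The page may sit in either column; report the opposite side.
        found.add((l2, a2) if (l1, a1) == (lang, title) else (l1, a1))
    return sorted(found)


links = translations(conn, "en", "Main Page")
```

With the three sample rows, the lookup for en:Main Page returns only the German and French pages. The Spanish page is linked only to the French one, which is the gap the save-time discovery is meant to close.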
When a user edits a page, the same list of links is generated, but this time in the wiki syntax ([[fr:Accueil]] [[de:Hauptseite]] and so on). This can be edited by anyone. When the list has been edited, and the page is saved, the following is done:
1) The same SELECT as above is run: SELECT * FROM shared.ilinks WHERE (language1='en' AND article1='Main Page') OR (language2='en' AND article2='Main Page')
2) Now, for each translation we get, another similar SELECT is run, so that we find further translations into other languages.
3) Every new translation we discover is stored in a new English (in our example)/<new translation> table row, so that we can do the quick, one-time SELECT to display the interlanguage links.
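The three steps above can be sketched as a walk over the link rows. This is only an illustration under stated assumptions: rows are modelled as pairs of (language, article) tuples held in a Python set rather than in a database, the names `discover` and `save_links` are hypothetical, and the sketch ignores suppressed links and ambiguous matches.

```python
from collections import deque


def discover(links, start):
    """Breadth-first walk over link rows; returns every page reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for a, b in links:
            if page in (a, b):
                other = b if a == page else a
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    seen.discard(start)
    return seen


def save_links(links, page):
    """Store a direct row from page to every translation found (step 3)."""
    for other in discover(links, page):
        if (page, other) not in links and (other, page) not in links:
            links.add((page, other))
    return links


rows = {
    (("en", "Main Page"), ("de", "Hauptseite")),
    (("fr", "Accueil"), ("en", "Main Page")),
    (("fr", "Accueil"), ("es", "Portada")),
}
save_links(rows, ("en", "Main Page"))
```

After the save, the set holds a direct en:Main Page / es:Portada row, so a later display of the English page needs only the single SELECT.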
The result: If we have a page in 10 translations, the minimum effort we have to go to is to add exactly one translation on every language Wikipedia. That is, a minimum of 9 as opposed to 90 links! The other translations are automatically discovered.
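The arithmetic behind that comparison, as a small sketch (the function names are made up for illustration):

```python
def links_needed_manually(n):
    """Links required when each of n pages must list the other n - 1 itself."""
    return n * (n - 1)


def links_needed_with_discovery(n):
    """Minimum manual links when the remaining ones are discovered on save."""
    return n - 1


for n in (2, 5, 10):
    print(n, links_needed_manually(n), links_needed_with_discovery(n))
```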
Example: Someone creates a new page about Phil Collins on fr.wikipedia.org. This person knows that there's already an English page about him on en.wikipedia.org, so they type [[en:]] (suggested short syntax for "same name as here"). "fr:Phil Collins->en:Phil Collins" is inserted into the shared.ilinks table. This already means that the link is also shown on en.wikipedia.org. But it gets better: Now someone on de.wikipedia.org creates a Phil Collins page as well. He links to en.wikipedia.org's [[en:]] entry. Zap!, after saving the entry, the French translation is automatically discovered. Now the French translation has a link to the German page and vice versa as well.
Editing links
-------------

What happens if the folks on fr.wikipedia.org move one of their pages? The "Move this page" command now needs to automatically change every instance of the page to something else (e.g. Accueil->Homepage) in the shared.ilinks table.
What happens if someone on en.wikipedia.org decides that they do not want to link to a page on nl.wikipedia.org because it contains obsolete information, or because of "link-vandalism"? To unilaterally remove a link to one translation, there would have to be a special interlanguage link, like [[nl::]]. When saved, the link would be cleared and not "rediscovered" until someone removed the [[nl::]] link. Such empty links would not be copied.
If [[nl:Hoofdpagina]] is deleted, all instances of it in the shared.ilinks table are removed as well.
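A rough sketch of the three maintenance cases (move, suppress, delete), again modelling rows as pairs of (language, article) tuples in a set. The function names and the idea of keeping suppressed language codes in a per-page set are assumptions for illustration, not part of the proposal.

```python
def move_page(links, old, new):
    """Rewrite every row that mentions old so that it mentions new instead."""
    return {
        (new if a == old else a, new if b == old else b)
        for a, b in links
    }


def delete_page(links, page):
    """Drop every row that mentions the deleted page."""
    return {(a, b) for a, b in links if page not in (a, b)}


def visible_links(found, suppressed_langs):
    """Hide translations in languages the page has suppressed, e.g. via [[nl::]]."""
    return {(lang, title) for lang, title in found if lang not in suppressed_langs}


rows = {
    (("fr", "Accueil"), ("en", "Main Page")),
    (("nl", "Hoofdpagina"), ("en", "Main Page")),
}
moved = move_page(rows, ("fr", "Accueil"), ("fr", "Homepage"))
remaining = delete_page(moved, ("nl", "Hoofdpagina"))
shown = visible_links({("nl", "Hoofdpagina"), ("fr", "Homepage")}, {"nl"})
```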
What about links where there is no 1:1 relationship? Say I have a page about "evolution" and "theory of evolution" on one wiki (English) and only a page about "evolution" on another (French). So I add the following to en.wikipedia.org on both pages:
[[fr:Théorie de l'évolution]]
In the shared.ilinks table, I therefore get entries:

Evolution            Théorie de l'évolution
Theory of Evolution  Théorie de l'évolution
When I visit the "Evolution" page, I get a clear match: Théorie de l'évolution. But when I visit the "Théorie de l'évolution", I get two matches. In this case, we could actually show both links on the French page:
English: [1],[2]
Or in edit mode:
[[en:Evolution]][[en:Theory of Evolution]]
It may not be desirable to autocopy these duplicate links. So, if we cannot discover an exact match, we may want to wait until someone specifies a precise translation.
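One way to express "only autocopy exact matches" is to group the discovered candidates by language and keep a language only when it has a single candidate. A minimal sketch, with the helper name `exact_matches` and the German sample row being assumptions added for illustration:

```python
from collections import defaultdict


def exact_matches(candidates):
    """Keep one title per language, but only where that language is unambiguous."""
    by_lang = defaultdict(set)
    for lang, title in candidates:
        by_lang[lang].add(title)
    return {
        lang: next(iter(titles))
        for lang, titles in by_lang.items()
        if len(titles) == 1
    }


# Candidates as seen from fr:Théorie de l'évolution; the de row is hypothetical.
candidates = [
    ("en", "Evolution"),
    ("en", "Theory of Evolution"),
    ("de", "Evolution"),
]
safe_to_copy = exact_matches(candidates)
```

Here the two English candidates are held back until someone picks one, while the single German candidate could be copied automatically.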
Discussion
----------

The process described above is complex from a technical perspective, because it has to be respected during all changes to articles (move, delete, edit etc.) It also requires us to run a separate database server specifically for this shared information. There may be scenarios that I have not yet covered in the above proposal, although I am sure solutions can be found for every problem.
There are numerous advantages to this approach. Compared with the current handling, we should quickly get an accurate representation of interlanguage links on all wikis. We do not have to pick a single language as "key" language, which would require a key entry in that language to exist for all pages. [1]
There may be simpler solutions that I cannot see - if so, I would love to hear about them. But I really think we should consider redesigning the interlanguage links before the problem grows out of control.
Regards,
Erik
[1] Although that would expose us to charges of anglocentrism, I am open to discussing this alternative.
> I think the current handling of interlanguage links is problematic and not very scalable. If we have n copies of an article, we need n*(n-1) interlanguage links. For 10 languages, that would be 90 links! All of these links have to be added to separate pages, by people speaking different languages, who often don't even have an account on the Wikipedia in question.
It's not as bad as you make it appear here, those go to 10, not 90 different pages. Still, the person who added the tenth language would, if (s)he did it the 'proper' way, have to add a total of 19 links on a total of 10 different pages. Not a desirable situation.
> As should be obvious, we are already missing interlanguage links for many, if not most, of the translations we have.
There's certainly already a wealth of missing back- and through-links available, as well as a number of interlanguage links to non-existing pages.
> The scalable solution requires us to have a meta-table for interlanguage links that can be accessed by all Wikipedias. This table could look like this:
> language1  article1   language2  article2
> en         Main Page  de         Hauptseite
> fr         Accueil    en         Main Page
> fr         Accueil    es         Portada
> ...
My preference would be to have a different type of metatable, namely one where each subject gets an indication (an English name, another name, or just a number), and the articles are then stored by these (the 'key language approach' as you call it). Your above case would then look:
1  en  Main Page
1  de  Hauptseite
1  fr  Accueil
1  es  Portada

or even:

1  [[en:Main Page]][[de:Hauptseite]][[fr:Accueil]][[es:Portada]]
The reason I prefer this is for what you mention below:
> What about links where there is no 1:1 relationship? Say I have a page about "evolution" and "theory of evolution" on one wiki (English) and only a page about "evolution" on another (French). So I add the following to en.wikipedia.org on both pages:
> [[fr:Théorie de l'évolution]]
> In the shared.ilinks table, I therefore get entries:
> Evolution            Théorie de l'évolution
> Theory of Evolution  Théorie de l'évolution
> When I visit the "Evolution" page, I get a clear match: Théorie de l'évolution. But when I visit the "Théorie de l'évolution", I get two matches. In this case, we could actually show both links on the French page:
> English: [1],[2]
> Or in edit mode:
> [[en:Evolution]][[en:Theory of Evolution]]
> It may not be desirable to autocopy these duplicate links. So, if we cannot discover an exact match, we may want to wait until someone specifies a precise translation.
In the alternative method, it will be possible to have one page connected to multiple 'interlanguage rings'. For example, some Wikipedias have a page 'astronomy and astrophysics', while others have a page 'astronomy' and a page 'astrophysics'. In my proposed way of working, the 'astronomy and astrophysics' pages could be linked to both rings, so that they are linked (in both directions) to 'astronomy' and 'astrophysics' pages without 'astronomy' pages being linked to 'astrophysics' pages.
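To make the ring idea concrete, here is a small sketch in which each ring is a numbered set of pages and a page may belong to several rings. The dictionary layout, the helper `links_for`, and the language code "xx" for a wiki with a combined page are illustrative assumptions.

```python
# Ring number -> set of (language, article) pages. "xx" is a placeholder for
# a wiki that has one combined page instead of two separate ones.
rings = {
    1: {("en", "Astronomy"), ("xx", "Astronomy and astrophysics")},
    2: {("en", "Astrophysics"), ("xx", "Astronomy and astrophysics")},
}


def links_for(rings, page):
    """Union of all rings containing the page, minus the page itself."""
    linked = set()
    for members in rings.values():
        if page in members:
            linked |= members
    linked.discard(page)
    return linked


combined = links_for(rings, ("xx", "Astronomy and astrophysics"))
astronomy = links_for(rings, ("en", "Astronomy"))
```

The combined page ends up linked to both English pages, while en:Astronomy is linked only to the combined page and not to en:Astrophysics.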
> There may be scenarios that I have not yet covered in the above proposal, although I am sure solutions can be found for every problem.
The main problem I have found is the one you try to solve with the [[nl::]] links: when a page is linked in a way it should not be, removing all inappropriate links will nevertheless bring the links back, because the appropriately linked pages will also be linked.
> There are numerous advantages to this approach. Compared with the current handling, we should quickly get an accurate representation of interlanguage links on all wikis. We do not have to pick a single language as "key" language, which would require a key entry in that language to exist for all pages. [1]
My preference would be to have a 'neutral' key language, for example simply a numbering of all link groups that exist. Disadvantage of that method is that inappropriate links can still come into existence (something one could hope to avoid by using a real key language, which however increases the risk of several 'rings' coming into existence around the same subject). They would however be easier to repair than in your proposal.
My proposal would look something like this:
A table of terms with their pages in various languages, as described above. There are no interlanguage links in the box below the page, instead there is a table number or list of table numbers. However, there is an option for users to specify that a page should be connected to a specific page in another language. If that is done, the following can be the case:
1. Both pages are not language-connected yet. In that case a new ring would be formed with the two pages in it.
2. Both pages are already in a ring. Then the user gets a new screen, where the two rings are given, with the pages in the two rings. The user has the following options:
   * Melt the rings together into one ring
   * Add the first page to the second ring
   * Add the second page to the first ring
In the latter two cases, this will cause the added page to become part of two rings.
3. One page is in more than one ring. Then the user again gets to see all the rings with the pages in them; he can select one ring, causing the other page to be added to that ring, or two rings, causing those two rings to be melted together.
When a page already has interlanguage links, a user has a third option to change its international links (apart from adding as specified above and changing the ring memberships by hand), namely specifying that a link should not exist. He then gets to see all pages (in the various languages) that are part of the ring, and creates two rings, ring A and ring B, from them. For each page in the ring, he specifies whether it should belong to ring A, ring B or both.
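The ring operations described above (melting two rings together, adding a page to another ring, and splitting a ring into A and B) could look roughly like this. The function names, the dictionary of numbered rings, and the "A"/"B"/"both" labels are assumptions chosen for the sketch.

```python
def merge_rings(rings, keep_id, drop_id):
    """Melt two rings together into one."""
    rings[keep_id] |= rings.pop(drop_id)


def add_to_ring(rings, page, ring_id):
    """Add a page to a ring; it stays in any ring it already belongs to."""
    rings[ring_id].add(page)


def split_ring(rings, ring_id, assignment, new_id):
    """Split one ring in two; assignment maps each page to "A", "B" or "both"."""
    members = rings[ring_id]
    ring_a = {p for p in members if assignment[p] in ("A", "both")}
    ring_b = {p for p in members if assignment[p] in ("B", "both")}
    rings[ring_id] = ring_a
    rings[new_id] = ring_b


rings = {
    1: {("en", "Main Page"), ("de", "Hauptseite")},
    2: {("fr", "Accueil"), ("es", "Portada")},
}
merge_rings(rings, 1, 2)
split_ring(
    rings,
    1,
    {
        ("en", "Main Page"): "both",
        ("de", "Hauptseite"): "A",
        ("fr", "Accueil"): "B",
        ("es", "Portada"): "B",
    },
    new_id=3,
)
```

A page marked "both" stays in the two resulting rings, which is how an unwanted link between the other pages is removed without orphaning anything.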
The advantage of my system (compared to yours) is that it makes the handling of multiple-subject pages and the correction of unwanted links easier. The disadvantages are that it is probably even harder to implement and that it is less wiki-like, using form-like entities rather than markup language.
Andre Engels
I proposed enhanced interlanguage link manipulation/storage several times before (check the archive;-), and gave that some thought.
I propose a new, independent *database*, instead of just a table. We'll have to access another database from all but one wikis, anyway (we can't store a complete consistent copy of that table in all databases, now can we?).
That database could also hold other information, mainly a central user database, so multi-language users won't have to create new user accounts in every language.
Later, it could also be a place for the translations that are currently in the LanguageXX.php files.
Magnus
Tomasz Wegrzanowski wrote:
> On Tue, Jan 07, 2003 at 07:44:55PM +0100, Magnus Manske wrote:
> > Later, it could also be a place for the translations that are currently in the LanguageXX.php files.
> Please, don't put things that are static into database. It is slow enough now.
I thought of an online interface for translators to add/change items, then an "update" button that creates a new LanguageXX.php file for the edited language.
Magnus
On Tue, 2003-01-07 at 19:44, Magnus Manske wrote:
> I proposed enhanced interlanguage link manipulation/storage several times before (check the archive;-), and gave that some thought.
> I propose a new, independent *database*, instead of just a table. We'll have to access another database from all but one wikis, anyway (we can't store a complete consistent copy of that table in all databases, now can we?).
> That database could also hold other information, mainly a central user database, so multi-language users won't have to create new user accounts in every language.
Magnus,
of course the table I suggested would have to reside in some shared database. This DB could then later also hold user data and other information we want shared.
I would, however, want this approach to be limited to allow for Wikipedias that reside on foreign servers. If the central shared database fails, these Wikipedias should continue to function.
I would also like to suggest that we concentrate on fixing one problem at a time. If we try to do too many different things at once, we may not get anything done right in the end. The interlanguage links are an important problem to start working on.
Regards,
Erik
wikitech-l@lists.wikimedia.org