Hoi,
As a result of our exploring opportunities for Ultimate Wiktionary, I was pointed to LISA, the Localisation Industry Standards Association. I downloaded some information and was afterwards asked for some information in return. This in turn led to the question whether I was willing to write an article about Wikimedia and localisation. So I did. It can be found here: http://www.lisa.org/globalizationinsider/
Thanks, GerardM
Gerard Meijssen wrote:
Hoi,
As a result of our exploring opportunities for Ultimate Wiktionary, I was pointed to LISA, the Localisation Industry Standards Association. I downloaded some information and was afterwards asked for some information in return. This in turn led to the question whether I was willing to write an article about Wikimedia and localisation. So I did. It can be found here: http://www.lisa.org/globalizationinsider/
The following is what he says there
Using this imperfect system of templates has taught us that 80% of the lexicological content can be expressed using templates.
What evidence is there of this? Sure, we can get closer to that when translations are viewed as mechanical acts and we can ignore all subtleties of language. In reality such an attitude only goves a lot of pap.
The next step will be for us to combine all the language-independent content in a database.
That's a very small part of the content.
Our challenge will be to translate the user interface in as many languages as possible.
That much is already being done without Gerard's UW
This is the first hurdle to make the Ultimate Wiktionary accessible in any language. The next step is to encourage people with language knowledge to contribute to the Wiktionary by providing descriptions and etymological information for various terms.
This too is already being done in the English Wiktionary. That's what lexicography is all about.
The Ultimate Wiktionary will become extremely relevant, based on the special content that it will contain. For example, we plan to include the GEMET thesaurus <http://www.eionet.eu.int/GEMET>, the ecological resource of the European Community (EC).
Although I have no doubt that this is useful information there should be no preference given to EC terminology. It needs to be made clear that different terminology used in countries which share a language with some member of the EC will be on an equal footing.
It will also be possible for users to add content in other languages, making the original thesaurus even more accessible and more valuable to more people. We hope to be able to cooperate with organizations such as the EC in order to host other glossaries and thesauri. As everyone is invited to contribute content, we envision this content being translated into many more languages and thus resulting in increased trade opportunities for the EC.
This looks like some kind of hidden agenda. Many of us have resisted any appearance of being dominated by the thinking and ideas of the United States. Any attempt to impose EC dominance should be resisted just as strongly.
The current Wiktionaries will be converted to the Ultimate Wiktionary.
You say this with far too much conviction. At other times you appear to make the prediction that participants in the various projects will see your UW as so great that they will melt into your arms. As a sceptic I can accept that statement. It allows me to wait until there is something real to comment about; it allows others with more technical experience to amend your software to suit the needs of Wikimedia. I cannot and will not accept your flat out statement that it "will be converted" any more than I can suspend rationality long enough to accept Christ as my personal saviour.
This means that people will have access to the Dutch Wiktionary with many words in Papiamento, the Italian Wiktionary with many Neapolitan words, and the Kurdish Wiktionary with words in many different Kurdish dialects. The goal with the Ultimate Wiktionary is to overcome the fractured nature of individual Wiktionaries. By combining them into one central repository, people will be able to access a much greater variety of content, thus enabling the Ultimate Wiktionary to be greater than the sum of its separate Wiktionary parts.
This is in sharp contrast to what you said in other parts of the article
Contrary to what the typical LISA Member has available, we do not have an organizational structure that decides what to do next. We do not have policies that determine what content is to be available in all Wikipedias. We do not translate content as a rule.
or again
As the projects grow, we find that they have different values and a different view of “the Truth.” These are the issues where culture comes into play.
What you praise here about the Wikipedias you would condemn in Wiktionary. The difference in values and cultures is just as strong in Wiktionary as it is in Wikipedia. It needs to be respected just as much.
Ec
PS: It is not my usual habit to crosspost my comments, and I seriously considered posting this on only one of the lists, but since Gerard has chosen to put his POV on three lists it seemed appropriate that at least the initial rebuttal should be similarly distributed.
I don't have time to answer that now because of too much work. Most of what you said is your POV, and it reads like fear of the new. Sorry, but that is how it seems.
Gerard is not the only one working on UW.
There's no "we want only EU terminology". It is a start, and everyone is invited to contribute, even UNESCO and the US government if they have valuable glossaries.
More in a second mail when I have finished my work.
Ciao, Sabine
Ray Saintonge wrote:
Gerard Meijssen wrote:
Hoi,
As a result of our exploring opportunities for Ultimate Wiktionary, I was pointed to LISA, the Localisation Industry Standards Association. I downloaded some information and was afterwards asked for some information in return. This in turn led to the question whether I was willing to write an article about Wikimedia and localisation. So I did. It can be found here: http://www.lisa.org/globalizationinsider/
The following is what he says there
Using this imperfect system of templates has taught us that 80% of the lexicological content can be expressed using templates.
What evidence is there of this? Sure, we can get closer to that when translations are viewed as mechanical acts and we can ignore all subtleties of language. In reality such an attitude only goves a lot of pap.
The evidence can be found in the practice of the it: or nl: and other wiktionaries. Please explain "goves a lot of pap".
The next step will be for us to combine all the language-independent content in a database.
That's a very small part of the content.
Given that the language-independent content is over 80%, I wonder what a big part would be.
Our challenge will be to translate the user interface in as many languages as possible.
That much is already being done without Gerard's UW
There will be a user interface part to the Ultimate Wiktionary that will have to be localised.
This is the first hurdle to make the Ultimate Wiktionary accessible in any language. The next step is to encourage people with language knowledge to contribute to the Wiktionary by providing descriptions and etymological information for various terms.
This too is already being done in the English Wiktionary. That's what lexicography is all about.
At issue is your scope. If you think the English Wiktionary is everything, you forget about all the other Wiktionaries. If work done on the English Wiktionary benefits only the English Wiktionary, your scope is somewhat limited.
The Ultimate Wiktionary will become extremely relevant, based on the special content that it will contain. For example, we plan to include the GEMET thesaurus <http://www.eionet.eu.int/GEMET>, the ecological resource of the European Community (EC).
Although I have no doubt that this is useful information there should be no preference given to EC terminology. It needs to be made clear that different terminology used in countries which share a language with some member of the EC will be on an equal footing.
We have already introduced a glossary with medical information in the Dutch Wiktionary. The GEMET database contains data for many languages, including American English, and the definitions come from reputable institutions, including US universities. So please know what you are talking about. The GEMET data has been given to us to publish under the GNU FDL. So what is your problem? If we can obtain other worthwhile resources and combine them in UW, we will, because that enriches UW as a resource.
It will also be possible for users to add content in other languages, making the original thesaurus even more accessible and more valuable to more people. We hope to be able to cooperate with organizations such as the EC in order to host other glossaries and thesauri. As everyone is invited to contribute content, we envision this content being translated into many more languages and thus resulting in increased trade opportunities for the EC.
This looks like some kind of hidden agenda. Many of us have resisted any appearance of being dominated by the thinking and ideas of the United States. Any attempt to impose EC dominance should be resisted just as strongly.
I will be asking the Dutch government whether we can host a glossary of the words used in the various Dutch governmental organisations, with their meanings. When we want to explain what a word means for an organisation and you equate that with letting the organisation dominate you, you are arguing against making this information public. From my point of view you are barking up the wrong tree: with a dictionary or a glossary you inform, and if the EC is more open than the US, good for them.
The aim of Wiktionary is all words of all languages. Would it not be great if organisations looked at their own vocabulary, even if it makes plain how far they are from what is commonly meant by that vocabulary?
The current Wiktionaries will be converted to the Ultimate Wiktionary.
You say this with far too much conviction. At other times you appear to make the prediction that participants in the various projects will see your UW as so great that they will melt into your arms. As a sceptic I can accept that statement. It allows me to wait until there is something real to comment about; it allows others with more technical experience to amend your software to suit the needs of Wikimedia. I cannot and will not accept your flat out statement that it "will be converted" any more than I can suspend rationality long enough to accept Christ as my personal saviour.
Your scepticism, it seems, does not allow you to bide your time and wait for the results. You do what you must. In the meantime I will work to make UW a success.
This means that people will have access to the Dutch Wiktionary with many words in Papiamento, the Italian Wiktionary with many Neapolitan words, and the Kurdish Wiktionary with words in many different Kurdish dialects. The goal with the Ultimate Wiktionary is to overcome the fractured nature of individual Wiktionaries. By combining them into one central repository, people will be able to access a much greater variety of content, thus enabling the Ultimate Wiktionary to be greater than the sum of its separate Wiktionary parts.
This is in sharp contrast to what you said in other parts of the article
I do not understand what you mean, but let me try to explain. We want a user interface for every language, selectable in the user preferences. When content has been imported into the UW, the content included in the Dutch Wiktionary will make Papiamento, English, Italian, etc. content available. With more resources imported into the UW, the information will be enriched. I do not rubbish the accomplishments of the other Wiktionaries; I want to make them available to all people of all languages. When an English word is known because there is an entry created in association with a word in Italian, it will be available to everyone, whatever user interface is used.
Contrary to what the typical LISA Member has available, we do not have an organizational structure that decides what to do next. We do not have policies that determine what content is to be available in all Wikipedias. We do not translate content as a rule.
or again
As the projects grow, we find that they have different values and a different view of “the Truth.” These are the issues where culture comes into play.
What you praise here about the Wikipedias you would condemn in Wiktionary. The difference in values and cultures is just as strong in Wiktionary as it is in Wikipedia. It needs to be respected just as much.
I do not see the point that you are making. A dictionary is deterministic; it informs of the meaning(s) of a word and other information that can be found about a word. We will have them all. An encyclopedia tries to explain what things are and how they relate. There is in my mind a huge difference in the amount of culture you will find in an encyclopedia and in a dictionary.
Ec
PS: It is not my usual habit to crosspost my comments, and I seriously considered posting this on only one of the lists, but since Gerard has chosen to put his POV on three lists it seemed appropriate that at least the initial rebuttal should be similarly distributed.
I have published in a periodical. The posting on the three lists was to inform you about this. These postings are, as much as anything, meant to prevent the feeling some people have that things are done in secret. They are not. Many people knew I was going to publish in the LISA periodical and, hey, today you can read the result :)
The creation of this article was a lengthy affair. I have discussed issues with many Wikimedians. If Ray is of the opinion that it is my POV, he is correct: I published it and I am proud of it. I hope and expect that in time Ultimate Wiktionary will prove a success. If people are interested in how I envision certain things, they can ask me and we can discuss them. I am eager to know where we need to improve on the ideas that we have. But please use arguments instead of sceptical comments based on... what?
Thanks, GerardM
Gerard Meijssen gerard.meijssen@gmail.com wrote:
The following is what he says there
Using this imperfect system of templates has taught us that 80% of the lexicological content can be expressed using templates.
What evidence is there of this? Sure, we can get closer to that when translations are viewed as mechanical acts and we can ignore all subtleties of language. In reality such an attitude only goves a lot of pap.
The evidence can be found in the practice of the it: or nl: and other wiktionaries. Please explain "goves a lot of pap".
it:, nl:, and the other wiktionaries are 99% stubs. Certainly it's conceivable that the language-independent content *now* is 80%. But in the future, when there is actually enough content to make wiktionary worth using, both in real definitions and auxiliary information such as etymological analysis (not just "{{xx}} X, from {{yy}} Y, ..."), discussion of pronunciation and grammatical usage in everyday vs. formal or standardized use, citations and translations of same into the user's native language, a real and useful thesaurus of each language (I mean more than a simple list of synonyms), etc., then the numbers will surely be quite different.
This is why I don't see anything particularly "ultimate" about UW. If all you really want to build is a "translationary," I don't doubt UW will be reasonable and sufficient. Outside of that, from what I understand, it looks like the vast amounts of real content will be provincialized, stuck in the language of the user who added it. It will be just like Wikipedia in that way, where it is little or no effort to carry over an infobox template, say, from en.wikipedia to la.wikipedia, but the real meat is not the standardized template but the actual article about what is being discussed... and UW only seems to concern itself with the equivalent of the infobox.
*Muke!
Muke Tever wrote:
Gerard Meijssen gerard.meijssen@gmail.com wrote:
The following is what he says there
Using this imperfect system of templates has taught us that 80% of the lexicological content can be expressed using templates.
What evidence is there of this? Sure, we can get closer to that when translations are viewed as mechanical acts and we can ignore all subtleties of language. In reality such an attitude only goves a lot of pap.
The evidence can be found in the practice of the it: or nl: and other wiktionaries. Please explain "goves a lot of pap".
it:, nl:, and the other wiktionaries are 99% stubs. Certainly it's conceivable that the language-independent content *now* is 80%. But in the future, when there is actually enough content to make wiktionary worth using, both in real definitions and auxiliary information such as etymological analysis (not just "{{xx}} X, from {{yy}} Y, ..."), discussion of pronunciation and grammatical usage in everyday vs. formal or standardized use, citations and translations of same into the user's native language, a real and useful thesaurus of each language (I mean more than a simple list of synonyms), etc., then the numbers will surely be quite different.
This is why I don't see anything particularly "ultimate" about UW. If all you really want to build is a "translationary," I don't doubt UW will be reasonable and sufficient. Outside of that, from what I understand, it looks like the vast amounts of real content will be provincialized, stuck in the language of the user who added it. It will be just like Wikipedia in that way, where it is little or no effort to carry over an infobox template, say, from en.wikipedia to la.wikipedia, but the real meat is not the standardized template but the actual article about what is being discussed... and UW only seems to concern itself with the equivalent of the infobox.
An entry will have the 'real content' available in many languages at the same time. It will be presented to you in the language of the interface that you used if available in that language. There will also be icons to show in what other languages this etymology or comment is available as well. And a button to translate it to a new language that wasn't available yet. Nothing gets provincialized. All the translations of the content will be found in the same place, instead of having to hunt across all the different language Wiktionaries. It's not just about translations. Although they, too, will benefit from only having to be put in once and becoming available to each and every person using the Wiktionary.
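The display rule described above (show the note in the interface language if it exists, otherwise show which other languages it exists in, and offer a button to add a new language) can be sketched as follows. This is a hypothetical illustration, not the UW software: the class `EntryNote` and its methods `render` and `add_translation` are invented names, and language codes stand in for the icons.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntryNote:
    """Hypothetical: one piece of 'real content' (etymology, usage note) stored once
    in a shared entry, with one text per language it has been written in."""

    kind: str
    texts: dict[str, str] = field(default_factory=dict)  # language code -> text

    def render(self, ui_lang: str) -> tuple[str | None, list[str]]:
        """Return the text in the interface language (None if not written yet)
        and the other languages it is available in, i.e. the icons to show."""
        shown = self.texts.get(ui_lang)
        others = sorted(lang for lang in self.texts if lang != ui_lang)
        return shown, others

    def add_translation(self, lang: str, text: str) -> None:
        """What the proposed 'translate' button would do: add a language version in place."""
        self.texts[lang] = text


note = EntryNote("etymology", {"en": "Ultimately from Italian banca, 'bench'."})
print(note.render("nl"))  # (None, ['en'])
note.add_translation("nl", "(Dutch text of the same etymology)")
print(note.render("en"))  # ("Ultimately from Italian banca, 'bench'.", ['nl'])
```

Because every language version lives in the same record, a reader of any interface sees the same set of icons without an interwiki lookup.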
Ultimate Wiktionary is just a name. Rather fitting, if I may say so, but just a name all the same. It had to be called something.
I'm going to try and make an example of a common word and what it might look like in UW.
Bank seems like a good candidate to do this with. It is defined in en, de, nl, sv and Volapük in the English Wiktionary. The Swedish Wiktionary doesn't have an entry for it yet. The Dutch and German Wiktionaries both have an extra definition in Indonesian, which is lacking in the English Wiktionary. Oddly, the German Wiktionary does not define the German word, so the entry probably wasn't added by a German speaker. With the UW there would already have been the beginning of an entry, since it had been added by somebody working through the English language interface. The same goes for the Swedish definition. Most important, though, is that the Indonesian wouldn't have been lost to the English Wiktionary community. It may not have come with a definition written in English, but a definition in another language would have been available, ready to be translated. It would have been in view, not hidden in the Dutch and German Wiktionaries waiting to be discovered by somebody who happens to enjoy jumping from one language Wiktionary to another.
Indicating what more exists will be taken care of by the software. It won't have to be added manually.
Have to run now.
Polyglot
cookfire cookfire@softhome.net wrote:
An entry will have the 'real content' available in many languages at the same time. It will be presented to you in the language of the interface that you used if available in that language. There will also be icons to show in what other languages this etymology or comment is available as well.
That sounds like it will be a *very* crowded interface before too long.
And a button to translate it to a new language that wasn't available yet. Nothing gets provincialized.
The provincialization remains in the fact that most people don't understand several languages. So, apparently a person can see that a definition is given in French and Hebrew... but...
- Do they know whether the Hebrew definition is a translation of the French definition, vice versa, or written independently?
- Do they have any way of knowing whether the French definition has any more or less information than the Hebrew one?
- Do they know whether they are perhaps for different senses of the same word? (Did the person who added the definition in Hebrew know enough French to mark it as a translation of a French one?)
All the translations of the content will be found in the same place, instead of having to hunt across all the different language Wiktionaries.
Interwiki links do that already, and are flexible enough to handle information that isn't just a translation of prior information.
It's not just about translations. Although they, too, will benefit from only having to be put in once and becoming available to each and every person using the Wiktionary.
This, frankly, seems impossible, except for words with very specific semantics. I would like to see how this is supposed to be done.
I'm going to try and make an example of a common word and what it might look like in UW.
Bank seems like a good candidate to do this with.
Or maybe [[You]]. :p
*Muke!
Muke Tever wrote:
cookfire cookfire@softhome.net wrote:
An entry will have the 'real content' available in many languages at the same time. It will be presented to you in the language of the interface that you used if available in that language. There will also be icons to show in what other languages this etymology or comment is available as well.
That sounds like it will be a *very* crowded interface before too long.
That may be true and the only solution there is to let the user decide that he only wants to see those icons for the languages he knows.
And a button to translate it to a new language that wasn't available yet. Nothing gets provincialized.
The provincialization remains in the fact that most people don't understand several languages. So, apparently a person can see that a definition is given in French and Hebrew... but..
- Do they know whether the Hebrew definition is a translation of the French definition, vice versa, or written independently?
They don't have to. They read and compare the ones they do understand. If they find inconsistencies they can fix them. It would even be possible to mark the other definitions as 'dirty' or in need of being checked as well. Other people can search for places where this occurs for languages they are proficient in. If you want a community of non-experts to build a dictionary, you have to give people some credit. If it's not right the first time, you have to accept that it will take a bit longer. But it will be right in the end. It will probably take 50 or 100 years before it reaches an acceptable level for the most common words, and even longer for uncommon words and less spoken languages. UW will help us become more efficient, though. A lot less time will be wasted, and this is important: it will take long enough as it is.
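The 'dirty' marking proposed above could work roughly like this: editing one language version of a definition flags all the other versions, a reader of one of those languages clears the flag after checking, and volunteers search for flags in the languages they know. This is a hypothetical sketch; `Definition`, `edit`, `confirm` and `needs_checking` are invented names, not part of any existing Wiktionary software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical: the parallel language versions of one definition, plus the
    set of language versions flagged 'dirty' (in need of being checked)."""

    texts: dict[str, str] = field(default_factory=dict)  # language code -> text
    dirty: set[str] = field(default_factory=set)

    def edit(self, lang: str, new_text: str) -> None:
        """Fix one language version; every other version now needs checking."""
        self.texts[lang] = new_text
        self.dirty = {other for other in self.texts if other != lang}

    def confirm(self, lang: str) -> None:
        """A reader of `lang` has checked that version against the edited one."""
        self.dirty.discard(lang)


def needs_checking(definitions: list[Definition], lang: str) -> list[Definition]:
    """What a volunteer proficient in `lang` would search for."""
    return [d for d in definitions if lang in d.dirty]


bank_money = Definition({
    "en": "financial institution",
    "fr": "établissement financier",
    "de": "Geldinstitut",
})
bank_money.edit("en", "institution where money is deposited and lent")
print(sorted(bank_money.dirty))  # ['de', 'fr']
bank_money.confirm("fr")
print(sorted(bank_money.dirty))  # ['de']
```

Keeping `confirm` separate from `edit` matters: checking a version without changing it should not re-flag the others.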
- Do they have any way of knowing whether the French definition has any more or less information than the Hebrew one?
Only if they click through, I guess, but that's true now as well, except now they don't even know if the French definition even exists.
- Do they know whether they are perhaps for different senses of the same word? (Did the person who added the definition in Hebrew know enough French to mark it as a translation of a French one?)
The situation is worse with balkanized Wiktionaries. In the endeavour we're in, I am not sure perfection is possible, but things certainly can be improved. If a person adds a translation that doesn't fit in one Wiktionary, it doesn't get seen by many people and it will take a very long time before anybody notices the error. If somebody comes by and copies the translations wholesale to other Wiktionaries the error moves along. Now if somebody corrects this error in the first or any of the derivatives, the correction doesn't flow through to the original or the other derivatives. In the proposed scenario of UW an error can be introduced just as well (it's the wiki way), but chances that somebody sees it and corrects it are a lot better. Besides that, once it is corrected, it is corrected for all the different interfaces to the UW at once. So everybody benefits. Translations will be done just like they are done now. One meaning of a word gets a list of possible translations in the target language. It will be possible to add a comment and of course it will be possible to click through to the proposed translations to check whether they fit.
Anyway, I think that a large proportion (say 20%) of people that have computers and internet connections are at least bilingual. Of course they can't be absolutely certain about anything they add, but that's why we have the wiki and its community to provide feedback and many eyeballs to eradicate incorrect stuff. It will be a lot easier if we are all looking at one and the same version of the content.
All the translations of the content will be found in the same place, instead of having to hunt across all the different language Wiktionaries.
Interwiki links do that already, and are flexible enough to handle information that isn't just a translation of prior information.
Take a look at bank in the English, the Dutch and the German dictionaries. The links are flexible, but the pages don't get updated.
It's not just about translations. Although they, too, will benefit from only having to be put in once and becoming available to each and every person using the Wiktionary.
This, frankly, seems impossible, except for words with very specific semantics. I would like to see how this is supposed to be done.
I'll try to give an example, but it'll take some time. Right now I need a way to include the contents of one page into another in the wiki interface. Anyway, please humour me and tell me what's wrong with the following:
- One word/term/expression has a series of meanings.
- Each of these meanings has a list of synonyms and translations in different languages.
- The translations can be given as a list with occasionally a comment about usage, caveats, etc.
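The three-level structure in the list above translates directly into a small data model. The following is a sketch under that description only; the class names and the helper `translations_into` are hypothetical and do not describe the actual UW database schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Translation:
    term: str
    comment: str = ""  # occasional note about usage, caveats, etc.


@dataclass
class Meaning:
    gloss: str
    synonyms: list[str] = field(default_factory=list)
    # target language code -> list of possible translations
    translations: dict[str, list[Translation]] = field(default_factory=dict)

    def add_translation(self, lang: str, term: str, comment: str = "") -> None:
        self.translations.setdefault(lang, []).append(Translation(term, comment))


@dataclass
class Expression:
    """One word/term/expression with its series of meanings."""
    spelling: str
    language: str
    meanings: list[Meaning] = field(default_factory=list)


def translations_into(expr: Expression, lang: str) -> list[tuple[str, list[str]]]:
    """Per meaning, the candidate translations into `lang` (empty list if none yet)."""
    return [(m.gloss, [t.term for t in m.translations.get(lang, [])]) for m in expr.meanings]


money = Meaning("financial institution")
money.add_translation("de", "Bank", "plural: Banken")
money.add_translation("fr", "banque")
river = Meaning("sloping land beside a river", synonyms=["riverbank"])
river.add_translation("de", "Ufer")

bank = Expression("bank", "en", [money, river])
print(translations_into(bank, "de"))
# [('financial institution', ['Bank']), ('sloping land beside a river', ['Ufer'])]
```

An empty list for a meaning (as for French under the river sense) is exactly the gap that would be visible to every interface at once.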
I'm going to try and make an example of a common word and what it might look like in UW.
Bank seems like a good candidate to do this with.
Or maybe [[You]]. :p
I'll have a look at it, but I wanted a spelling of a word that exists in more than one language.
Polyglot
cookfire cookfire@softhome.net wrote:
Muke Tever wrote:
- Do they know whether the Hebrew definition is a translation of the French definition, vice versa, or written independently?
They don't have to. They read and compare the ones they do understand. If they find inconsistencies they can fix them.
Here I am mostly concerned for the users here, not the editors.
It would even be possible to mark the other definitions as 'dirty' or in need of being checked as well.
This is easily abused though. (Had to clean up [[you]] today because of this. It'd been like that for a month.)
- Do they have any way of knowing whether the French definition has any more or less information than the Hebrew one?
Only if they click through, I guess, but that's true now as well, except now they don't even know if the French definition even exists.
Gerard's interwiki bot has been working well on en: for quite some time now.
- Do they know whether they are perhaps for different senses of the same word? (Did the person who added the definition in Hebrew know enough French to mark it as a translation of a French one?)
The situation is worse with balkanized Wiktionaries. In the endeavour we're in, I am not sure perfection is possible, but things certainly can be improved. If a person adds a translation that doesn't fit in one Wiktionary, it doesn't get seen by many people and it will take a very long time before anybody notices the error.
[snip]
I'm not talking about the separate Wiktionaries here, but about the UW's feature of linking what is a translation of what... and I'm not talking about errors here either. I mean if, say, the French only defines "financial institution" and the Hebrew only "shore of a river" under "bank". Or if they both have "financial institution" but aren't marked as translations of one another. It's possible that I'm entirely missing the mark as to how this feature works, but I haven't been shown any concrete examples, so all I have is speculation.
All the translations of the content will be found in the same place, instead of having to hunt across all the different language Wiktionaries.
Interwiki links do that already, and are flexible enough to handle information that isn't just a translation of prior information.
Take a look at bank in the English, the Dutch and the German dictionaries. The links are flexible, but the pages don't get updated.
Apparently because there aren't the editors to do it.
It's not just about translations. Although they, too, will benefit from only having to be put in once and becoming available to each and every person using the Wiktionary.
This, frankly, seems impossible, except for words with very specific semantics. I would like to see how this is supposed to be done.
I'll try to give an example, but it'll take some time. Right now I need a way to include the contents of one page into another in the wiki interface. Anyway, please humour me and tell me what's wrong with the following:
- One word/term/expression has a series of meanings.
- Each of these meanings has a list of synonyms and translations in different languages.
- The translations can be given as a list with occasionally a comment about usage, caveats, etc.
First off, when you say an added translation will be available to each and every user, are you talking about a separate translation table for every word, where every French user can see the Hebrew translations of English "you", or a magic table that adheres to "you", "tu", "vous", "o senhor", "a senhora", "אתה", etc. simultaneously? Both seem impractical, for different reasons, but it was the latter I was thinking of.
Regardless: Second, attaching a translation table to every *sense* of a word will be rather excessive, especially when a word has many separate senses and the same word translates several or all of them. Hopefully there will be a kind of information-condensing mechanism?
Thirdly (and this is a minor point), I would like to see example sentences accompanying translations. (This is minor because eventually the translation's own entry will have an example sentence, hopefully.)
Last, what happens to a translation table when a valid definition that's accumulated translations is split or merged with another? (Like what happened with [[you]].) Presumably some kind of dirty marking would happen to the translations themselves, but if the translation table is attached to the senses themselves, is there some provision for moving it? (It's entirely possible that this is no problem at all, but again, there's no sense of what the interface looks like, though the impression I get from descriptions so far is that this won't be very transparent.)
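One possible answer to the split/merge question, following the presumption above that dirty marking would be involved: when a sense is split, each new sense gets its own copy of the old translations with every language flagged for review; when two senses are merged, the translation lists are unioned and flagged. This is purely a hypothetical sketch (`Sense`, `split_sense` and `merge_senses` are invented names); nothing in the thread says UW will work this way.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str
    translations: dict[str, list[str]] = field(default_factory=dict)  # language -> terms
    dirty: set[str] = field(default_factory=set)  # languages whose translations need review


def split_sense(old: Sense, new_glosses: list[str]) -> list[Sense]:
    """Each new sense starts with its own copy of the old translations, all flagged
    dirty, so reviewers can prune the ones that no longer fit."""
    return [
        Sense(gloss,
              {lang: list(terms) for lang, terms in old.translations.items()},
              set(old.translations))
        for gloss in new_glosses
    ]


def merge_senses(a: Sense, b: Sense, gloss: str) -> Sense:
    """Union the translation lists (keeping order, dropping duplicates), all flagged dirty."""
    merged: dict[str, list[str]] = {}
    for sense in (a, b):
        for lang, terms in sense.translations.items():
            bucket = merged.setdefault(lang, [])
            for term in terms:
                if term not in bucket:
                    bucket.append(term)
    return Sense(gloss, merged, set(merged))


you = Sense("second-person pronoun", {"fr": ["tu", "vous"], "de": ["du", "ihr", "Sie"]})
singular, plural = split_sense(you, ["addressing one person", "addressing several people"])
plural.translations["fr"].remove("tu")  # a reviewer prunes the copied list...
plural.dirty.discard("fr")              # ...and clears the flag for French
print(singular.translations["fr"], sorted(singular.dirty))  # ['tu', 'vous'] ['de', 'fr']
print(plural.translations["fr"], sorted(plural.dirty))      # ['vous'] ['de']
```

Copying rather than sharing the lists is the key design choice: pruning one new sense must not silently change the other.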
I'm going to try and make an example of a common word and what it might look like in UW.
Bank seems like a good candidate to do this with.
Or maybe [[You]]. :p
I'll have a look at it, but I wanted a spelling of a word that exists in more than one language.
[[I]] then ;) though that's a quite muddy case.
*Muke!
wiktionary-l@lists.wikimedia.org