Anthere, Jimbo, At this moment we are in a position to get the cooperation of all kinds of outside people with regard to Wiktionary. This is documented to a large extent on META and on the wiktionary-l mailing list. Because of the quality of the data that we are allowed to enter and the quality of the people involved, it would really help if we were able to import as well as export wiktionary data using XML structures.
The benefits are:
* We will open up to other open dictionary content for inclusion in wiktionary.
* We will open up our content to other interested parties.
* The wiktionary content will no longer be only in our own "proprietary" format.
* We will be able to import data from one wiktionary into the next. This will not only enhance the quality of the wiktionaries; it will also enhance their reputation.
* We will prevent the duplication of effort. Much effort now goes into doing the same job over and over again. An example: the word "Nederlands" in nl:wiktionary has 68 exact translations; these words exist as well. Technically these words could be copied to ALL other wiktionaries. This is not possible at the moment. The human effort is huge, and it is wasted because it can be automated.
When a consensus is reached on how to do this, some programming will be required. Without your backing, I expect that nothing is going to happen. Once we have arrived at a road map, a lot of work will be needed to make the wiktionary data conform to the new scheme and to have the technology to get this done. We cannot reasonably ask this of the wiktionary community and the wikitech community when this road map is not co-owned by the wikimedia board.
Questions:
* Acknowledge that you will consider this as being strategic to the development of Wiktionary.
* Give a timeframe in which we will know that we can plan with a reasonable chance of getting things implemented (basically, when will we have an answer to the first question).
Thanks, Gerard Meijssen (GerardM)
The implications of Gerard's proposal are very unclear, and he has yet to outline them in detail. His focus appears to be on the translation aspects of Wiktionary, but he needs to be reminded that translations are only the secondary objective of the Wiktionaries. The primary objective is to develop a detailed understanding of a language for speakers of that language. Etymology, pronunciation and notes to distinguish different usages all serve that purpose.
Gerard does not make the programming that he wants very clear. Where is at least an example of what he would want a page to look like?
We are already free to exchange material with other sites with compatible licensing. The software that we use is not proprietary, as it is available for all to use without the payment of any licensing fees. The comments about enhancing the quality and reputation of the wiktionaries are subjective and speculative. There is no evidence to support this. Before this can be seen as a strategic plan, a lot more detail should be given.
Ec
Wiktionary-l mailing list Wiktionary-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
Ec indeed does not see what I am talking about. I am talking about the ability of wiktionary to import and export data. On META there are now several pages on that subject. It is important to understand that Wiktionary has its data in a proprietary format and that it is not possible to import and export the wiktionary data in a collaborative way. Exporting would be best served by using XML. There are at least two relevant existing XML definitions, and we will probably settle on a third definition, one more suited to dictionaries.
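To make the import/export idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the element and attribute names (`entry`, `headword`, `translation`, `lang`) and the two helper functions are hypothetical, and they are not taken from any of the XML definitions mentioned above or from the drafts on META.

```python
import xml.etree.ElementTree as ET


def export_entry(word, lang, translations):
    """Serialize one entry to an XML string.

    The element and attribute names are hypothetical placeholders,
    not an agreed Wiktionary exchange format.
    """
    entry = ET.Element("entry", {"lang": lang})
    ET.SubElement(entry, "headword").text = word
    # Sort by language code so the output is stable between runs.
    for target_lang, target_word in sorted(translations.items()):
        node = ET.SubElement(entry, "translation", {"lang": target_lang})
        node.text = target_word
    return ET.tostring(entry, encoding="unicode")


def import_entry(xml_text):
    """Read an entry back into (word, lang, translations)."""
    entry = ET.fromstring(xml_text)
    word = entry.findtext("headword")
    translations = {t.get("lang"): t.text for t in entry.findall("translation")}
    return word, entry.get("lang"), translations


if __name__ == "__main__":
    xml_text = export_entry("Nederlands", "nl", {"en": "Dutch", "fr": "néerlandais"})
    print(xml_text)
    print(import_entry(xml_text))
```

The point of the sketch is only that an entry written out by one wiktionary can be read back unchanged by another program; a real format would also have to carry etymology, pronunciation, definitions and so on.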
As to not being clear: on META there are two first stabs at a database design, one by Polyglot and one by me. Neither is complete, and they are there to help start the needed discussion. Etymology, pronunciation, pictures and sound are part of what is proposed.
Proprietary also means that a format is used that does not lend itself to sharing. As such the wiktionary content is closed. The fact that data can not be re-used in sister-wiktionaries is all the proof needed that a lot of redundant work is done. To make the wiktionaries relevant we need content in our wiktionaries. Nl.wiktionary has 3600 articles and the 45000 pages in en:wiktionary cannot be added to it. The en: content is relevant content to the nl:wiktionary and any other wiktionary. There are 5200 words in the GEMET glossary with translations and we do not have a standard method to include these in our wiktionaries. We do not have a method to let the GEMET people know what changes are made to their data. Those are the strategic requirements I am talking about. The need to share the wiktionary content is strategic; it makes our data open and it allows for the sharing of work that is done in the wiktionaries. By sharing, all the work done becomes more relevant.
How a finished page will eventually look is too early to tell; it may look the way [[wikt:nl:Engels]] and [[wikt:nl:English]] look. The edit screens will however be completely different.
As to a purpose of a dictionary; it should help people understand a language; it is not only for the speakers of a language. The en:wiktionary helps people who can use the English language to understand the English language and other languages as well.
Thanks, GerardM
Sorry, this week is full of work and I won't have too much time to dedicate before the week end.
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. the multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to pass translations of words easily from one wiktionary to another, as this part is the easiest of all. But if you had a look at the tables Polyglot uploaded to the meta site, you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming, as we translators think in glossaries and not in all those particular definitions for a term).
So now we have quite a good method to copy and paste translations fairly easily into the different wiktionaries, but does it make sense that the work needs to be repeated for every language again and again? Do we really have that much time to spend? Until a few weeks ago it was not even possible to think about a copy and paste method, and thanks to Gerard it is there - so it's one huge step ahead, but it is still far from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the editors of huge dictionaries used to do years ago, when every single word with all the relevant information was written on a single sheet. The techniques are there to avoid double or, in the case of translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out), so instead of one person needing one hour for a certain job, this means 30 hours of work to do the same job for 30 wiktionaries. I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? Where's the break-even point between hours invested in programming and hours invested in, let's say, only 5000 terms? We are talking about computers and software of the second millennium, not about the good old Zuse.
Please don't become upset now, but we should try to do things according to certain criteria from the beginning on - it is much harder to do it afterwards, when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to look up a word in a certain language.
E.g. what is a bit disturbing is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page, this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
The next thing is: I can now pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's one-way only - and to work with many other projects both directions are needed. So many of us who now concentrate on their own projects would probably say: I pass all my data to wiktionary, as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people interested in languages. The more people can use it, the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop. You are on the right track.
Now who's going to kill me ;-) ??
Ciao, Sabine
-----
Sabine Cretella s.cretella@wordsandmore.it www.wordsandmore.it Meetingplace for translators www.wesolveitnet.com
Sabine Cretella wrote:
E.g. what is a bit disturbing is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page, this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
If a high-school student was using Wiktionary for German class, it would be kind of useless to send her over to read a definition of "Deutsch" written *in* German, eh? Not all words translate mechanically either, consider "Ordnung" for instance; you need an explanation in English of its subtleties, not in a language with whose subtleties you are unfamiliar.
Stan
Stan, You are completely correct. All wiktionaries can and should have all words in all languages; the definitions of the meanings of the words should be in the language native to the wiktionary. This may be confusing at first, but once you understand the concept it makes perfect sense. Thanks, GerardM
Gerard, please don't become upset now: we can understand the concept, and many others can as well, but the majority will not - I sent some of my colleagues to wiktionary and their first reaction was that they did not feel "at home" because the languages are "mixed up". There are a lot of such people around, and it is very unlikely that people go to a place where they don't feel at home, but we need those people; we need them to understand that their help too is needed and that wiktionary is indeed more than just another useful site for finding terms.
Ciao, Sabine
I am not upset. I can sympathise. I expect that at some stage it will be possible to specify in the user preferences for wiktionary: English words only; translations in French, German and Farsi.
The most important thing is that we will get data exchange enabled; have a standard XML format that everybody speaks so that it will not matter what application you use, you will have quality data with all the trimmings.
I do agree that we need all the quality data in the wiktionaries. By increasing the amount of words and the quality of the entries more people will use wiktionary and they will start to feel at home.
Thanks, Gerard
E.g. what is a bit disturbing is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page, this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
If a high-school student was using Wiktionary for German class, it would be kind of useless to send her over to read a definition of "Deutsch" written *in* German, eh? Not all words translate mechanically either, consider "Ordnung" for instance; you need an explanation in English of its subtleties, not in a language with whose subtleties you are unfamiliar.
So the definition should be together with the English translation of the term "Ordnung" and not a single term where you can't look up the alternatives at once or see below. Again something that can be achieved easily with databases and then of course uploaded accordingly to the wiktionary.
e.g. German-English
Ordnung f (-; no pl)
allg. order; Ordentlichkeit: order(liness), tidiness; Vorschriften: rules pl, regulations pl; Anordnung: arrangement; System: system, set-up; Rang: class: in Ordnung all right; tech. etc in (good) order;
in Ordnung bringen put right (a. fig.); Zimmer etc: tidy up; reparieren: repair, fam. fix (a. fig.); (in) Ordnung halten keep (in) order; et. ist nicht in Ordnung (mit) there is s.th. wrong (with).
So every single meaning or definition or whatever needs to stay with its translation.
Now let's assume that Ordnung has just 5 possible translations into each other language (this is easier to calculate); therefore, if we have translations into 30 languages, we would have approximately 150 different terms in different languages, right? So when you look at the alphabetical word list you will have a completely mixed-up list. The only possibility would then be to create separate lists for separate languages - this would not mix things up. E.g. in the German wiktionary "regulations" wouldn't be in the German index, but in a separate one. Mixing all languages together in one huge pool is not user friendly at all - what about terms in languages with other alphabets? Don't you think people would become a bit annoyed if, when having a look at the present terms, they find words in Arabic, Hebrew, Russian, Chinese, Japanese, Korean etc.?
That's what I dislike about the current way of organising terms - you find every kind of writing in there - OK, on the English page there are the words beginning with other scripts. But what about the student who studies German and goes to the German wiktionary and finds "Anguille"? At first glance, if he/she knows French, he/she could think that the French term is used in German as well - and this will happen - people are lazy. Don't think about how we use the tools, since we know them, but think about people who don't take much time. What then happens is that the student has the term corrected, and the conclusion is: wiktionary is not a good resource, even if the fault is not wiktionary's but the student's. He'll then talk with his friends etc. etc ... and what comes out of this?
So there are two possibilities: either foreign terms are identified as such by colour/symbol/language code in the listing, or there are subsections for the individual listings in different languages.
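The second possibility, separate sub-listings per language, could be sketched roughly like this in Python. This is only an illustration under assumptions: it supposes each index entry is already tagged with a language code, and the function name `index_by_language` is made up for the example.

```python
from collections import defaultdict


def index_by_language(entries):
    """Group (word, language_code) pairs into one sorted list per language.

    Assumes every entry already carries a language code; how that code
    would be stored in a wiktionary is left open here.
    """
    sections = defaultdict(list)
    for word, lang in entries:
        sections[lang].append(word)
    # Sort the sections by language code and the words case-insensitively.
    return {
        lang: sorted(words, key=str.casefold)
        for lang, words in sorted(sections.items())
    }


if __name__ == "__main__":
    mixed_index = [("Ordnung", "de"), ("Anguille", "fr"), ("Aal", "de"), ("order", "en")]
    for lang, words in index_by_language(mixed_index).items():
        print(lang, words)
```

With a listing built this way, "Anguille" would show up only under the French section and never between the German words.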
Hmmm ... I hope this is understandable - I was just thinking about my pupils at school (when I was still teaching) - using the dictionary even if you explain over and over again 80% of them just take the first translation they found as they don't like reading all the context. I'd like to avoid misunderstandings - that's all. For myself I have no problems with mixed up lists - working with languages every day it is normal, but we should think about those who potentially would have difficulties.
Ciao, Sabine
(p.s. and now I'll read the rest of the mails)
Sabine Cretella wrote:
That's what I dislike about the current way of organising terms - you find every kind of writing in there - OK, on the English page there are the words beginning with other scripts. But what about the student who studies German and goes to the German wiktionary and finds "Anguille"? At first glance, if he/she knows French, he/she could think that the French term is used in German as well - and this will happen - people are lazy. Don't think about how we use the tools, since we know them, but think about people who don't take much time. What then happens is that the student has the term corrected, and the conclusion is: wiktionary is not a good resource, even if the fault is not wiktionary's but the student's. He'll then talk with his friends etc. etc ... and what comes out of this?
I don't think that we need to worry about all the stupid things that a user can do. If a word has 5 meanings, one of them needs to be listed first. It's the user's duty to use common sense in making his choice. On Wikisource we had someone translate the Sherlock Holmes title "His Last Bow" into French as "Son dernier coup d'archet". It was the completely wrong meaning of "bow". Reasonable people do not hold a company responsible for the stupidities of its users, except in the United States when they feel they can earn a fortune by starting a lawsuit. :-)
So there are two possibilities: either foreign terms are identified as such by colour/symbol/language code in the listing, or there are subsections for the individual listings in different languages.
Each page is divided into separate sections for each language represented. Having the English and French for "sensible" on the same page helps in recognizing that they are false friends. I like to approach the problem from the perspective of the user who has a piece of text in front of him without being certain even of what is the language of the text. That is a fascinating challenge.
Hmmm ... I hope this is understandable - I was just thinking about my pupils at school (when I was still teaching) - using the dictionary even if you explain over and over again 80% of them just take the first translation they found as they don't like reading all the context. I'd like to avoid misunderstandings - that's all. For myself I have no problems with mixed up lists - working with languages every day it is normal, but we should think about those who potentially would have difficulties.
No vaccination exists that will prevent infectiously stupid behaviour. Are you familiar with the Scott Adams cartoon strip "Dilbert"? See the archives at http://www.dilbert.com/comics/dilbert/archive/dilbert-20040809.html :-) Ec
Sabine Cretella wrote:
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. the multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to pass translations of words easily from one wiktionary to another, as this part is the easiest of all. But if you had a look at the tables Polyglot uploaded to the meta site, you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming, as we translators think in glossaries and not in all those particular definitions for a term).
I looked at his pdf file and found it pretty. My browser would not open the other one; I don't know what software is needed for that.
The first thing that we need to address is the purpose and philosophy of a dictionary. To me, a dictionary is primarily descriptive, and not prescriptive. The multi-volume Oxford English Dictionary is a model to be followed. An ideal Wiktionary article will include a series of definitions that trace the history of the word over the centuries. The etymology is important for understanding these uses. I tend to view translations as a form of synonym which happens to be in another language. They are guidelines which the wise translator will consider but will not consider binding. I would not depend on BabelFish to give me a good Italian translation of Shakespeare or a good English translation of Dante.
So now we have quite a good method to copy and paste translations fairly easily into the different wiktionaries, but does it make sense that the work needs to be repeated for every language again and again? Do we really have that much time to spend? Until a few weeks ago it was not even possible to think about a copy and paste method, and thanks to Gerard it is there - so it's one huge step ahead, but it is still far from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the editors of huge dictionaries used to do years ago, when every single word with all the relevant information was written on a single sheet. The techniques are there to avoid double or, in the case of translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out), so instead of one person needing one hour for a certain job, this means 30 hours of work to do the same job for 30 wiktionaries. I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? Where's the break-even point between hours invested in programming and hours invested in, let's say, only 5000 terms? We are talking about computers and software of the second millennium, not about the good old Zuse.
A volunteer project does not usually need to think in terms of financial break even points. Nobody is getting paid. :-) Better results depend on what you mean by that term.
Please don't become upset now, but we should try to do things according to certain criteria from the beginning on - it is much harder to do it afterwards, when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to look up a word in a certain language.
We "have" been working on criteria from the beginning! You're right that most people will use Wiktionary to look up a word. That's what it's for.
E.g. what is a bit disturbing is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page, this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
This was intentional. English speakers will gain an insight into the words of other languages.
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
So what? We're not looking for mere glossaries.
The next thing is: I can now pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's one-way only - and to work with many other projects both directions are needed. So many of us who now concentrate on their own projects would probably say: I pass all my data to wiktionary, as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
I have not followed the issue of colours or animals, so I can't comment fully, though it would seem that for the latter the best thing would be to relate the vernacular names to the Latin binomials. The fact that a term comes from the "language industry" should not make the proposals of others less valid. Contributions from either source are open to discussion and modification.
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people interested in languages. The more people can use it, the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop. You are on the right track.
We will be more able to discuss the proposal when we see what it looks like.
Now who's going to kill me ;-) ??
Huh! :-* No problem. Va bene.
Ec
Ray Saintonge wrote:
Sabine Cretella wrote:
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to be able to pass translations of words easily from one wiktionary to the other as this part is the easiest of all. But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming as we translators think in glossaries and not all those particular definitions for a term).
I looked at his pdf file and found it pretty. My browser would not open the other one; I don't know what software is needed for that.
The second file is in Open Office format. :)
The first thing that we need to address is the purpose and philosophy of a dictionary. To me, a dictionary is primarily descriptive, and not prescriptive. The multi-volume Oxford English Dictionary is a model to be followed. An ideal Wiktionary article will include a series of definitions that trace the history of the word over the centuries. The etymology is important for understanding these uses. I tend to view translations as a form of synonym which happens to be in another language. They are guide lines which the wise translator will consider but will not consider binding. I would not depend on BabelFish to give me a good Italian translation of Shakespeare or a good English translation of Dante.
Describing it to be like "the multi-volume Oxford English Dictionary" is as helpful as the "Dikke van Dale"; both are eminent dead wood dictionaries, and therefore there should be differences with wiktionary as well. If you propose we follow the OED, explain what makes it so relevant to you; describe the features that are important to you. I do not have an OED, I have a van Dale, it is as authoritative and I am sure it has the same features but I do not know. Van Dale does a NE and EN dictionary as well, so models for translation dictionaries exist.
So now we have quite a good method to copy and paste quite easily translations into the different wiktionaries, but does it make sense that a work needs to be repeated for every language again and again? Do we really have that much time to spend? Up to some weeks ago it was not even possible to think about a copy and paste method and thanks to Gerard it is there - so it's one huge step ahead, but it is still far away from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the huge dictionary editors used to do years ago when every single word with all the relevant information was written on single sheets. The techniques are there to avoid double or, in the case of language translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out) so instead of one person needing one hour for a certain work this means 30 hours of work to do the same job for 30 wiktionaries. I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? where's the break even point for hours invested for programming and hours invested to let's say only 5000 terms? We are talking about computers and software of the second millennium not about the good old Zuse.
A volunteer project does not usually need to think in terms of financial break even points. Nobody is getting paid. :-) Better results depend on what you mean by that term.
A volunteer project that does not value the time spent by the volunteers is not worth the time. Time of quality people is a scarce resource. Is this your argument against re-using content in nl:wiktionary for data from en:wiktionary ??
Please don't become upset now: but we should try to make things with certain criteria from the beginning on - it is much harder to do it afterwards when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to search a word in a certain language.
We "have" been working on criteria from the beginning! You're right that most people will use Wiktionary to look up a word. That's what it's for.
E.g. what disturbs me a bit is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
This was intentional. English speakers will gain an insight into the words of other languages.
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
So what? We're not looking for mere glossaries.
When you do not know what these glossaries contain, how can you judge their value to wiktionary? Glossaries add content and with a wiktionary it is like a telephone network; the more people who have a telephone the more valuable the network is. With so many words not yet described, glossaries may fill a need.
The next thing is: I now can pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's a one-way direction - to work with many other projects both directions are needed. So many of us who now concentrate on their own projects probably would say: I pass all my data to wiktionary as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
I have not followed the issue of colours or animals, so I can't comment fully, though it would seem that for the latter the best thing would be to relate the vernacular names to the Latin binomials. The fact that a term comes from the "language industry" should not make the proposals of others less valid. Contributions from either source are open to discussion and modification.
Contributions from any source are open to discussion; at present we discuss opening up the wiktionary content so that it can be shared in all wiktionaries. At this moment, nl: does contribute easily to en:wiktionary while it is hard to use the en:wiktionary content. Is that not a waste of your time? The work you have done could be so much more valuable!
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people being interested in languages. The more people can use it the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop going. You are on the right way.
We will be more able to discuss the proposal when we see what it looks like.
At this point we discuss the merits of opening up the wiktionary content and using standard XML formats that are/will be shared among electronic dictionaries. When we agree that there is a need, a lot of changes to the wiktionary content will follow from that. The first thing is to agree on the merits of opening up the content. We must first agree on this; this is part of our mission as I see it. Certainly we will not have less data, we will have data in a proper place that will enable open content.
Now who's going to kill me ;-) ??
Huh! :-* No problem. Va bene.
Ec
Wiktionary-l mailing list Wiktionary-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
Thanks, Gerard
Thank you Gerard, you answered most of the questions - I'll just add some more info to single points here.
Ray Saintonge wrote:
Sabine Cretella wrote:
The first thing that we need to address is the purpose and philosophy of a dictionary. To me, a dictionary is primarily descriptive, and not prescriptive. The multi-volume Oxford English Dictionary is a model to be followed. An ideal Wiktionary article will include a series of definitions that trace the history of the word over the centuries. The etymology is important for understanding these uses. I tend to view translations as a form of synonym which happens to be in another language. They are guide lines which the wise translator will consider but will not consider binding. I would not depend on BabelFish to give me a good Italian translation of Shakespeare or a good English translation of Dante.
Describing it to be like "the multi-volume Oxford English Dictionary" is as helpful as the "Dikke van Dale"; both are eminent dead wood dictionaries, and therefore there should be differences with wiktionary as well. If you propose we follow the OED, explain what makes it so relevant to you; describe the features that are important to you. I do not have an OED, I have a van Dale, it is as authoritative and I am sure it has the same features but I do not know. Van Dale does a NE and EN dictionary as well, so models for translation dictionaries exist.
Of course this is OK, but there are definitions + translations in there - now: if someone doesn't have the time to add the whole definition, but only a list of valuable translations this should be possible. The definitions can then be integrated in a second stage. In any case, to be able to create a huge Oxford Dictionary project you need first of all a list of terms and then someone who takes the time to look up everything or a linguist who already knows everything - both of them need their time to do it and it is much easier to work step by step on existing terms than having to create term lists from scratch.
So now we have quite a good method to copy and paste quite easily translations into the different wiktionaries, but does it make sense that a work needs to be repeated for every language again and again? Do we really have that much time to spend? Up to some weeks ago it was not even possible to think about a copy and paste method and thanks to Gerard it is there - so it's one huge step ahead, but it is still far away from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the huge dictionary editors used to do years ago when every single word with all the relevant information was written on single sheets. The techniques are there to avoid double or, in the case of language translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out) so instead of one person needing one hour for a certain work this means 30 hours of work to do the same job for 30 wiktionaries. I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? where's the break even point for hours invested for programming and hours invested to let's say only 5000 terms? We are talking about computers and software of the second millennium not about the good old Zuse.
A volunteer project does not usually need to think in terms of financial break even points. Nobody is getting paid. :-) Better results depend on what you mean by that term.
A volunteer project that does not value the time spent by the volunteers is not worth the time. Time of quality people is a scarce resource. Is this your argument against re-using content in nl:wiktionary for data from en:wiktionary ??
I agree very much to this one - we don't have much time since we need to work to earn a living and so it is crucial to us that work done once can be used twice or more times.
Please don't become upset now: but we should try to make things with certain criteria from the beginning on - it is much harder to do it afterwards when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to search a word in a certain language.
We "have" been working on criteria from the beginning! You're right that most people will use Wiktionary to look up a word. That's what it's for.
E.g. what disturbs me a bit is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc. When I am on the English wiktionary and find "Deutsch" as a single page this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
This was intentional. English speakers will gain an insight into the words of other languages.
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
So what? We're not looking for mere glossaries.
When you do not know what these glossaries contain, how can you judge their value to wiktionary? Glossaries add content and with a wiktionary it is like a telephone network; the more people who have a telephone the more valuable the network is. With so many words not yet described, glossaries may fill a need.
We started with a very "simple" glossary - colours - it seemed simple at the beginning - now it is becoming a huge project. We started with 20 colour names and now there are 271 - they are going to grow as now, at this stage I'll add the English colour names present on wiktionary.org then those of the other languages I understand (German, French, Spanish, Portuguese etc.).
When having completed as much as possible with as many languages as possible my aim is to ask colleagues to integrate as much as possible in any present language.
The next step is then to include all this work (and it is quite a lot of work, be sure) into all existing wiktionaries. But how to do this? The original approach is "one page for every colour adding manually translations". Now we have the templates and so we can create the entries immediately without problems and then insert (still paste) them in the single wiktionaries. But this is not the way to go - it should be possible (e.g. via XML and MySQL) to recall these terms in order to avoid typos and errors. For now this is not possible - you cannot tell someone: could you please have a look at all Chinese entries in this list and then just update with the corrections and all pages are automatically updated. This is the way to go - it is the only way to assure a certain quality.
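To illustrate the "correct once, update everywhere" idea above, here is a minimal sketch, not a proposal for the actual format. The XML element names (`glossary`, `term`, `t`), the sample colour entry and the helper functions are all assumptions made up for this example; nothing like this exists in Wiktionary today.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared glossary: one record per colour, one <t> per language.
# The Italian entry contains a deliberate typo ("roso").
GLOSSARY_XML = """
<glossary topic="colours">
  <term id="red">
    <t lang="en">red</t>
    <t lang="de">rot</t>
    <t lang="it">roso</t>
  </term>
</glossary>
""".strip()


def load(xml_text):
    """Parse the glossary into {term_id: {lang: word}}."""
    root = ET.fromstring(xml_text)
    return {
        term.get("id"): {t.get("lang"): t.text for t in term.findall("t")}
        for term in root.findall("term")
    }


def render_page(terms, term_id, page_lang):
    """Build the text of one wiktionary page from the shared record."""
    words = terms[term_id]
    others = ", ".join(
        f"{lang}: {word}"
        for lang, word in sorted(words.items())
        if lang != page_lang
    )
    return f"{words[page_lang]} ({others})"


terms = load(GLOSSARY_XML)
terms["red"]["it"] = "rosso"  # a single correction in the shared data

# Every language's page is regenerated from the same corrected record.
pages = {lang: render_page(terms, "red", lang) for lang in ("en", "de", "it")}
print(pages["en"])
print(pages["de"])
```

The point of the sketch is only that the pages are derived from one record, so a reviewer who fixes the Italian word does not have to touch the English and German pages by hand.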
Other possible glossaries that are going to be built up step by step - legislation, machinery, edp, etc. Just take "screws" - can you imagine how many of those tiny things exist and make us translators go mad, as engineers often mix up the terms themselves? Imagine what I am translating right now: an English to German translation of a machinery manual, but the English version was written by an Italian - so it happens to me that I first need to retranslate from EN to IT and then from there to DE to understand and to find the right term - he obviously sometimes used just the first term he found in a dictionary. These highly specialised glossaries are needed, and having a place to store not only the different translations of a term but also definitions etc. is indeed the best thing we can imagine. That's why I feel that our glossary project can support wiktionary by passing on the tables, where others can then think about the definitions since the terms are already there.
BMW: - they have a glossary English-German with definitions on their website - wouldn't it be great to convince them to donate it to Open Content? There's so much work that has already been done - we should try to re-use it as a basis.
Many companies have their own glossaries and partly with definitions.
The next thing is: I now can pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's a one-way direction - to work with many other projects both directions are needed. So many of us who now concentrate on their own projects probably would say: I pass all my data to wiktionary as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
I have not followed the issue of colours or animals, so I can't comment fully, though it would seem that for the latter the best thing would be to relate the vernacular names to the Latin binomials. The fact that a term comes from the "language industry" should not make the proposals of others less valid. Contributions from either source are open to discussion and modification.
Contributions from any source are open to discussion; at present we discuss opening up the wiktionary content so that it can be shared in all wiktionaries. At this moment, nl: does contribute easily to en:wiktionary while it is hard to use the en:wiktionary content. Is that not a waste of your time? The work you have done could be so much more valuable!
see above :-)
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people being interested in languages. The more people can use it the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop going. You are on the right way.
We will be more able to discuss the proposal when we see what it looks like.
At this point we discuss the merits of opening up the wiktionary content and using standard XML formats that are/will be shared among electronic dictionaries. When we agree that there is a need, a lot of changes to the wiktionary content will follow from that. The first thing is to agree on the merits of opening up the content. We must first agree on this; this is part of our mission as I see it. Certainly we will not have less data, we will have data in a proper place that will enable open content.
You will have much more data if this happens - why? I can, for example, take a list of terms with all its translations (maybe also include the definition), add some parts myself and ask colleagues for help to integrate and add other specific material. So opening the content will attract further content.
Ciao, Sabine
Gerard Meijssen wrote:
Ray Saintonge wrote:
Sabine Cretella wrote:
But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables ...
I looked at his pdf file and found it pretty. My browser would not open the other one; I don't know what software is needed for that.
The second file is in Open Office format. :)
Thanks. I've been informed that the two are just different ways of presenting the same thing, so now I don't really need to look at both.
The first thing that we need to address is the purpose and philosophy of a dictionary. To me, a dictionary is primarily descriptive, and not prescriptive. The multi-volume Oxford English Dictionary is a model to be followed. An ideal Wiktionary article will include a series of definitions that trace the history of the word over the centuries. The etymology is important for understanding these uses. I tend to view translations as a form of synonym which happens to be in another language. They are guide lines which the wise translator will consider but will not consider binding. I would not depend on BabelFish to give me a good Italian translation of Shakespeare or a good English translation of Dante.
Describing it to be like "the multi-volume Oxford English Dictionary" is as helpful as the "Dikke van Dale"; both are eminent dead wood dictionaries, and therefore there should be differences with wiktionary as well. If you propose we follow the OED, explain what makes it so relevant to you; describe the features that are important to you. I do not have an OED, I have a van Dale, it is as authoritative and I am sure it has the same features but I do not know. Van Dale does a NE and EN dictionary as well, so models for translation dictionaries exist.
My apologies. I think every language has its great dictionaries that go into great detail about the language. The key feature about the OED is the many quotes that it gives to show how a word has been used and has evolved over the years. I would be pleased if we could show how the language has evolved over the last century since then. Not only that, I envision a similar treatment for all languages.
A volunteer project does not usually need to think in terms of financial break even points. Nobody is getting paid. :-) Better results depend on what you mean by that term.
A volunteer project that does not value the time spent by the volunteers is not worth the time. Time of quality people is a scarce resource. Is this your argument against re-using content in nl:wiktionary for data from en:wiktionary ??
I said nothing about re-using the nl-wiktionary material. Volunteers work on such a project when they have time. Sometimes that means that there may be delays that would be intolerable if someone were paying them for the work.
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent. So what? We're not looking for mere glossaries.
When you do not know what these glossaries contain, how can you judge their value to wiktionary? Glossaries add content and with a wiktionary it is like a telephone network; the more people who have a telephone the more valuable the network is. With so many words not yet described, glossaries may fill a need.
Yes. Glossaries are one way to start an article.
Contributions from any source are open to discussion; at present we discuss opening up the wiktionary content so that it can be shared in all wiktionaries. At this moment, nl: does contribute easily to en:wiktionary while it is hard to use the en:wiktionary content. Is that not a waste of your time? The work you have done could be so much more valuable!
Of course I support the sharing of content. The problem you are trying to solve here is for me one that does not exist.
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people being interested in languages. The more people can use it the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop going. You are on the right way.
We will be more able to discuss the proposal when we see what it looks like.
At this point we discuss the merits of opening up the wiktionary content and using standard XML formats that are/will be shared among electronic dictionaries. When we agree that there is a need, a lot of changes to the wiktionary content will follow from that. The first thing is to agree on the merits of opening up the content. We must first agree on this; this is part of our mission as I see it. Certainly we will not have less data, we will have data in a proper place that will enable open content.
Perhaps you should go ahead and do your XML coding. Once this is done we will all be able to experiment with it to see how it works on the test wiki. It will be a lot easier to come to an informed judgement at that point.
Ec
Ray Saintonge schreef:
Sabine Cretella wrote:
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to be able to pass translations of words easily from one wiktionary to the other as this part is the easiest of all. But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming as we translators think in glossaries and not all those particular definitions for a term).
I looked at his pdf file and found it pretty. My browser would not open the other one; I don't know what software is needed for that.
The first thing that we need to address is the purpose and philosophy of a dictionary. To me, a dictionary is primarily descriptive, and not prescriptive. The multi-volume Oxford English Dictionary is a model to be followed. An ideal Wiktionary article will include a series of definitions that trace the history of the word over the centuries. The etymology is important for understanding these uses. I tend to view translations as a form of synonym which happens to be in another language. They are guide lines which the wise translator will consider but will not consider binding. I would not depend on BabelFish to give me a good Italian translation of Shakespeare or a good English translation of Dante.
Hi Eclecticology,
Maybe you will feel better about my proposal for a database schema if you know that I also consider a translation to be a synonym that happens to be in a different language. They are connected through the Meanings table. If you query for words in the same language, you will get synonyms. If you query for words in a different language, you will get translations. It is true that I have concentrated on the translations and other kinds of relations between words. This has always been my primary purpose for a dictionary. But there certainly is room for descriptions and etymologies; they can easily be added as fields to the WordDescriptions tables. The entire setup is also meant as a way to help translators (translation memory) or even as a base for machine translation.
The biggest advantage is that entries of how words relate to each other between languages only need to be given once. The descriptions will still need to be given and translated many times though, but it will become a lot easier to do so, because the database will know to which words they belong.
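The synonym/translation idea described above can be sketched as follows. This is only an illustration under stated assumptions: the message names a Meanings table, but the `Words` table, all column names, the sample rows and the `related` helper are hypothetical and are not taken from the schema uploaded to meta.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Meanings (id INTEGER PRIMARY KEY, gloss TEXT);
-- Hypothetical table: one row per (word, meaning) pair.
CREATE TABLE Words (
    id INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    spelling TEXT NOT NULL,
    meaning_id INTEGER NOT NULL REFERENCES Meanings(id)
);
""")
conn.execute("INSERT INTO Meanings VALUES (1, 'road vehicle with an engine')")
conn.executemany(
    "INSERT INTO Words (lang, spelling, meaning_id) VALUES (?, ?, ?)",
    [("en", "car", 1), ("en", "automobile", 1), ("nl", "auto", 1), ("de", "Auto", 1)],
)


def related(spelling, lang, same_language):
    """Words sharing a meaning with the given word.

    same_language=True  -> synonyms
    same_language=False -> translations
    """
    op = "=" if same_language else "<>"
    return conn.execute(
        f"""SELECT w2.lang, w2.spelling
            FROM Words AS w1
            JOIN Words AS w2 ON w1.meaning_id = w2.meaning_id
            WHERE w1.spelling = ? AND w1.lang = ?
              AND w2.id <> w1.id
              AND w2.lang {op} w1.lang
            ORDER BY w2.lang, w2.spelling""",
        (spelling, lang),
    ).fetchall()


synonyms = related("car", "en", same_language=True)
translations = related("car", "en", same_language=False)
print(synonyms)
print(translations)
```

The relation between "car", "auto" and "Auto" is entered once, through the shared meaning, and both queries read from it; only the language comparison differs.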
In a user interface it will be possible to select the interface language, but also and maybe more importantly, it will be possible to indicate ad-hoc or in a user profile which languages are of interest. So you could have the dictionary present itself as a descriptive dictionary. Somebody else can have it work as a translation dictionary between as many languages as they are interested in. This will result in pages that don't have to be cluttered with information that people don't use anyway and that stands in the way of what they were really trying to look up.
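As a rough sketch of the "languages of interest" setting described above: the entry layout, the sample words and the `view` function are assumptions invented for this illustration, not part of any existing interface.

```python
# Hypothetical stored entry: one definition plus translations per language.
ENTRY = {
    "word": "car",
    "lang": "en",
    "definition": "a road vehicle with an engine",
    "translations": {"de": ["Auto"], "fr": ["voiture"], "nl": ["auto"]},
}


def view(entry, languages_of_interest):
    """Return only what this user asked to see.

    An empty list gives a purely descriptive dictionary page;
    a list of language codes adds just those translations.
    """
    wanted = set(languages_of_interest)
    return {
        "word": entry["word"],
        "definition": entry["definition"],
        "translations": {
            lang: words
            for lang, words in entry["translations"].items()
            if lang in wanted
        },
    }


descriptive = view(ENTRY, [])
for_translator = view(ENTRY, ["nl", "de"])
print(descriptive["translations"])
print(for_translator["translations"])
```

The stored data is the same for everyone; only the presentation changes with the profile, which is what keeps pages from being cluttered.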
And of course it will become possible for people to write their own interfaces, so it becomes a translation memory. It could also become a distributed project, but we aren't there yet. So let's first discuss whether it is feasible for the WikiMedia foundation to set up a database with the necessary tables and then to create a php interface for it, with help of the community, of course.
Jo
PS: .sxd is a format that can be opened with OpenOffice.org. The content is the same as the PDF though. It is there for people who would want to edit it.
--- Sabine Cretella sabine_cretella@yahoo.it wrote:
Sorry, this week is full of work and I won't have too much time to dedicate before the week end.
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to be able to pass translations of words easily from one wiktionary to the other as this part is the easiest of all.
I still think even this much is a bad idea. Copy & paste in every arena always makes it quicker and easier to make mistakes. The only time you can copy all the translations from word A in language X to word A' in language Y is when the sense and subtleties are *identical*. It will not work for "mouse" or "rat" into many languages. Just thinking about it for "moose" or "elk" is painful. Just because "mouse" in English can be translated into a/b/c/d in languages W/X/Y/Z, does not mean that "mus" in "language W" can also be translated into b/c/d in languages X/Y/Z. Every single one must be checked. Copying and pasting is the opposite of checking!
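The objection above can be made concrete with a small sketch. The sense labels and the word list are illustrative assumptions (British "elk" for Alces alces, North American "elk" for the wapiti); the code only shows the mechanism, namely that translations attach to senses, so copying a word's full translation list onto one of its translations can add wrong entries.

```python
# (language, word) -> set of senses, identified here by Latin binomial.
# Illustrative data only.
SENSES = {
    ("en", "elk"): {"Alces alces", "Cervus canadensis"},
    ("en", "moose"): {"Alces alces"},
    ("de", "Elch"): {"Alces alces"},
    ("de", "Wapiti"): {"Cervus canadensis"},
    ("fr", "élan"): {"Alces alces"},
    ("fr", "wapiti"): {"Cervus canadensis"},
}


def translations(word, target_lang):
    """Words in target_lang that share at least one sense with `word`."""
    senses = SENSES[word]
    return sorted(
        spelling
        for (lang, spelling), other in SENSES.items()
        if lang == target_lang and other & senses
    )


# Copy and paste: take the French list from English "elk" and put it on "Elch".
copied = translations(("en", "elk"), "fr")
# Checked: derive the French list for "Elch" from its own sense.
checked = translations(("de", "Elch"), "fr")

wrong = sorted(set(copied) - set(checked))
print(copied, checked, wrong)
```

In this sketch the copied list contains one entry that does not belong on the German page, which is exactly the kind of error that has to be caught by checking each sense.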
But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming as we translators think in glossaries and not all those particular definitions for a term).
Can you point to these tables please?
So now we have quite a good method to copy and paste quite easily translations into the different wiktionaries, but does it make sense that a work needs to be repeated for every language again and again?
Yes. Because every language is different. If it wasn't we'd just use Babelfish and the results would always be perfect.
Do we really have that much time to spend?
Whether or not we have the time to spend, a good quality dictionary requires time spent. If we choose not to spend that time we will not have a good quality dictionary. Ask any professional lexicographer. The OED, Websters, Le Grand Robert, you name it - all the big, quality dictionaries took a long time. That is the nature of the game.
Up to some weeks ago it was not even possible to think about a copy and paste method and thanks to Gerard it is there - so it's one huge step ahead, but it is still far away from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the huge dictionary editors used to do years ago when every single word with all the relevant information was written on single sheets.
You will find they still do it this way. They use computers now but the work is still painstaking and precise and slow. Dictionary building is not a race.
The techniques are there to avoid double or, in the case of language translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out) so instead of one person needing one hour for a certain work this means 30 hours of work to do the same job for 30 wiktionaries.
I can guarantee if you avoid half the work you will double the errors.
I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? where's the break even point for hours invested for programming and hours invested to let's say only 5000 terms? We are talking about computers and software of the second millennium not about the good old Zuse.
All you are doing is labouring a non sequitur. Everybody agrees that time is both scarce and valuable. I do not agree that taking shortcuts and rushing result in better dictionaries. Anybody who has used a poor / cheap translating dictionary must know this already.
Please don't become upset now: but we should try to make things with certain criteria from the beginning on - it is much harder to do it afterwards when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to search a word in a certain language.
E.g. what disturbs me a bit is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc.
Woah there! Are you not aware that Wiktionary is a dictionary of all languages? It is both a definition dictionary and a translating dictionary. People whose first language is not English will be aided in their understanding of English definitions for English words when they can also see the glosses of that word in their own language.
Now I do believe that in the future we should have a way to "show only relevant information". Users should be able to choose whether they want to see the translations or not. At the moment the Wiktionary is small, the developers' time is scarce, the structure is not robust. So this will happen later.
When I am on the English wiktionary and find "Deutsch" as a single page this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
Then the user must be able to read German to find out what the word means! This is like suggesting an English-only dictionary and a German-only dictionary would be sufficient for a monolingual to translate between the two languages!
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
The next thing is: I now can pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's a one-way direction - to work with many other projects both directions are needed. So many of us who now concentrate on their own projects probably would say: I pass all my data to wiktionary as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people being interested in languages. The more people can use it the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop going. You are on the right way.
One "problem" I'm beginning to see for the future is that currently Wiktionary is both the lexicographer's index cards and the finished dictionary in one place. Especially as it grows, the "finished" articles are mixed right in with the under-research articles. I don't think it's a problem yet but it will be in a couple of years and I'm not sure what kind of solution might exist.
Andrew Dunbar (hippietrail)
Now who's going to kill me ;-) ??
Ciao, Sabine
Sabine Cretella s.cretella@wordsandmore.it www.wordsandmore.it Meetingplace for translators www.wesolveitnet.com
Wiktionary-l mailing list Wiktionary-l@Wikipedia.org
http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
===== http://linguaphile.sf.net/cgi-bin/translator.pl http://www.abisource.com
Andrew Dunbar wrote:
--- Sabine Cretella sabine_cretella@yahoo.it wrote:
Sorry, this week is full of work and I won't have too much time to dedicate before the week end.
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to be able to pass translations of words easily from one wiktionary to the other as this part is the easiest of all.
I still think even this much is a bad idea. Copy & paste in every arena always makes it quicker and easier to make mistakes. The only time you can copy all the translations from word A in language X to word A' in language Y is when the sense and subtleties are *identical*. It will not work for "mouse" or "rat" into many languages. Just thinking about it for "moose" or "elk" is painful. Just because "mouse" in English can be translated into a/b/c/d in languages W/X/Y/Z, does not mean that "mus" in "language W" can also be translated into b/c/d in languages X/Y/Z. Every single one must be checked. Copying and pasting is the opposite of checking!
We are not talking about copy and paste; that would not be helpful. When a meaning does not have a word or phrase that is the literal translation, there is no translation in that other language. This does not mean that this meaning cannot be defined. When translations are not the same, they are not. But this is not what is at issue; at issue is opening up data to both other wiktionaries and other electronic dictionaries.
But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming as we translators think in glossaries and not all those particular definitions for a term).
Can you point to these tables please?
The discussion and papers are on Meta.
So now we have quite a good method to copy and paste quite easily translations into the different wiktionaries, but does it make sense that a work needs to be repeated for every language again and again?
Yes. Because every language is different. If it wasn't we'd just use Babelfish and the results would always be perfect.
When the word "applepie" gets its entry in nl:wiktionary, would it not be fair to say that the en:wiktionary content for that word is relevant ? Would it not make sense to use all the content with some translations where needed?
Do we really have that much time to spend?
Whether or not we have the time to spend, a good quality dictionary requires time spent. If we choose not to spend that time we will not have a good quality dictionary. Ask any professional lexicographer. The OED, Websters, Le Grand Robert, you name it - all the big, quality dictionaries took a long time. That is the nature of the game.
Yes, but contrary to Websters, OED, van Dale, we are not in the business of making money out of it. We are in the business of making open content and we can and should accept open content contributions from others. It is the aim of the wiktionary to have open content; and that is where we fail at this moment.
Up to some weeks ago it was not even possible to think about a copy and paste method and thanks to Gerard it is there - so it's one huge step ahead, but it is still far away from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the huge dictionary editors used to do years ago when every single word with all the relevant information was written on single sheets.
You will find they still do it this way. They use computers now but the work is still painstaking and precise and slow. Dictionary building is not a race.
If the en:wiktionary content were open, I would be able to re-use that content in nl:wiktionary; at this stage we cannot. Am I to believe all this work is not relevant to nl:?
The techniques are there to avoid double or, in the case of language translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out), so instead of one person needing one hour for a certain piece of work, this means 30 hours of work to do the same job for 30 wiktionaries.
I can guarantee if you avoid half the work you will double the errors.
What work are you talking about: does {{en}} not translate well between the wiktionaries ?
I agree, wiktionary is open content, but contributing people are working - and 30 hours are almost a week of work ... how much does this cost? where's the break even point for hours invested for programming and hours invested to let's say only 5000 terms? We are talking about computers and software of the second millennium not about the good old Zuse.
All you are doing is labouring a non sequitur. Everybody agrees that time is both scarce and valuable. I do not agree that taking shortcuts and rushing result in better dictionaries. Anybody who has used a poor / cheap translating dictionary must know this already.
Please don't become upset now: but we should try to make things according to certain criteria from the beginning on - it is much harder to do it afterwards when there's a lot of data inserted and the need is definitely there. In my experience wiktionary is going to be used by at least 90% of the people like a dictionary - to search a word in a certain language.
E.g. what disturbs me a bit is that foreign language words are "seen" in the wiktionary. This is confusing. German should only give German terminology in its lists, English the English one, Dutch the Dutch one etc.
Woah there! Are you not aware that Wiktionary is a dictionary of all languages? It is both a definition dictionary and a translating dictionary. People whose first language is not English will be aided in their understanding of English definitions for English words when they can also see the glosses of that word in their own language.
Now I do believe that in the future we should have a way to "show only relevant information". Users should be able to choose whether they want to see the translations or not. At the moment the Wiktionary is small, the developers' time is scarce, the structure is not robust. So this will happen later.
When I am on the English wiktionary and find "Deutsch" as a single page this is not logical to me - the term "Deutsch" should be linked to the German wiktionary instead and should not appear in the listing under the letter "D".
Then the user must be able to read German to find out what the word means! This is like suggesting an English-only dictionary and a German-only dictionary would be sufficient for a monolingual to translate between the two languages!
All of what you ask for is needed, but there are already plenty of glossaries that could be uploaded and I am sure, if we have the techniques many companies would agree to pass their glossaries with definitions to OpenContent.
The next thing is: I now can pass e.g. the colours list to Wiktionary, but I cannot retrieve an animals list with all its translations - so it's a one-way direction - to work with many other projects both directions are needed. So many of us who now concentrate on their own projects probably would say: I pass all my data to wiktionary as I get something back. Many of these listings, definitions etc. are from people in the language industry - and people of the language industry most of all search for the right term.
So: in any case I support the XML and database way - it will take some time to develop it, but once it is there wiktionary has all the characteristics needed to become the main reference tool for people working with languages and people being interested in languages. The more people can use it the more will contribute. Gerard, Polyglot and whoever is convinced about this way: please don't stop going. You are on the right way.
One "problem" I'm beginning to see for the future is that currently Wiktionary is both the lexicographer's index cards and the finished dictionary in one place. Especially as it grows, the "finished" articles are mixed right in with the under-research articles. I don't think it's a problem yet but it will be in a couple of years and I'm not sure what kind of solution might exist.
Andrew Dunbar (hippietrail)
When the data is in a database format, it becomes more obvious: you would just be able to query for words without etymology or pronunciation and tackle them. This functionality will come with better structured content; this structure will result from the necessary work to enable import and export to the wiktionaries.
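As a rough illustration of the kind of query meant here, the sketch below uses Python's built-in sqlite3 module. The table name `entries` and its columns are assumptions made up for this example; they are not an existing Wiktionary schema, and the field values are placeholders.

```python
import sqlite3


def entries_missing_fields(conn):
    """Return (word, language) pairs that lack an etymology or a pronunciation."""
    rows = conn.execute(
        "SELECT word, language FROM entries "
        "WHERE etymology IS NULL OR pronunciation IS NULL "
        "ORDER BY language, word"
    )
    return rows.fetchall()


# Hypothetical in-memory table; the layout is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "word TEXT, language TEXT, etymology TEXT, pronunciation TEXT)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("mouse", "en", "placeholder etymology", "placeholder pronunciation"),
        ("topo", "it", None, "placeholder pronunciation"),
        ("Maus", "de", "placeholder etymology", None),
    ],
)

# Work list for contributors: entries that still need attention.
todo = entries_missing_fields(conn)
```

With free-form wiki text the same question needs a scan of every page; with structured fields it is a single query.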
Now who's going to kill me ;-) ??
Ciao, Sabine
Sabine Cretella s.cretella@wordsandmore.it www.wordsandmore.it Meetingplace for translators www.wesolveitnet.com
Thanks, Gerard
--- Gerard Meijssen gerardm@myrealbox.com wrote:
Andrew Dunbar wrote:
--- Sabine Cretella sabine_cretella@yahoo.it wrote:
Sorry, this week is full of work and I won't have too much time to dedicate before the week end.
Referring to the posts of Ray and Gerard: my problem was and still is (but much less now) how to make e.g. multilingual resources of my project on sourceforge.net available to wiktionary. At the moment we are talking about how to be able to pass translations of words easily from one wiktionary to the other as this part is the easiest of all.
I still think even this much is a bad idea. Copy & paste in every arena always makes it quicker and easier to make mistakes. The only time you can copy all the translations from word A in language X to word A' in language Y is when the sense and subtleties are *identical*. It will not work for "mouse" or "rat" into many languages. Just thinking about it for "moose" or "elk" is painful. Just because "mouse" in English can be translated into a/b/c/d in languages W/X/Y/Z, does not mean that "mus" in "language W" can also be translated into b/c/d in languages X/Y/Z. Every single one must be checked. Copying and pasting is the opposite of checking!
We are not talking about copy and paste; that would not be helpful.
But further along you specifically mention cut and paste. I don't understand.
When a meaning does not have a word or phrase that is the literal translation, there is no translation in that other language. This does not mean that this meaning cannot be defined. When translations are not the same, they are not. But this is not what is at issue, at issue is opening up data to both other wiktionaries and other electronic dictionaries.
Yes of course but on Wiktionary there are two ways to do a definition: 1. Every word in the same language as the Wiktionary is defined fully as in a monolingual dictionary. 2. Every word from a foreign language is defined only by giving one or more translations or glosses into the Wiktionary language. Words which do not translate neatly into the Wiktionary language will be given a full definition (when somebody contributes one) no matter what the language.
Imagine using the quick and easy way to move an article "topo" from Italian or "nezumi" from Japanese into a new Wiktionary language. Let's say to a German Wiktionary.
In the translation section for both Italian and Japanese there will be a line with the translations into the English language. Both articles will give "mouse" and "rat".
So because we are using this new quick system, all the {{lang}} macros translated all the language names fine. Perhaps we don't speak English and everything looks fine to us because it's just what the other Wiktionary says.
So in the German Wiktionary we have just created an article named "Maus" with a translation section which says "English: mouse, rat".
The problem is that English "rat" is not a translation of German "Maus" just because they are both translations of the same Italian word "topo".
So there is not a quick way to import from the Italian Wiktionary to the German Wiktionary. The importer has to check each entry with an Italian<->German dictionary or somebody who knows both languages.
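The "topo" case can be sketched in a few lines of Python. The data is toy data taken from the example above, and the function names `naive_import` and `pairs_to_check` are hypothetical; this is not how any existing import tool works, only an illustration of why the copied translations still need checking.

```python
# Toy data modelled on the "topo" example; not real Wiktionary content.
ITALIAN_ENTRY = {
    "word": "topo",
    "lang": "it",
    "translations": {"en": ["mouse", "rat"], "de": ["Maus"]},
}


def naive_import(entry, target_lang):
    """Create target-language entries by copying the source's translation list."""
    created = {}
    for headword in entry["translations"].get(target_lang, []):
        copied = {
            lang: list(words)
            for lang, words in entry["translations"].items()
            if lang != target_lang
        }
        # The link back to the source word is the only pair the source asserts.
        copied[entry["lang"]] = [entry["word"]]
        created[headword] = copied
    return created


def pairs_to_check(entry, target_lang):
    """List (headword, lang, word) pairs that were only inferred via the pivot."""
    pairs = []
    for headword, translations in naive_import(entry, target_lang).items():
        for lang, words in translations.items():
            if lang == entry["lang"]:
                continue  # stated directly by the source entry
            pairs.extend((headword, lang, word) for word in words)
    return pairs


german = naive_import(ITALIAN_ENTRY, "de")
unchecked = pairs_to_check(ITALIAN_ENTRY, "de")
```

Here `german` contains "Maus" with "English: mouse, rat", exactly the wrong entry described above, and `unchecked` lists both English words as pairs that someone who knows German and English still has to confirm.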
But if you had a look at the tables Polyglot uploaded to the meta site you would have seen that these tables include punctuation, synonyms, opposites, definitions, part of speech and much, much more (to me it was quite overwhelming as we translators think in glossaries and not all those particular definitions for a term).
Can you point to these tables please?
The discussion and papers are on Meta.
Umm, in any particular place on Meta, or is it too obvious?
So now we have quite a good method to copy and paste quite easily translations into the different wiktionaries
Which makes me wonder why you just said "We are not talking about copy and paste; that would not be helpful." Which is why I'm confused.
but does it make sense that a work needs to be repeated for every language again and again?
From examples like the one I've given above you can see that it is necessary.
Yes. Because every language is different. If it wasn't we'd just use Babelfish and the results would always be perfect.
When the word "applepie" gets its entry in nl:wiktionary, would it not be fair to say that the en:wiktionary content for that word is relevant? Would it not make sense to use all the content with some translations where needed?
Yes it's relevant. Yes it makes sense to use most of the content. But with the translations you must go through each of them very carefully.
Do we really have that much time to spend?
Whether or not we have the time to spend, a good quality dictionary requires time spent. If we choose not to spend that time we will not have a good quality dictionary. Ask any professional lexicographer. The OED, Websters, Le Grand Robert, you name it - all the big, quality dictionaries took a long time. That is the nature of the game.
Yes, but contrary to Websters, OED, van Dale, we are not in the business of making money out of it. We are in the business of making open content and we can and should accept open content contributions from others. It is the aim of the wiktionary to have open content; and that is where we fail at this moment.
Absolutely! Not because we don't want to but because it's hard and we have limited support- especially of the technical kind.
Up to some weeks ago it was not even possible to think about a copy and paste method and thanks to Gerard it is there - so it's one huge step ahead, but it is still far away from being "time friendly". Users (contributing users) should talk about concentrating efforts in order to have better results in less time. Now we are using the wiktionary just like the huge dictionary editors used to do years ago when every single word with all the relevant information was written on single sheets.
You will find they still do it this way. They use computers now but the work is still painstaking and precise and slow. Dictionary building is not a race.
If the en:wiktionary content were open, I would be able to re-use that content in nl:wiktionary; at this stage we cannot. Am I to believe all this work is not relevant to nl:?
But it is open and you can and should use it. You just need to go through it very carefully, deciding what will cross the languages OK, change the language of headings, change the format (unfortunately).
The techniques are there to avoid double or, in the case of language translations, multiple work (how many wiktionaries are there? 20? 30? I did not check this out), so instead of one person needing one hour for a certain piece of work, this means 30 hours of work to do the same job for 30 wiktionaries.
I can guarantee if you avoid half the work you will double the errors.
What work are you talking about: does {{en}} not translate well between the wiktionaries ?
Yes it does. But that is the very easiest part! That only makes a small impact. It's the synonyms and related sections which will move easily between dictionaries. The translations require most of the work. The names of the languages in this section are close to none of the work.
<snip>
Andrew Dunbar.
Andrew Dunbar wrote:
Imagine using the quick and easy way to move an article "topo" from Italian or "nezumi" from Japanese into a new Wiktionary language. Let's say to a German Wiktionary.
....
So in the German Wiktionary we have just created an article named "Maus" with a translation section which says "English: mouse, rat".
The problem is that English "rat" is not a translation of German "Maus" just because they are both translations of the same Italian word "topo".
"Topo" is a good example in defense of having the word explained for several languages on one page. For Italian the animal translates as a mouse, but for the Spanish it is a mole.
Ec
Ray Saintonge schreef:
Andrew Dunbar wrote:
Imagine using the quick and easy way to move an article "topo" from Italian or "nezumi" from Japanese into a new Wiktionary language. Let's say to a German Wiktionary.
....
So in the German Wiktionary we have just created an article named "Maus" with a translation section which says "English: mouse, rat".
The problem is that English "rat" is not a translation of German "Maus" just because they are both translations of the same Italian word "topo".
"Topo" is a good example in defense of having the word explained for several languages on one page. For Italian the animal translates as a mouse, but for the Spanish it is a mole.
And I wouldn't be surprised if the French use it to abbreviate topographie.
Polyglot
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
If this distinction was not made initially, the split will have to be performed afterwards.
Polyglot
On Thu, 09 Sep 2004 11:20:19 +0200, cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Just because a word refers to multiple objects doesn't mean it has multiple definitions.
To give an exaggerated example, it is like saying that the English word "bird" has over 9000 definitions because there are over 9000 kinds of birds and "bird" refers to them all.
Less extremely, the word "dolphin" in everyday English[1] refers indiscriminately to dolphins that live in the ocean, dolphins that live in rivers, and (for some people) porpoises; however, there are languages that distinguish them: in Latin for example they are "delphinus", "platanista", and "thursio" respectively[2], and I wouldn't be surprised if other languages did similarly.
English has lots of words like these, that get defined nonspecifically as "a member of taxonomic group X or Y".
*Muke!
[1] Excepting in technical language, or language trying to be specific. In such cases even languages that don't ordinarily differentiate rat/mouse can find ways to be specific.
[2] It's a really obscure example, I know, but it's the only one I'm familiar with offhand, from writing about dolphins in Latin. Pliny, who describes the animals, does not give them as kinds of dolphins, though he does say the _thursio_ "delphinorum similitudinem habent" (has the dolphins' likeness), and the _platanista_ is "rostro delphini et cauda" ([possessed] of a dolphin's muzzle and tail).
Muke Tever schreef:
On Thu, 09 Sep 2004 11:20:19 +0200, cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Just because a word refers to multiple objects doesn't mean it has multiple definitions.
To give an exaggerated example, it is like saying that the English word "bird" has over 9000 definitions because there are over 9000 kinds of birds and "bird" refers to them all.
Less extremely, the word "dolphin" in everyday English[1] refers indiscriminately to dolphins that live in the ocean, dolphins that live in rivers, and (for some people) porpoises; however, there are languages that distinguish them: in Latin for example they are "delphinus", "platanista", and "thursio" respectively[2], and I wouldn't be surprised if other languages did similarly.
English has lots of words like these, that get defined nonspecifically as "a member of taxonomic group X or Y".
*Muke!
[1] Excepting in technical language, or language trying to be specific. In such cases even languages that don't ordinarily differentiate rat/mouse can find ways to be specific.
[2] It's a really obscure example, I know, but it's the only one I'm familiar with offhand, from writing about dolphins in Latin. Pliny, who describes the animals, does not give them as kinds of dolphins, though he does say the _thursio_ "delphinorum similitudinem habent" (has the dolphins' likeness), and the _platanista_ is "rostro delphini et cauda" ([possessed] of a dolphin's muzzle and tail).
I follow you on the dolphin example. I don't follow you on the bird example. Bird is a word with a certain meaning, indeed it is a more general term, for which a lot of more specific words exist. This doesn't give a problem when translating.
As for the dolphins, you are right. That is a problem. One has to choose between adding comments with the translations, or whether one wants to distinguish between them on the "Meanings"/definitions level. I suppose it would depend on how many languages do make this distinction. If a lot of languages do, it may make sense to add more definitions.
Anyway, you are right, it is very important to be able to add comments to translations and it is not because word X in language A is a translation of word Y in language B, that this will also be the case for language Z.
The current situation is different though. If somebody defines the translation of a term between two languages in one Wiktionary, the other Wiktionaries are not updated. It is not because they are Wiktionaries in another language that the translation of these terms would be different. Even worse, it is very well possible for one Wiktionary to declare that the translation of A = B, while another language Wiktionary says that A = C. There is no consistency and no reasonable way to keep the Wiktionaries synchronized. This is the waste of resources that Gerard and Sabine are worried about.
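A minimal sketch of how such disagreements could be found once the translation claims of several Wiktionaries sit in one pool. The tuple layout, the function name `find_conflicts`, and the placeholder words A to E and language code "xx" are assumptions for illustration only.

```python
from collections import defaultdict


def find_conflicts(claims):
    """Group claims by (source word, target language) and keep the disagreements.

    Each claim is (wiktionary, source_word, target_lang, target_words).
    """
    by_key = defaultdict(dict)
    for wiki, source, target_lang, targets in claims:
        by_key[(source, target_lang)][wiki] = frozenset(targets)
    return {
        key: per_wiki
        for key, per_wiki in by_key.items()
        if len(set(per_wiki.values())) > 1
    }


# Placeholder data: one Wiktionary says A = B, another says A = C.
claims = [
    ("en", "A", "xx", ["B"]),
    ("nl", "A", "xx", ["C"]),
    ("en", "D", "xx", ["E"]),
    ("nl", "D", "xx", ["E"]),
]
conflicts = find_conflicts(claims)
```

A reported conflict is not necessarily an error; both B and C may be valid, possibly with a comment. The point is only that somebody gets to see the difference and resolve it once, for all Wiktionaries.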
You know how I have always been active to avoid duplication in the English Wiktionary and I always felt it was very unfortunate that duplication was going to happen on such a large scale because of the many Wiktionaries that were created. If it can't be helped, so be it. But what if we can come up with a way to not have this duplication? Where everybody can work with the Wiktionary in his/her own language, but where the results are pooled together.
When somebody claims that a translation between two languages is such, then everybody can check this out. When somebody writes down a nice definition or etymology, others can translate it. It is even possible to keep track of these translations and mark them 'fuzzy' when the original changes (but that doesn't have priority)
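One possible way to keep track of translated definitions, sketched under the assumption that each translation stores a digest of the original text it was made from. The record layout and the function names are hypothetical; the sample sentences are placeholders.

```python
import hashlib


def digest(text):
    """Fingerprint of a definition or etymology text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_translation(original_text, translated_text):
    """Store the translation together with a digest of what it was based on."""
    return {"text": translated_text, "source_digest": digest(original_text)}


def is_fuzzy(translation, current_original_text):
    """True when the original has changed since the translation was made."""
    return translation["source_digest"] != digest(current_original_text)


original = "A small rodent."
dutch = make_translation(original, "Een klein knaagdier.")

unchanged = is_fuzzy(dutch, original)
after_edit = is_fuzzy(dutch, "A small rodent with a long tail.")
```

In this sketch `unchanged` is False and `after_edit` is True, so the Dutch text would be flagged for review only after the English original was edited.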
If people fill out a profile with languages they understand, they can get those definitions/etymologies in those languages they indicated. If anonymous users come by, they can see in which languages translations are already provided, making use of symbols, in case it is not available in their own language. They can then decide to translate it, if they want to make a contribution.
If we want to implement this, we are looking at a major software rewrite though and it is true that there are not a lot of developers to implement it. (Which is why I haven't spoken too much about this before). Before we start looking for developers, we should find out whether there is a willingness to host this on the WikiMedia servers. If the willingness is there it all has to be refined and analyzed. Then it can be implemented. This is not something that will be ready in a few months. After the system is set up, a migration can take place. Conflicts will have to be resolved where they occur. In the end we will have a better dictionary though, given that it is designed properly in the early stages. It will be possible to export it or selected parts of it, according to the needs of who is doing the extracting. And what comes out of it will be extremely structured and usable for other projects.
Polyglot
Muke Tever wrote:
On Thu, 09 Sep 2004 11:20:19 +0200, cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Just because a word refers to multiple objects doesn't mean it has multiple definitions.
Less extremely, the word "dolphin" in everyday English[1] refers indiscriminately to dolphins that live in the ocean, dolphins that live in rivers, and (for some people) porpoises; however, there are languages that distinguish them: in Latin for example they are "delphinus", "platanista", and "thursio" respectively[2], and I wouldn't be surprised if other languages did similarly.
A dolphin can also be a fish, the dorado or mahi-mahi (Coryphaena hippurus)
Ec
--- cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Rubbish! Spanish "ser" and "estar" both **translate to** English "be". That doesn't mean that English "be" has two definitions. Italian "topo" doesn't **mean** "mouse" or "rat" - those are foreign words. Italian "topo" means a group of rodents of a particular type of appearance, just as Italian "sorcio" does. Just as Japanese "nezumi" does.
I seriously cannot believe that you believe these languages must have 1:1 mappings. I especially can't believe that you want to force languages into trying to fit this expectation. You speak more than one language - think about this!
If this distinction was not made initially, the split will have to be performed afterwards.
Please read about translation!
Andrew Dunbar (hippietrail)
Polyglot
Andrew Dunbar wrote:
--- cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Rubbish! Spanish "ser" and "estar" both **translate to** English "be". That doesn't mean that English "be" has two definitions. Italian "topo" doesn't **mean** "mouse" or "rat" - those are foreign words. Italian "topo" means a group of rodents of a particular type of appearance, just as Italian "sorcio" does. Just as Japanese "nezumi" does.
I seriously cannot believe that you believe these languages must have 1:1 mappings. I especially can't believe that you want to force languages into trying to fit this expectation. You speak more than one language - think about this!
If this distinction was not made initially, the split will have to be performed afterwards.
Please read about translation!
Andrew Dunbar (hippietrail)
Polyglot
Although it is nice to talk about mice and rats (muizenissen is a Dutch word that comes to mind), it has nothing to do with the subject: "Opening up the Wiktionary content".
The issue is: are we going to open up our wiktionary content? Is it important to have our content open so that it can be shared in our other wiktionaries and in other resources outside wikimedia? This means that we will be able to describe our content in something like XML. The issue is strategic: structure, not content like "topo". Whatever the outcome of the "topo" question, it is the method of describing what comes out of it that is at issue, not the message content itself.
Thanks, Gerard
Andrew Dunbar schreef:
--- cookfire cookfire@softhome.net wrote:
If topo in Italian means both mouse and rat. Then it should be a word with two definitions. One will translate into mouse, the other will translate into rat.
Rubbish! Spanish "ser" and "estar" both **translate to** English "be". That doesn't mean that English "be" has two definitions. Italian "topo" doesn't **mean** "mouse" or "rat" - those are foreign words. Italian "topo" means a group of rodents of a particular type of appearance, just as Italian "sorcio" does. Just as Japanese "nezumi" does.
I seriously cannot believe that you believe these languages must have 1:1 mappings. I especially can't believe that you want to force languages into trying to fit this expectation. You speak more than one language - think about this!
If this distinction was not made initially, the split will have to be performed afterwards.
Please read about translation!
Andrew Dunbar (hippietrail)
Hi Hippietrail,
You are right. In the case of ser/estar the distinction will have to be made as a comment next to the translations into Spanish (and probably Portuguese). It wouldn't make sense to try to divide them into two meanings. The model of the database or xml-schema that we are going to come up with needs to take these cases into account and it will have to provide a way to add them. If you check the schema I made, you will see that there is room for comments in the Meanings and the Dictionary tables, so I did think of this. It all needs to be tested though. Maybe we should make an inventory of some of the entries where we expect problems and try it out. What I proposed is something to be able to move faster. It doesn't have to be the final solution. The question is: do we want to make it easier to open up the Wiktionary content and is there a willingness (from developers and WikiMedia and from the contributors) to do it properly?
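To make the ser/estar case concrete, here is a small sketch of an XML export in which each translation can carry a comment. The element and attribute names (`entry`, `meaning`, `translation`, `comment`) are invented for this example and are not the schema posted on Meta; the definition and comment texts are placeholders to be written by editors.

```python
import xml.etree.ElementTree as ET


def build_entry(word, lang, meanings):
    """Build an <entry> element; each translation may carry a comment."""
    entry = ET.Element("entry", {"word": word, "lang": lang})
    for m in meanings:
        meaning = ET.SubElement(entry, "meaning")
        ET.SubElement(meaning, "definition").text = m["definition"]
        for t in m["translations"]:
            tr = ET.SubElement(meaning, "translation", {"lang": t["lang"]})
            tr.text = t["word"]
            if t.get("comment"):
                tr.set("comment", t["comment"])
    return entry


# One meaning of English "be" with two Spanish translations, told apart by
# comments instead of by splitting the English word into two definitions.
entry = build_entry(
    "be",
    "en",
    [
        {
            "definition": "placeholder definition",
            "translations": [
                {"lang": "es", "word": "ser", "comment": "usage note for ser"},
                {"lang": "es", "word": "estar", "comment": "usage note for estar"},
            ],
        }
    ],
)
xml_text = ET.tostring(entry, encoding="unicode")

# Reading the export back shows both translations under a single meaning.
parsed = ET.fromstring(xml_text)
spanish = [t.text for t in parsed.iter("translation") if t.get("lang") == "es"]
```

The design choice shown is the one agreed on above: one meaning, several translations, and the difference between them recorded as a comment on the translation rather than as an extra definition.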
Polyglot
wiktionary-l@lists.wikimedia.org