Ray Saintonge wrote:
jellings wrote:
To clarify on my original post:
I have a text file of dictionary terms (word, definition) plus another file of (word, synonyms). Since I already have it in this format, it would be a lot easier to import it.
I CAN reformat the file, but entering it manually would take too long (there are over 20,000 entries in the dictionary file alone, and more in the synonyms).
I do not have a complete understanding of the database schema so the answer may be obvious to others; any insight would be helpful.
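The files are described only as "word, definition" and "word, synonyms", so the layout below is an assumption: one entry per line, the word first, split from the rest at the first comma, with the synonyms themselves comma-separated. A minimal Python sketch that reads both files and merges them by headword (`parse_pairs` and `merge_entries` are hypothetical helper names):

```python
from collections import defaultdict


def parse_pairs(lines, sep=","):
    """Split each 'word<sep>rest' line at the FIRST separator only.

    Assumption: one entry per line, word first. Definitions may contain
    commas themselves, so only the first separator acts as the delimiter.
    """
    pairs = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or sep not in line:
            continue  # skip blank lines and lines without a separator
        word, rest = line.split(sep, 1)
        pairs[word.strip()].append(rest.strip())
    return pairs


def merge_entries(definition_lines, synonym_lines, sep=","):
    """Combine both files into {word: {"definitions": [...], "synonyms": [...]}}."""
    definitions = parse_pairs(definition_lines, sep)
    synonyms = parse_pairs(synonym_lines, sep)
    entries = {}
    for word in sorted(set(definitions) | set(synonyms)):
        # Assumption: the synonyms part is itself a comma-separated list.
        syns = [s.strip() for group in synonyms.get(word, [])
                for s in group.split(",") if s.strip()]
        entries[word] = {"definitions": definitions.get(word, []),
                         "synonyms": syns}
    return entries


if __name__ == "__main__":
    defs = ["cat, a small domesticated feline",
            "dog, a domesticated canine, often kept as a pet"]
    syns = ["cat, kitty, puss", "big, large, huge"]
    for word, entry in merge_entries(defs, syns).items():
        print(word, entry)
```

Because a file object iterates line by line, you could pass `open("definitions.txt", encoding="utf-8")` (a hypothetical file name) directly; if the files are tab-separated, pass `sep="\t"`.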
Now that you have clarified your intention, it is evident that this is a Wiktionary matter.
This might be raised in [[Wiktionary:Beer parlour]], but I suspect that it would not receive a warm reception. While the words that are not yet in Wiktionary might be welcomed, there would remain the question of how you propose to reconcile your contributions with the words that we already have. To the extent that any upload may be possible, it should also be done at a rate slow enough to allow others to ask questions about each of your contributions. Perhaps by first entering a small sample of your files, you could build the confidence of the Wiktionary community that your contributions are acceptable.
Ec (en:Wiktionary sysop)
Hi, this is a great example of how the en:wiktionary assumes that it alone is being addressed whenever dictionary terms are mentioned. It also assumes that its practices are the same on all wiktionaries. It is only one wiktionary, and they are not.
There are several wiktionaries that welcome quality content and load it onto their pages in bulk with a bot. We even apply for temporary bot status so that the content can be uploaded with a personalised bot (good for the GFDL requirements). This is a great way of beefing up the content of a project. Certainly when, as often happens, the content is specialised or includes translations into a specific language, it can be an extremely worthwhile addition to the wiktionary.
The argument that the upload should be slow enough for the community to ask questions is based on what? Fear? There is already plenty of questionable content on en:, such as misspelled words; this is spelled out, but hey, it is a dictionary and content SHOULD be correct. So when you argue "let us know what the content is about", I could agree. When you say "do it slowly, as we want to look into every addition", I would say that this is not feasible using a bot.
Thanks, GerardM
Gerard Meijssen wrote:
Ray Saintonge wrote:
This might be raised in [[Wiktionary:Beer parlour]], but I suspect that it would not receive a warm reception. While the words that are not yet in Wiktionary might be welcomed, there would remain the question of how you propose to reconcile your contributions with the words that we already have. To the extent that any upload may be possible, it should also be done at a rate slow enough to allow others to ask questions about each of your contributions. Perhaps by first entering a small sample of your files, you could build the confidence of the Wiktionary community that your contributions are acceptable.
Ec (en:Wiktionary sysop)
Hi, this is a great example of how the en:wiktionary assumes that it alone is being addressed whenever dictionary terms are mentioned. It also assumes that its practices are the same on all wiktionaries. It is only one wiktionary, and they are not.
What a liar!!!
There are several wiktionaries that welcome quality content and load it onto their pages in bulk with a bot. We even apply for temporary bot status so that the content can be uploaded with a personalised bot (good for the GFDL requirements). This is a great way of beefing up the content of a project. Certainly when, as often happens, the content is specialised or includes translations into a specific language, it can be an extremely worthwhile addition to the wiktionary.
If some other language Wiktionary is more interested in quantity than quality, or wants to use a bot just to beef up its article numbers, the members there are welcome to come to such a community decision.
The argument that the upload should be slow enough for the community to ask questions is based on what? Fear? There is already plenty of questionable content on en:, such as misspelled words; this is spelled out, but hey, it is a dictionary and content SHOULD be correct. So when you argue "let us know what the content is about", I could agree. When you say "do it slowly, as we want to look into every addition", I would say that this is not feasible using a bot.
If you believe that there are misspelled words on the en:wiktionary, then fix them rather than whining here that they should be correct. In the first place, the members of a community need to know what a bot will do so that they can decide whether it is acting appropriately. Suddenly adding 20,000 entries without any opportunity to adapt them to current formatting practices just adds 20,000 tasks for other people to fix. I am not prepared to arrogate to myself the authority to decide such a thing, nor can I accept that it will be decided on a mailing list to which only a fraction of the active community subscribes. The proposal needs to be put to the broader en:Wiktionary community before it can be implemented there.
Ec
Ray Saintonge wrote:
Gerard Meijssen wrote:
Ray Saintonge wrote:
This might be raised in [[Wiktionary:Beer parlour]], but I suspect that it would not receive a warm reception. While the words that are not yet in Wiktionary might be welcomed, there would remain the question of how you propose to reconcile your contributions with the words that we already have. To the extent that any upload may be possible, it should also be done at a rate slow enough to allow others to ask questions about each of your contributions. Perhaps by first entering a small sample of your files, you could build the confidence of the Wiktionary community that your contributions are acceptable.
Ec (en:Wiktionary sysop)
Hi, this is a great example of how the en:wiktionary assumes that it alone is being addressed whenever dictionary terms are mentioned. It also assumes that its practices are the same on all wiktionaries. It is only one wiktionary, and they are not.
What a liar!!!
A liar is someone who does not speak the truth. I am sorry that it hurts you to hear something that has some truth in it.
There are several wiktionaries that welcome quality content and load it onto their pages in bulk with a bot. We even apply for temporary bot status so that the content can be uploaded with a personalised bot (good for the GFDL requirements). This is a great way of beefing up the content of a project. Certainly when, as often happens, the content is specialised or includes translations into a specific language, it can be an extremely worthwhile addition to the wiktionary.
If some other language Wiktionary is more interested in quantity than quality, or wants to use a bot just to beef up its article numbers, the members there are welcome to come to such a community decision.
As neither of us has seen the data in question, neither of us is in a position to say anything about its quality. I do appreciate quality data, and when I get quality data, I am happy to upload it to the wiktionaries that appreciate it. We do use a bot for that, and why should this be an issue when we know its source and the quality of the data is excellent?
The argument that the upload should be slow enough for the community to ask questions is based on what? Fear? There is already plenty of questionable content on en:, such as misspelled words; this is spelled out, but hey, it is a dictionary and content SHOULD be correct. So when you argue "let us know what the content is about", I could agree. When you say "do it slowly, as we want to look into every addition", I would say that this is not feasible using a bot.
If you believe that there are misspelled words on the en:wiktionary, then fix them rather than whining here that they should be correct. In the first place, the members of a community need to know what a bot will do so that they can decide whether it is acting appropriately. Suddenly adding 20,000 entries without any opportunity to adapt them to current formatting practices just adds 20,000 tasks for other people to fix. I am not prepared to arrogate to myself the authority to decide such a thing, nor can I accept that it will be decided on a mailing list to which only a fraction of the active community subscribes. The proposal needs to be put to the broader en:Wiktionary community before it can be implemented there.
Ec
I do not have time to spend any more on the en:wiktionary. (I run a bot on the en:wiktionary as we speak, and I run it on request.) I do agree that a community should know in advance what is going to happen. But why the community should be against data with words, descriptions and translations that do not currently exist is beyond me. Your idea that a file cannot be created that conforms to a standard is also based on... what?
When I get this file, I will look at it. If it is any good, I will create a big file that can be used as input for the bot. I will leave the translation part formatted as is customary to me, and the other bits as is customary to you. I will write in the Beer parlour that I have quality data that I can upload to the en:wiktionary, and when I get reasonable questions I will give reasonable answers. After a week, given a reasonable outcome, I may start uploading the data. There will be no need to adapt them, do not worry :)
Thanks, GerardM
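The "big file that can be used as input for the bot" is not described further. One plausible layout, assuming the convention of the pywikipedia framework's pagefromfile.py script (each page wrapped in `{{-start-}}` and `{{-stop-}}` markers, with the page title taken from the first `'''bold'''` text in the page), is sketched below; the defaults are worth checking against the script's help for the version you run. `bundle_pages` is a hypothetical name, and it refuses any page whose first bold text is not the intended title, since the bot would otherwise create the wrong page.

```python
import re

# First '''bold''' run in a page; assumed to be where the bot reads the title.
BOLD = re.compile(r"'''(.+?)'''")


def bundle_pages(pages, start="{{-start-}}", stop="{{-stop-}}"):
    """Concatenate {title: wikitext} into one bot input file (assumed format)."""
    chunks = []
    for title, body in pages.items():
        first_bold = BOLD.search(body)
        if first_bold is None or first_bold.group(1) != title:
            # Guard: a mismatched bold headword would send text to the wrong page.
            raise ValueError(f"first bold text in page {title!r} is not its title")
        chunks.append(f"{start}\n{body.strip()}\n{stop}")
    return "\n".join(chunks) + "\n"


if __name__ == "__main__":
    pages = {"cat": "==English==\n'''cat'''\n# a small domesticated feline"}
    with open("bot_input.txt", "w", encoding="utf-8") as f:
        f.write(bundle_pages(pages))
```

Validating every page before writing the file keeps one bad record from silently going to the wrong title during a long unattended run.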
Obviously my question has caused a serious issue. I am thankful to those who have offered help.
My goal was NOT to do a bulk loading into en.wiktionary.org or wikipedia.org via a bot or anything else. I wanted to do this on my own installation from a file of my own work. I tried looking around on different wiki groups for the answer but was unsuccessful.
Slightly disappointed, but still working on it,
John
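For a private installation, one route that avoids any bot is to render each entry as wikitext and write one text file per page. Recent MediaWiki releases include a maintenance script, importTextFiles.php, that creates pages from text files and by default takes each title from the file name; check that your version has it before relying on this. The wikitext layout and the helper names (`render_entry`, `write_page_files`) are hypothetical, not en:Wiktionary's actual entry format, and `entries` is assumed to have the shape `{word: {"definitions": [...], "synonyms": [...]}}`.

```python
from pathlib import Path


def render_entry(word, definitions, synonyms):
    """Return wikitext for one entry using a simplified, hypothetical layout."""
    lines = ["==English==", f"'''{word}'''", ""]
    lines += [f"# {d}" for d in definitions]
    if synonyms:
        lines += ["", "===Synonyms===", *(f"* [[{s}]]" for s in synonyms)]
    return "\n".join(lines) + "\n"


def write_page_files(entries, out_dir):
    """Write one UTF-8 .txt file per entry, named after the page title."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for word, entry in entries.items():
        if "/" in word:
            continue  # a slash cannot appear in a file name; handle these by hand
        path = out / f"{word}.txt"
        text = render_entry(word, entry["definitions"], entry["synonyms"])
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

By default MediaWiki capitalises the first letter of every page title unless `$wgCapitalLinks` is set to false, so a dictionary wiki that needs lowercase headwords should change that setting before importing.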
On Thu, 17 Feb 2005 20:34:32 +0100, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Wiktionary-l mailing list Wiktionary-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
jellings wrote:
Obviously my question has caused a serious issue. I am thankful to those who have offered help.
My goal was NOT to do a bulk loading into en.wiktionary.org or wikipedia.org via a bot or anything else. I wanted to do this on my own installation from a file of my own work. I tried looking around on different wiki groups for the answer but was unsuccessful.
Slightly disappointed, but still working on it,
John
Do not get discouraged. Your proposal is being discussed; this is the normal way of doing things on a community project. We all want the project to prosper and grow. If you choose to submit your contributions at a rate of, say, 5-10 per day, people will be able to look at them, say what they think about them, extend them, improve them, provide you with feedback, etc. An issue you have to think through is how to deal with entries that already exist. Do you merge them manually? Do you expect the community to merge them? We are relatively few, which is why we need to protect ourselves against bulk submissions. Ideally, all the material gets checked by at least two other people. One thing we do have is time. There is no hurry; we will get there. If not in 10 years, then in 20 or more, we will have the basics covered. Complete it will never be, anyway; there will always be room for improvement. Welcome to the Wiktionary project(s).
Polyglot
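Both practical points raised here, a rate of 5-10 entries per day and deciding what to do with entries that already exist, can be planned before anything is uploaded. The sketch below assumes you already have a set of existing page titles (how you obtain it is left open) and splits the headwords into daily batches of new pages plus a list that needs merging by hand; `plan_submissions` is a hypothetical name.

```python
def plan_submissions(words, existing_titles, per_day=5):
    """Split headwords into daily batches of new pages and a manual-merge list.

    `existing_titles` is assumed to be a set of titles already on the wiki;
    `per_day` reflects the 5-10 entries per day suggested in the thread.
    """
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    new = [w for w in words if w not in existing_titles]
    to_merge = [w for w in words if w in existing_titles]
    # Consecutive slices of at most per_day new entries, one slice per day.
    batches = [new[i:i + per_day] for i in range(0, len(new), per_day)]
    return batches, to_merge
```

At 5 to 10 entries per day, 20,000 entries would take 2,000 to 4,000 days, roughly 5.5 to 11 years, before counting the synonyms file.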
jellings wrote:
Obviously my question has caused a serious issue. I am thankful to those who have offered help.
My goal was NOT to do a bulk loading into en.wiktionary.org or wikipedia.org via a bot or anything else. I wanted to do this on my own installation from a file of my own work. I tried looking around on different wiki groups for the answer but was unsuccessful.
You are certainly blameless in what you did. What you did was a bit like reaching to pull open a door only to find that someone was pushing it at high speed on the other side. Perhaps your intention was not clearly expressed at first, but that happens to all of us. You had no way of knowing that automated processes would be so controversial.
Still, if you find a way to contribute some of your material I'm sure it will be appreciated. Good luck on your own installation.
Ec