On the Wiktionary http://wiktionary.org/ project I run the interwiki bot. The process is simple: when an article exists on another language's Wiktionary spelled exactly the same, the bot creates an "interwiki" link. This allows you to see the information on that other Wiktionary. The process is automated and unattended, and it runs on all Wiktionaries.
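As a rough illustration of that matching step (this is not the actual bot code; the title sets and function names below are invented for the example), the core logic amounts to comparing title lists across languages:

SAMPLE_TITLES = {
    "en": {"water", "dispersion"},
    "pl": {"water"},
    "ru": {"dispersion"},
}

def interwiki_links(titles_by_lang):
    """For each (lang, title), list the other languages that spell the title identically."""
    links = {}
    for lang, titles in titles_by_lang.items():
        for title in titles:
            targets = [other for other, other_titles in titles_by_lang.items()
                       if other != lang and title in other_titles]
            if targets:
                # A real bot would now append [[xx:title]] links to the page text.
                links[(lang, title)] = sorted(targets)
    return links

print(interwiki_links(SAMPLE_TITLES))
# e.g. {('en', 'water'): ['pl'], ('en', 'dispersion'): ['ru'], ('pl', 'water'): ['en'], ...}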
I have received a request from the Polish Wiktionary to stop adding interwiki links to the Russian and the Vietnamese Wiktionaries. The reason given is quality. On the Russian Wiktionary many articles were created by a bot and do not provide good information. An example is dispersion ( http://ru.wiktionary.org/wiki/dispersion ); there is nothing really in it. The Vietnamese Wiktionary is more problematic, because a bot was used to generate declension and conjugation tables for Russian words and got them wrong.
The Russian Wiktionary has some 81,000 of these empty shells and refuses to remove them. The Vietnamese are not willing to remove their incorrect data.
I have been asked to stop including the Russian Wiktionary and the Vietnamese Wiktionary when I run the interwiki process. To be honest, I run the bot as a service, and I do not think excluding them is the right thing to do. I think the Vietnamese are wrong not to correct the incorrect data they have. I am less sure about the Russian approach; in essence each of those pages is a stub. However, building a Wiktionary this way is like stamp collecting: you can look at it, but there is no information in it.
Given how the process works, I am not sure that I can exclude either the Russian or the Vietnamese Wiktionary. The bot explicitly runs on all Wiktionaries. If I exclude Russian or Vietnamese, I will probably end up removing all references to those projects, and they are the third- and fourth-largest Wiktionaries.
If I do not exclude the Russian and the Vietnamese Wiktionaries, the bot may end up being blocked on the Polish Wiktionary. That would also kill off the interwiki process.
From my point of view, using bots to generate content in a Wiktionary only makes sense when there is at least a link to the word in the base language. When the initial creation of stubs is followed by the enrichment of those stubs, it is acceptable. There is no excuse for information that is completely wrong.
The question is whether there will be a discussion about acceptable practices in Wiktionary. The questions are:
- Can the Polish demand what they do?
- Is having a project that consists mainly of stubs acceptable?
- Is having incorrect data acceptable?
Thanks, GerardM
PS I copied this from my blog.
GerardM gerard.meijssen@gmail.com wrote:
- Can the Polish demand what they do?
Absolutely. You continue saying your bot is a "service" but a service works for the people who need it and does what they want; it doesn't (except perhaps incidentally) work for the person providing it, doing what he wants.
- Is having a project that consists mainly of stubs acceptable?
Stubs? Yes. When I worked with the English Wikipedia it was mainly stubs.
The Russian example, though, is more a project that has been pre-seeded with templates. There is nothing wrong with this in itself--though it does inflate the page count--and we have already gone over the usefulness of knowing a word exists in a language.
- Is having incorrect data acceptable?
Isn't it the point of a wiki that one has incorrect and incomplete data, but that one is building a community who will take the effort to improve it? In such a case you would, rather than wanting to hide the links, make the information _more_ public so that, say, Russian visitors curious to see how the Vietnamese handle their words can contribute to correcting the information. (After all--was this problem brought to your attention by vi.wikt regulars, or by people following interwiki links to it?)
*Muke!
If your bot were blocked on a given wiki all that would happen is that your bot could no longer edit their entries. Your bot could still get data from that wiki, and it could still write that data to all other wikis. Sounds like a painless control over the bot, and one that any wiki which doesn't want that interwiki data should use. How do you figure that either solution will actually affect the process as a whole, anyway? -Dave
Hoi, When my bot is blocked on one Wiktionary it does not work at all any more. The way it is configured is that it works on all projects at all times. This is completely different from how it works on the Wikipedia projects. It is also why one guy can run this as a service. Thanks, Gerard
Hi, I would like to draw your attention to one issue: we should think about the aim of the project. If we allow the creation of an enormous number of stubs, it has many effects. I definitely agree that it helps to fill in the quite difficult Russian template, and that the Russians have less work checking articles created by newcomers. However, it may influence the way the Russian Wiktionary is perceived. It seems obvious to me that all projects aim at being reliable sources. Users expect that a dictionary claiming over 100,000 entries actually has them. If 8 out of 11 words they look up turn out to be completely empty articles, they will probably not come back to the dictionary, because it is a waste of time for them. The same goes for interwiki links. People often want to compare articles. The Russians write really fantastic ones; I like the very specific and precise information they give, but empty templates are really discouraging. I would even dare to say that leading people to empty pages through interwiki links shows a lack of respect for them. I write all this because I believe that Wiktionaries are created to serve people and that they should be as ergonomic as possible.
While looking through posts connected with this topic, here and at the Russian Wiktionary, I came across the idea that templates show how much there is to do and encourage people to fill them in. Well, if we knew how many unregistered users search through Wiktionaries and add information, compared to those who do not, we could say to what extent this is true; but as we do not know (or do we?), it is safer to assume that a user is more likely to be searching for information than willing to share his knowledge.
Somebody also mentioned that templates may function as a spell-checker: an article may simply inform you that a word exists. But who really needs that, considering that everybody has Microsoft Word or OpenOffice Writer with a pretty good spell-checker that even suggests the correct spelling?
I liked the idea that the bot would recognise a mark indicating an empty template and then would not link to it. Is that possible to do?
I am so against leaving lacunas because we have a great opportunity to build on Wikipedia's reputation as a very good source of information, and I am worried that by doing the kind of thing the Russians are doing we may spoil it.
I wonder what you think. Helena Polyak / Rovdyr (PL)
Gerard Meijssen wrote:
Hoi, When my bot is blocked on one Wiktionary it does not work at all any more. The way it is configured is that it works on all projects at all times. This is completely different from how it works on the Wikipedia projects. It is also why one guy can run this as a service. Thanks, Gerard
Then reconfigure it so that its operation can be blocked on a project that doesn't want it. In the interest of autonomy of projects, a particular Wiktionary should be able to block it the way it blocks any other user.
Ec
Hoi, It is only because there is no independence that this bot can function in the first place. I cannot reconfigure the bot either. Also, when a Wiktionary is excluded, its links will probably be removed everywhere. This is imho NOT a good idea. The Vietnamese have a current problem that will get fixed; having all their interwiki links removed is imho a BAD idea. Thanks, GerardM
Thanks for bringing this up, Gerard. As I noted on your blog entry, we're aware of the problem and are working to correct it. Right now, we're about to blank the existing templates and import new ones directly from the Russian Wiktionary. [1] There aren't many of us working on the Vietnamese Wiktionary, as I've said before, and only one of us knows any Russian (and not that much). That's why it's taken so long for anyone to notice the mistakes. You're all welcome to join in on our discussion.
I would oppose delisting the Vietnamese Wiktionary on the grounds that our Vietnamese, English, and French entries -- which make up the vast majority of our site -- are rather good. In fact, the source that we used for all our imports is *the* Vietnamese translationary on the Web. It's just the conjugation tables that PiedBot created that are the problem. I think having your bot distinguish between the Russian and non-Russian entries would be more trouble than it's worth.
By the way, you might want to have a look at the Lombard Wikipedia sometime. They have thousands of articles in English that claim to be in a variety of Lombard. [2] By comparison, the Russian Wiktionary doesn't look that bad. :)
[1] http://vi.wiktionary.org/wiki/Thảo_luận_Thành_viên:David#Re:_.5B.5BTh.E1.BA.A3o_lu.E1.BA.ADn_Th.C3.A0nh_vi.C3.AAn:Mxn.23Russian_conjugations.7CRussian_conjugations.5D.5D [2] http://lmo.wikipedia.org/wiki/14th_Street_(IRT_Broadway-Seventh_Avenue_Line)
GerardM wrote:
On the Russian Wiktionary many articles were created by a bot and do not provide good information. An example is dispersion ( http://ru.wiktionary.org/wiki/dispersion ); there is nothing really in it.
Would it be possible to have the Russian bot that creates these content-free articles include some kind of tag, to be removed by a human editor when adding content, that the interwiki bot could recognize? Ideally we should not link to non-content, but even that is preferable to not linking to ruwikt at all.
The Vietnamese Wiktionary is more problematic, because a bot was used to generate declension and conjugation tables for Russian words and got them wrong.
I agree with Muke here. Factual inaccuracies are undesirable, but all Wiktionaries have them to some extent, and it is not another Wiktionary's job to police them. It is inherent to the wiki process that there will always be room for improvement; excluding interwiki links for inaccuracies is unworkable. There are many good viwikt articles, and there will be more good viwikt articles in the future, regardless of their problems. At the same time, this is a plwikt local issue, and if they develop consensus on the matter, I would feel uncomfortable imposing any outsiders' rules on them.
Given how the process works, I am not sure that I can exclude either the Russian or the Vietnamese Wiktionary. The bot explicitly runs on all Wiktionaries. If I exclude Russian or Vietnamese, I will probably end up removing all references to those projects, and they are the third- and fourth-largest Wiktionaries.
If I do not exclude the Russian and the Vietnamese Wiktionaries, the bot may end up being blocked on the Polish Wiktionary. That would also kill off the interwiki process.
Does this mean that you couldn't just have the bot not add Russian and Vietnamese interwiki links on plwikt only? Even if we don't like the policy of excluding certain projects' interwiki links, it is better than having no links.
Dominic
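For what it's worth, here is a minimal sketch of how the marker idea raised above and a per-wiki exclusion list could fit into the bot's linking decision. The template name "{{bot-stub}}" and the exclusion map are hypothetical examples for illustration, not anything the projects actually use, and this is not the real bot code:

EMPTY_SHELL_MARKER = "{{bot-stub}}"   # hypothetical tag a human editor removes once real content is added

EXCLUDED_TARGETS = {
    "pl": {"ru", "vi"},               # hypothetical local policy: plwikt opts out of ru/vi links
}

def should_link(source_lang, target_lang, target_wikitext):
    """Decide whether the bot should add an interwiki link from source_lang to target_lang."""
    if target_lang in EXCLUDED_TARGETS.get(source_lang, set()):
        return False                  # the source wiki has opted out of this target
    if EMPTY_SHELL_MARKER in target_wikitext:
        return False                  # the target page is still an empty shell
    return True

print(should_link("pl", "ru", "{{bot-stub}}\n== dispersion =="))   # False: plwikt excludes ru
print(should_link("en", "ru", "{{bot-stub}}\n== dispersion =="))   # False: target page is an empty shell
print(should_link("en", "fr", "== dispersion ==\n# scattering"))   # True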