User:Lsj has written 4000 lines of C# source code on top of the DotNetWikiBot framework, to create 10,000 articles in Swedish about bird species in the spring of 2012 and recently even more articles in Swedish about fungi species.
Some information about his Lsjbot is found here, http://sv.wikipedia.org/wiki/Wikipedia:Projekt_DotNetWikiBot_Framework/Lsjbo...
The otherwise very reluctant, skeptical and picky Swedish Wikipedia community has gladly accepted these well-written articles.
I think it would be interesting if a community of wikipedians in some other language would try to translate this bot. Some languages might have notability or relevance requirements that these species don't fulfill, others might think 1700 bytes is too short for an article. But I think the citation of sources and correctness of facts would be generally accepted.
Here is a blog post in Swedish about the bird articles, http://wikimediasverige.wordpress.com/2012/03/06/10-000-fagelarter-pa-svensk...
Some 3,600 birds are found in this category for articles that were bot-created and have not yet been inspected, http://sv.wikipedia.org/wiki/Kategori:Robotskapade_f%C3%A5gelartiklar
Some 54,000 fungi species are found here, http://sv.wikipedia.org/wiki/Kategori:Robotskapade_svampartiklar
The birds more often have common names, which are preferred as article names instead of the Latin/scientific names, e.g. the blue-and-white swallow, http://sv.wikipedia.org/wiki/Bl%C3%A5vit_svala where the Latin name is a bot-created redirect to the bot-created article, http://sv.wikipedia.org/wiki/Pygochelidon_cyanoleuca
At the Swedish Wikipedia village pump there is now a discussion of whether to continue with species of animals, plants, bacteria, etc. http://sv.wikipedia.org/wiki/Wikipedia:Bybrunnen#Botskapande_av_artiklar_f.C...
On 18/10/12 03:26, Lars Aronsson wrote:
User:Lsj has written 4000 lines of C# source code on top of the DotNetWikiBot framework, to create 10,000 articles in Swedish about bird species in the spring of 2012 and recently even more articles in Swedish about fungi species.
Some information about his Lsjbot is found here, http://sv.wikipedia.org/wiki/Wikipedia:Projekt_DotNetWikiBot_Framework/Lsjbo...
The otherwise very reluctant, skeptical and picky Swedish Wikipedia community has gladly accepted these well-written articles.
I think it would be interesting if a community of wikipedians in some other language would try to translate this bot. Some languages might have notability or relevance requirements that these species don't fulfill, others might think 1700 bytes is too short for an article. But I think the citation of sources and correctness of facts would be generally accepted.
The need for such bots should cease after Wikidata is fully deployed. I suggest to interested programmers that they should direct their effort there.
On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski smolensk@eunet.rs wrote:
The need for such bots should cease after Wikidata is fully deployed. I suggest to interested programmers that they should direct their effort there.
Why is that the case?
I didn't understand the scope of Wikidata to include actual creation of articles that don't exist. Only to provide data about topics across projects. Sure, that might be extremely helpful to someone with a bot to populate species articles, but I'm skeptical that Wikidata would or should be creating millions of articles about such things. If you consider something even slightly more controversial than species, such as schools, many projects would not welcome a third party mass-creating pages about a topic that is described in Wikidata.
Steven
On 18/10/12 09:25, Steven Walling wrote:
On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski smolensk@eunet.rs wrote:
The need for such bots should cease after Wikidata is fully deployed. I suggest to interested programmers that they should direct their effort there.
Why is that the case?
I didn't understand the scope of Wikidata to include actual creation of articles that don't exist. Only to provide data about topics across projects. Sure, that might be extremely helpful to someone with a bot to populate species articles, but I'm skeptical that Wikidata would or should be creating millions of articles about such things. If you consider something even slightly more controversial than species, such as schools, many projects would not welcome a third party mass-creating pages about a topic that is described in Wikidata.
Wikidata won't need to create articles. Rather, if you try to view a page that has no article, Wikipedia will check whether an item with the appropriate name exists in Wikidata and generate the article on the fly if Wikipedia has a local article template for this type of article.
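The lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `WIKIDATA_ITEMS` and `LOCAL_TEMPLATES` dictionaries, the `render_missing_page` function and the sample data are all hypothetical stand-ins, and no real Wikidata or MediaWiki API is involved.

```python
from string import Template
from typing import Optional

# Hypothetical stand-in for Wikidata: page name -> item with a type and properties.
WIKIDATA_ITEMS = {
    "Pygochelidon cyanoleuca": {
        "type": "bird",
        "properties": {"name": "Pygochelidon cyanoleuca", "family": "Hirundinidae"},
    },
}

# Hypothetical local templates, one per item type, written by the wiki community.
LOCAL_TEMPLATES = {
    "bird": Template("$name is a bird in the family $family."),
}


def render_missing_page(title: str) -> Optional[str]:
    """Return generated text for a page without an article, or None to fall back
    to the usual "no article with this exact name" page."""
    item = WIKIDATA_ITEMS.get(title)
    if item is None:
        return None  # no item with this name in the data store
    template = LOCAL_TEMPLATES.get(item["type"])
    if template is None:
        return None  # the community has not written a template for this type
    return template.substitute(item["properties"])
```

In this sketch a community that does not want generated pages simply leaves `LOCAL_TEMPLATES` empty, and every lookup falls through to the normal missing-page behaviour.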
On Thu, Oct 18, 2012 at 10:08 AM, Nikola Smolenski smolensk@eunet.rs wrote:
On 18/10/12 09:25, Steven Walling wrote:
On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski smolensk@eunet.rs wrote:
The need for such bots should cease after Wikidata is fully deployed. I suggest to interested programmers that they should direct their effort there.
Why is that the case?
The necessary data to create those articles will be available in Wikidata, and possibly a lot more than we currently have in our templates. That could make it possible to create really awesome articles, if it were not for one thing: it is extremely hard to create well-formed text automatically. One of the more common problems is names that use different inflection rules depending on context and on how they are written. Such inflection rules are not part of the Wikidata project and will probably be a major undertaking in themselves.
Note that some languages do not need such inflection rules, and then it is fairly simple to create articles from templates. In other cases it might be good enough to simply say "Pygochelidon cyanoleuca is a bird" and add an automatic template.
On 18/10/12 11:06, John Erling Blad wrote:
well-formed text automatically. One of the more common problems is names that use different inflection rules depending on context and on how they are written. Such inflection rules are not part of the Wikidata project and will probably be a major undertaking in themselves.
Why do you think that inflection rules will not be a part of Wikidata? They would be hugely needed on Wiktionary and there is no reason for Wikidata not being able to contain them.
Getting working inflection rules for even a single language is a major task, and doing so for several hundred languages would be an overwhelming one. I can't see how this can be implemented as part of the Wikidata project within a reasonable time frame.
There are a few shortcuts that can be taken, and it is possible to make some generalized tools. For an open source alternative, take a look at Apertium (http://en.wikipedia.org/wiki/Apertium). Usually only the generation/disambiguation phase is necessary, which makes the task somewhat simpler, but it is still a major undertaking.
Note that some of the basic tools already exist; we only need to interface them to MediaWiki. But the tools need definition files to work (that is, inflection rules for Northern Sami, for example, or Norwegian Bokmål and Nynorsk, or Swedish), and it is those definitions that are the major task.
John
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
For now, we have no plans for Wikidata to create articles. This would, in my opinion, meddle too much with the autonomy of the Wikipedia language projects.
What will be possible is to facilitate the creation of such bots, as some data that might be used for the article might be taken from and maintained in Wikidata, and the creation of templates that use data from Wikidata.
Wikidata currently has no plans for creating text using natural language generation techniques. We would love for someone else to do this kind of awesome on top of Wikidata.
I hope this helps, Denny
For those interested in this type of text synthesis, it can be done using finite-state automata and transducers (FSTs). The simplest way to make them is to cross-compile them into Lua from some other known form. John
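To give a rough idea of what a transducer does, here is a minimal sketch in Python rather than Lua. It is a toy, not a real inflection rule set for any language: the two states, the `step` and `transduce` names, and the "add -es after s or x, otherwise -s" rule are all assumptions made for illustration, and real definition files (such as those used by Apertium) are far larger.

```python
from typing import Tuple

END = "#"  # end-of-word marker fed to the transducer after the last letter


def step(state: str, symbol: str) -> Tuple[str, str]:
    """Transition function: (state, input symbol) -> (output string, next state)."""
    if symbol == END:
        # At the end of the word, the state decides which suffix is emitted.
        return ("es" if state == "sibilant" else "s", "done")
    # Every letter is copied to the output; the state remembers
    # whether the most recent letter was "s" or "x".
    next_state = "sibilant" if symbol in "sx" else "plain"
    return (symbol, next_state)


def transduce(word: str) -> str:
    """Run the toy transducer over a word and return the emitted string."""
    state = "plain"
    out = []
    for symbol in word + END:
        emitted, state = step(state, symbol)
        out.append(emitted)
    return "".join(out)
```

The point of the design is that the rule lives entirely in the transition function, so the same driver loop could run a much larger table generated from existing definition files.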
On 18/10/12 14:10, Denny Vrandečić wrote:
For now, we have no plans for Wikidata to create articles. This would, in my opinion, meddle too much with the autonomy of the Wikipedia language projects.
I don't know if I am so bad at explaining things or if this is such a complex thing to grasp.
No one has ever suggested for Wikidata to create articles. The only thing suggested was for Wikipedias to display an article-like template filled with Wikidata data if they have no article on a certain topic, instead of the "Wikipedia does not have an article with this exact name" page they display now.
This would not only not meddle with the Wikipedias' autonomy, it would require active engagement on the part of the community in order to create the templates. If a community doesn't want the articles, they simply won't create the templates. Yet I believe this will reduce community tension: most communities don't like bot-created articles, so this seems to be a reasonable compromise.
What will be possible is to facilitate the creation of such bots, as some data that might be used for the article might be taken from and maintained in Wikidata, and the creation of templates that use data from Wikidata.
Wikidata currently has no plans for creating text using natural language generation techniques. We would love for someone else to do this kind of awesome on top of Wikidata.
Natural language generation is not necessary for any of this.
On Mon, Oct 22, 2012 at 2:17 PM, Nikola Smolenski smolensk@eunet.rs wrote:
No one has ever suggested for Wikidata to create articles.
OK; then I misunderstood "and generate the article on the fly"..
Regards, Ole
No one has ever suggested for Wikidata to create articles.
OK; then I misunderstood "and generate the article on the fly"..
So to make sure that I understand this correctly, this is the idea:
* Let's say I search on the lojban Wikipedia for Creagerstown, Maryland
* The article doesn't exist, but Wikidata has information on it.
* I'm told the article doesn't exist, but presented with a template showing when the town was founded etc. along with search results
Assuming the above is correct: Would they be able to make use of that template if they wanted to quickly throw together a stub on the topic?
Thank you, Derric Atzrott
On 22/10/12 14:31, Derric Atzrott wrote:
No one has ever suggested for Wikidata to create articles.
OK; then I misunderstood "and generate the article on the fly"..
So to make sure that I understand this correctly, this is the idea:
- Let's say I search on the lojban Wikipedia for Creagerstown, Maryland
- The article doesn't exist, but Wikidata has information on it.
- I'm told the article doesn't exist, but presented with a template showing when the town was founded etc. along with search results
Yes, exactly. Though depending on what the community wants, you don't even have to be told that the article doesn't exist. And I assume the template would be human-readable, like the aforementioned http://sv.wikipedia.org/wiki/Bl%C3%A5vit_svala
Would they be able to make use of that template if they wanted to quickly throw together a stub on the topic?
Yes, they could use http://www.mediawiki.org/wiki/Manual:Creating_pages_with_preloaded_text
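As a rough sketch of how such a link could be built: MediaWiki's edit form accepts a `preload` parameter naming the page whose text is used to prefill a new page. The `preload_edit_url` helper, the `example.org` wiki address and the `Template:Town stub` page name below are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode


def preload_edit_url(index_php: str, title: str, preload_page: str) -> str:
    """Build an edit link that opens a new page prefilled from preload_page."""
    query = urlencode({"title": title, "action": "edit", "preload": preload_page})
    return f"{index_php}?{query}"


# Hypothetical wiki address and preload template name.
url = preload_edit_url(
    "https://example.org/w/index.php",
    "Creagerstown, Maryland",
    "Template:Town stub",
)
```

A link like this could sit next to the generated template, so that a reader who wants to write a real stub starts from the same text.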