One of our most interesting projects, Vietnamese Wikipedia, has now passed 1 M articles and is currently growing by almost 100k articles/month
They use a clever bot named Cheers!-bot to generate a lot of very good articles. In some ways it is stronger than Lsjbot (covering more than species), but I do prefer that Lsjbot marks the generated articles with a template indicating they are bot-generated
start page: https://vi.wikipedia.org/wiki/Trang_Ch%C3%ADnh
Cheers!-bot generated articles (just now working on species like Lsjbot) https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/Cheers!-bot
Statistics up to April http://stats.wikimedia.org/EN/TablesWikipediaVI.htm (note that active generation started about one year ago)
As I have said many times, I believe it is a weakness that we are not making use of the many excellent initiatives taking place on less well-known versions (like the Lithuanian one I mentioned some time ago). I am not even sure there is anyone from viwp active on this list.
I also recommend looking through the content of viwp using the Random article feature (Bài viết ngẫu nhiên): https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:Ng%E1%BA%ABu_nhi%C3%AAn
Anders
That's great news that the Vietnamese Wikipedia has crossed 1M articles. What are the main reasons behind the Vietnamese Wikipedia's growth? Is it just the use of such clever bots (that you have mentioned) or contributions by the Vietnamese Wikipedians? And how does Cheers!-bot actually generate articles? Does it translate articles from the English (or another) Wikipedia? And apart from translating, can it correctly set and maintain other aspects of wiki syntax and coding?
Tanweer Morshed Board member Wikimedia Bangladesh
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
The Vietnamese Wikipedia creates 30-40 articles/day manually (with about as many active contributors), which would put them on par with several medium/smaller projects, which usually have around 200-400 K articles
But among their limited number of active contributors they have a set of very clever people creating and running bots. When I analyze the bot results I find they are using first-class input (The Plant List and lists of cities from, for example, the Turkish government). I also see that they understand the intricacies of managing the generated articles, using (in some cases) templates and the discussion page. And of course the code in the bots must be very good to manage such a diverse set of articles. And no translating.
For me it confirms my belief that it is possible for most versions to set up and run clever bots for mass generation of articles with 100% quality and using the best sources
I hope this experience can serve as an inspiration to other medium/smaller projects
What I have learned from looking into several bot-generation efforts is that there are three aspects you need to master:
1. The infrastructure of generated articles: special templates, categories, discussion pages, the speed at which articles are created (to allow for reviews), and how to handle already existing articles. I know this has been learned on svwp, itwp and nlwp. Basically it is common sense, but for any newcomer the knowledge exists to learn from.
2. The code of the bots. It is complicated but no "rocket science". The three I have looked into in detail were all written by people who are not professional programmers, using different programming languages (C, AWB and something resembling Basic). You could probably get access to some existing bot code, but in general I would expect most communities to be able to find someone who can create this type of bot software.
3. The inputs to generate data from. This I have found to be the most challenging aspect: which lists to use, how to translate these into article texts, and how to handle ambiguities/errors. Here I would recommend drawing on the experience of people who have already done this. Official lists of geographic entities exist and are used by several projects (it, nl, vi etc.), but why not use the same sets, and why not involve Wikidata in these? For species there already exist several good inputs: Lsjbot uses Catalogue of Life, but others (nl, vi) use others.
And while I have no direct contact with the people at viwp, I would welcome it if anyone made contact and made their bot-generation knowledge available to others.
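To make the three aspects a little more concrete, here is a minimal Python sketch of the general idea: one row from an input list is turned into a stub that carries a bot-generated marker, and rows that are incomplete or that collide with an existing article are skipped. It is only an illustration under stated assumptions. The template name "Bot-generated", the field names, the sentence wording and the bot name "ExampleBot" are all hypothetical, and none of this is taken from Lsjbot, Cheers!-bot or any other real bot.

```python
from string import Template

# Hypothetical stub layout. The template name "Bot-generated" and the
# field names are illustrative assumptions, not what any real bot emits.
STUB = Template(
    "{{Bot-generated|bot=$bot|source=$source}}\n"
    "'''''$name''''' is a species of plant in the family [[$family]]."
    "<ref>$source</ref>\n"
    "\n"
    "[[Category:$family]]\n"
)


def make_stub(record, existing_titles, bot="ExampleBot"):
    """Return (title, wikitext) for one input row, or None to skip it."""
    name = record.get("name", "").strip()
    family = record.get("family", "").strip()
    source = record.get("source", "").strip()
    # Incomplete rows are skipped rather than guessed at.
    if not (name and family and source):
        return None
    # Never overwrite an article that already exists.
    if name in existing_titles:
        return None
    text = STUB.substitute(bot=bot, source=source, name=name, family=family)
    return name, text


if __name__ == "__main__":
    row = {"name": "Abies alba", "family": "Pinaceae", "source": "The Plant List"}
    result = make_stub(row, existing_titles=set())
    if result is not None:
        print(result[1])
```

A real bot would also need to upload the text, throttle its speed so reviewers can keep up, and log the skipped rows for manual follow-up; those parts are left out here.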
Anders
On 2014-06-24 04:52, Tanweer Morshed wrote:
That's great news that the Vietnamese Wikipedia has crossed 1M articles. What are the main reasons behind the Vietnamese Wikipedia's growth? Is it just the use of such clever bots (that you have mentioned) or contributions by the Vietnamese Wikipedians? And how does Cheers!-bot actually generate articles? Does it translate articles from the English (or another) Wikipedia? And apart from translating, can it correctly set and maintain other aspects of wiki syntax and coding?
A great many of the Vietnamese Wikipedia's recent articles have been created automatically using bots, manually with word processors and mail merge, or semi-automatically with machine translators like (presumably) Google Translator Toolkit. Nonetheless, Cheers!-bot held a moratorium on new articles around the million-article mark, so that day was all about writing articles the old-fashioned way.
Predictably, our bot articles are more infobox than prose. On the other hand, they do have correct grammar and wiki syntax, which cannot be said for most machine-translated articles, comprehensive as they may be. Cheers! is one of our most experienced editors and has done an admirable job correcting errors, whereas some machine translator users have uploaded incomprehensible articles anonymously, giving us no opportunity to engage and educate.
I can't say for certain how Cheers!-bot generates species stubs, but its earlier U.S. geographic stubs were "translated" from the Spanish Wikipedia's own bot-created stubs. I'm in the process of cleaning them up, translating the occasional Spanish place name to Vietnamese. We're also integrating our [[vi:Template:Infobox settlement]] with Wikidata, to provide more current information with minimal maintenance. For example, see the infobox at [[vi:Loveland, Ohio]], which passes only three parameters but provides 18 rows of information.
The surge in bot-created stubs has alarmed some members of the Vietnamese Wikipedia community. One frequent theme in our village pump is that our "depth" at [[m:List of Wikipedias]] has fallen from over a hundred (one of the highest) to just 15 (one of the lowest) in a few years. Even taking the depth metric with a grain of salt, I think this observation has led us to a newfound appreciation for edits, non-articles, and maybe even authentic, hand-made articles.
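For readers unfamiliar with the depth figure: as far as I understand the definition on Meta, it is the product of edits per article, non-article pages per article, and one minus the share of all pages that are articles. The Python sketch below is an illustration under that assumption. The function name and the sample counts are made up and are not Vietnamese Wikipedia statistics.

```python
def depth(edits, articles, total_pages):
    """Depth as (as I understand it) defined at m:Wikipedia article depth.

    depth = (edits / articles) * (non_articles / articles) * (1 - stub_ratio)
    with non_articles = total_pages - articles
    and  stub_ratio   = articles / total_pages
    """
    if articles <= 0 or total_pages <= 0:
        raise ValueError("articles and total_pages must be positive")
    non_articles = total_pages - articles
    stub_ratio = articles / total_pages
    return (edits / articles) * (non_articles / articles) * (1 - stub_ratio)


# Made-up wiki: 100 articles, 300 other pages, 1000 edits in total.
before = depth(edits=1000, articles=100, total_pages=400)

# The same wiki after a bot adds 900 stubs with a single edit each.
after = depth(edits=1900, articles=1000, total_pages=1300)

print(before, after)
```

Because the article count sits in the denominator of all three factors, a large batch of single-edit stubs pulls the figure down sharply even though nothing existing got worse, which is one reason to take the metric with a grain of salt.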
More importantly, the million-article milestone has shed light on our seemingly low number of active editors. Some have expressed concern that the steadily rising article count has disincentivized readers from creating articles of their own. So we're discussing some changes to our main page and messages to better engage potential contributors. We've also integrated tightly with VisualEditor -- the sandbox, "no such article" message, and "no search results" message all send users to VisualEditor by default -- hopefully lowering barriers to entry.
None of the Vietnamese Wikipedia's bot operators are interested in inflating our article count for its own sake. We care deeply about the future of our wiki and the health of its community, and we welcome feedback from the community at large.
wikimedia-l@lists.wikimedia.org