The Vietnamese Wikipedia creates 30-40 articles per day manually (and has a similar number of active contributors), which would put it on par with several medium-sized and smaller projects that usually have around 200-400K articles.
But among their limited number of active contributors they have a group of very clever people creating and running bots. When I analyze the bot results I find they are using first-class input (The Plant List, and lists of cities from sources such as the Turkish government). I also see that they understand the intricacies of managing the generated articles, in some cases using templates and the talk page. And of course the bot code must be very good to handle such a diverse set of articles. And there is no translation involved.
For me this confirms my belief that most language versions could set up and run clever bots for mass generation of articles of 100% quality, using the best sources.
I hope this experience can serve as an inspiration to other medium-sized and smaller projects.
My takeaway from looking into several bot generation efforts is that there are three aspects you need to master:

1. The infrastructure for generated articles: special templates, categories, talk pages, the pace at which articles are created (to allow for review), and how to handle already existing articles. I know this has been learned on svwp, itwp and nlwp. Basically it is common sense, but for anyone new, the knowledge exists to learn from.

2. The bot code. It is complicated but not "rocket science". The three bots I have looked into in detail were all written by people who are not professional programmers, using different programming languages (C, AWB and something resembling Basic). You could probably get access to some existing bot code, but in general I would expect most communities to be able to find someone who can create this type of bot software.

3. The input data to generate articles from. I have found this to be the most challenging aspect: which lists to use, how to turn them into article text, and how to handle ambiguities and errors. Here I would recommend drawing on the experience of people who have already done this. Official lists of geographic entities exist and are used by several projects (it, nl, vi, etc.), so why not use the same sets, and why not involve Wikidata? For species there already exist several good inputs: Lsjbot uses Catalogue of Life, while others (nl, vi) use other sources.
I have no direct contact with the people at viwp, but I would welcome it if anyone made contact with them and helped make their bot generation knowledge available to others.
Anders
Tanweer Morshed wrote on 2014-06-24 13:52:
That's great news that the Vietnamese Wikipedia has crossed 1M articles. What are the main reasons behind the Vietnamese Wikipedia's growth? Is it just the use of the clever bots you mentioned, or contributions by Vietnamese Wikipedians? And how exactly does Cheers!-bot generate articles? Does it translate articles from the English (or another) Wikipedia? And apart from translating, can it correctly set and maintain other aspects of wikisyntax and coding?
Tanweer Morshed
Board member, Wikimedia Bangladesh
On Tue, Jun 24, 2014 at 11:38 AM, Anders Wennersten <mail@anderswennersten.se> wrote:
One of our most interesting projects, the Vietnamese Wikipedia, has now passed 1M articles and is currently growing by almost 100K articles per month.
They use a clever bot named Cheers!-bot to generate a lot of very good articles. In some ways it is stronger than Lsjbot (it covers more than species), but I prefer that Lsjbot marks the generated articles with a template indicating they are bot-generated.
Main page: https://vi.wikipedia.org/wiki/Trang_Ch%C3%ADnh
Cheers!-bot generated articles (currently working on species, like Lsjbot): https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/Cheers!-bot
Statistics up to April: http://stats.wikimedia.org/EN/TablesWikipediaVI.htm (notice that active generation started around one year ago)
As I have said many times, I believe it is a weakness that we are not making use of the many excellent initiatives taking place on less well-known versions (like the Lithuanian one I mentioned some time ago). I am not even sure anyone from viwp is active on this list.
I also recommend looking through the content of viwp using the Random article feature (Bài viết ngẫu nhiên): <https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:Ng%E1%BA%ABu_nhi%C3%AAn>
Anders
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe