Hoi, as this is an RFC, I will comment on the RFC itself and not on the other comments.
Danny mentioned in his response that a bot could do great work. Henna remarked that Wikidata could make a difference. Milos mentions that data may need localisation. I want to remind you of an e-mail that Sabine Cretella sent to the lists. Sabine is really active in the Neapolitan Wikipedia, a project much younger than the Swahili Wikipedia but already with 4336 articles. The secret of this success is, among other things, that Sabine uses professional tools to translate into the Neapolitan language. OmegaT, the software Sabine uses, is GPL software and is what is called a CAT, or Computer Aided Translation, tool. This allows for efficient translation and is /not/ the same as Automated Translation.
Once Wikidata is ready for prime time, we will be able to store structured data about a subject. This is not a full solution, as many of the words used in the presentation need to be translated, and maybe even localised, to make sense in another language. For instance, I always have to think about whether 9/11 is the ninth of November or the eleventh of September, even though I do know it as the name of the event. In order to present data, labels have to be translated and data may have to be localised. The WiktionaryZ project will help with the labels, and standards like the CLDR define how the localisation is to be done.
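To make the 9/11 example concrete, here is a minimal sketch in Python of how the same written date resolves to two different days depending on the locale convention. The pattern table and the function names are my own assumptions for illustration; real software would take its patterns from CLDR data rather than hard-code them like this.

```python
from datetime import date, datetime

# Illustrative per-locale date patterns (an assumption for this sketch;
# a real system would load these from CLDR instead of hard-coding them).
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month first
    "en-GB": "%d/%m/%Y",  # day first
}


def parse_local_date(text: str, locale: str) -> date:
    """Read a written date according to the convention of the given locale."""
    return datetime.strptime(text, DATE_PATTERNS[locale]).date()


def format_local_date(value: date, locale: str) -> str:
    """Write a stored date according to the convention of the given locale."""
    return value.strftime(DATE_PATTERNS[locale])


# The same characters mean different days under the two conventions.
as_us = parse_local_date("9/11/2001", "en-US")  # eleventh of September
as_gb = parse_local_date("9/11/2001", "en-GB")  # ninth of November

# Stored once as structured data, the date can be presented per locale.
event = date(2001, 9, 11)
shown_us = format_local_date(event, "en-US")
shown_gb = format_local_date(event, "en-GB")
```

The point of the sketch is that the date itself is stored once as structured data, and only its presentation changes per language.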
We are making steady progress with WiktionaryZ; the first alpha demo project is at epov.org/wd-gemet/index.php/Main_Page (a read-only project for now). There is a proposal for a project at http://meta.wikimedia.org/wiki/Wiki_for_standards that intends to help us where the standards prove to be not good enough. As Sabine is part of the team behind OmegaT, it is being researched how OmegaT can read from and write directly to a MediaWiki project.
One other aspect that is needed in a new project is commitment. People who express their support for a new language project should see this as an indication of /their/ commitment and not as an expression of their opinion. When people start to work on a new project it is important that, as on the Neapolitan Wikipedia, there are people who are knowledgeable and willing to help the newbies. I hope that the IRC channel #wikipedia-bootcamp can serve a role for this as well.
Thanks, GerardM
Milos Rancic wrote:
Maybe this should go on Meta, but I want to see comments here, first.
As far as I can see, there are two ways of adding content in bulk. The first one is generating articles from public data (for example from NASA, the National Geospatial-Intelligence Agency, the French government etc.). This is by now an almost routine way of adding content in bulk, and I think that a number of us have some experience with such work.
The other way is adding content using the English Wikipedia. The English Wikipedia has a lot of categorized articles, a lot of templates etc. All these typical forms can be used for automatic content creation on small Wikipedias.
I think that the idea of having thousands of articles, each with a couple of sentences and good categorization, across a lot of fields can be very helpful not only to small Wikipedias, but also for spreading free knowledge. I think that it would be a great day for us when people whose native language is Mongolian are able to read about places in the Amazon and movies from Australia in their native language. And this is possible to do much faster than we think.
And not only that: bots should be able to update information, and bots should be able to do more things over time. Finally, it would be possible to start with knowledge transfer between Wikipedias in different languages: if we have the same methodology on different Wikipedias, we would be able to update data semi-automatically (or even fully automatically).
However, this needs a number of people who are interested in such a project:
(1) We would need people who know how to work with bots (pywikipediabot or something similar).
(2) We would need to make software based on the bot core which would have to be localized, just as MediaWiki has to be localized; this software should have sentences like "<movie> is a movie made in <year> in <country>. The genre of that movie is <genre>. The director was <director>..." in a number of languages.
(3) We would need good, high-quality work on the English Wikipedia. Rules like "this goes in the table, that goes in the template at the top, this goes in the template in the middle" should be more or less strict (but I see that people are already working in such a way on en:).
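Point (2) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the template table, the field names, the `render_stub` function and the sample record are all hypothetical, and it uses the standard library instead of pywikipediabot. The field values (country, genre) are left untranslated here, although they would need localisation as well.

```python
from string import Template

# Hypothetical localized sentence templates, one entry per language.
# A real bot would load these from translatable message files.
SENTENCES = {
    "en": Template(
        "$title is a movie made in $year in $country. "
        "Its genre is $genre. The director was $director."
    ),
    "de": Template(
        "$title ist ein Film aus dem Jahr $year. "
        "Das Genre ist $genre. Regie führte $director."
    ),
}


def render_stub(record: dict, lang: str) -> str:
    """Fill the sentence template for `lang` with structured data.

    Template.substitute raises KeyError when a field is missing, so an
    incomplete record fails loudly instead of producing a broken stub.
    """
    return SENTENCES[lang].substitute(record)


# Made-up sample record, standing in for data extracted from a template
# or table on the English Wikipedia.
record = {
    "title": "Example Film",
    "year": 1999,
    "country": "Australia",
    "genre": "drama",
    "director": "Jane Doe",
}

stub_en = render_stub(record, "en")
stub_de = render_stub(record, "de")
```

Keeping the sentences in a per-language table means that adding a language is a translation task and not a programming task, which is the same split MediaWiki uses for its interface messages.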
This is an RFC. I am looking forward to your comments.