Hoi,
As this is an RFC, I will comment on the RFC itself and not on the other
comments.
Danny mentioned in his response that a bot could do great work. Henna
remarked that Wikidata could make a difference. Milos mentions that
data may need localisation. I want to remind you about an e-mail that
Sabine Cretella sent to the lists. Sabine is really active in the
Neapolitan Wikipedia, a project much younger than the Swahili Wikipedia
but already with 4336 articles. The secret of this success is, among
other things, that Sabine uses professional tools to translate into the
Neapolitan language. OmegaT, the software Sabine uses, is GPL software
and is what is called a CAT or Computer Aided Translation tool. This
allows for efficient translation and is /not/ the same as Automated
Translation.
When we have Wikidata ready for prime time, we will be able to store
structured data about a subject. This is not a full solution, as many
of the words used in the presentation need to be translated, maybe even
localised, to make sense in another language. I, for instance, always
have to think whether 9/11 is the ninth of November or the eleventh of
September; I know it only as the name of the event. In order to present
data, labels have to be translated and data may have to be localised.
The WiktionaryZ project will help with the labels, and standards like
the CLDR define how the localisation is to be done.
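To make the 9/11 example concrete, here is a minimal sketch of how the
same stored date could be presented differently per locale. The locale
codes, the DATE_ORDER table and the function name are my own
assumptions for illustration; a real implementation would take its
patterns from the CLDR data instead of a hand-written table.

```python
from datetime import date

# Hypothetical, hand-written field orders for illustration only.
# Real patterns (including separators) should come from CLDR data.
DATE_ORDER = {
    "en-US": ("month", "day"),  # month first
    "nl": ("day", "month"),     # day first
}

def short_day_month(d, locale):
    """Render the day and month of a date in the order used by the locale."""
    parts = {"day": d.day, "month": d.month}
    first, second = DATE_ORDER[locale]
    return f"{parts[first]}/{parts[second]}"

event = date(2001, 9, 11)
print(short_day_month(event, "en-US"))  # 9/11
print(short_day_month(event, "nl"))     # 11/9
```

The stored value is the same in both cases; only the presentation
changes with the locale.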
We are making steady progress with WiktionaryZ; the first alpha demo
project is at
epov.org/wd-gemet/index.php/Main_Page (a read-only project
for now). There is a proposal for a project at
http://meta.wikimedia.org/wiki/Wiki_for_standards that intends to help
us where the standards prove to be not good enough. As Sabine is part of
the team behind OmegaT, it is being researched how OmegaT can read from
and write directly to a MediaWiki project.
One other aspect that is needed in a new project is commitment. People
who express their support for a new language project should see this as
an indication of /their/ commitment and not as an expression of their
opinion. When people start to work on a new project it is important
that, as on the Neapolitan Wikipedia, there are people who are
knowledgeable and willing to help the newbies. I hope that the IRC
channel #wikipedia-bootcamp can serve a role in this as well.
Thanks,
GerardM
Milos Rancic wrote:
Maybe this should go on Meta, but I want to see
comments here, first.
As I see it, there are two ways of adding content in bulk. The first
is generating articles from public data (for example from NASA, the
National Geospatial-Intelligence Agency, the French government etc.).
This is by now an almost routine way of mass content adding, and I
think that a number of us have some experience with such work.
The other way is adding content using English Wikipedia. English
Wikipedia has a lot of categorized articles, a lot of templates etc.
All these typical forms can be used for automatic content creation on
small Wikipedias.
I think that the idea of having thousands of articles, each with a
couple of sentences and good categorization, across a lot of fields can
be very helpful not only to small Wikipedias, but also for spreading
free knowledge. I think that it would be a great day for us when people
whose native language is Mongolian are able to read about places
in the Amazon and movies from Australia in their native language. And
this is possible to do much faster than we think.
And not only that: bots should be able to update information; bots
should be able to do more things over time. Finally, it would be
possible to start with knowledge transfer between Wikipedias in
different languages: if we have the same methodology on different
Wikipedias, we would be able to update data semi-automatically (up to
fully automatically).
However, this needs a number of people who are interested in such a
project:
(1) We would need people who know how to work with bots (pywikipediabot
or something similar).
(2) We would need to make software based on the bot core which would
have to be localized, just as MediaWiki has to be localized; this
software should have sentences like "<movie> is a movie made in <year>
in <country>. The genre of that movie is <genre>. The director was
<director>..."
in a number of languages.
(3) We would need good-quality work on English Wikipedia. Rules
like "this goes to the table, that goes to the template up, this goes
to template in the middle" should be more or less strict (but I see
that people are already working in this way on en:).
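To illustrate point (2), here is a rough sketch of what localized
sentence templates could look like. This is not pywikipediabot code;
the TEMPLATES table, the render_stub function and the sample data are
hypothetical names and values invented for this example. Only English
is filled in here; the templates for other languages would have to be
written by people who speak them.

```python
# Hypothetical per-language sentence templates. The placeholder names
# follow the <movie>, <year>, <country>, <genre>, <director> fields.
TEMPLATES = {
    "en": (
        "{movie} is a movie made in {year} in {country}. "
        "The genre of that movie is {genre}. "
        "The director was {director}."
    ),
    # Other languages would be added here by their translators.
}

def render_stub(lang, data):
    """Fill the template for one language with structured data."""
    if lang not in TEMPLATES:
        raise KeyError(f"no template for language {lang!r}")
    return TEMPLATES[lang].format(**data)

# Made-up sample record, standing in for data extracted from a
# template on English Wikipedia.
sample = {
    "movie": "Example Movie",
    "year": 2005,
    "country": "Australia",
    "genre": "drama",
    "director": "Jane Example",
}

print(render_stub("en", sample))
```

The bot core would stay the same for every language; only the
TEMPLATES table grows as new languages are added.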
This is an RFC. I am looking for your comments.