2) Implement spelling and punctuation check automatically within GTTK
before posting of the articles.
There is spell check in Translator Toolkit, although it's not available for
all languages. We don't have any punctuation checks today and I doubt that
we can release this anytime soon. (If it's not available in Google Docs or
Gmail, then it's unlikely that we'll have it for Translator Toolkit, as
well, since we use the same infrastructure.)
What's the proposal, though - would you like for us to prevent publishing of
articles if they have too many spelling errors, or simply warn the user that
there are X spelling errors? Any input you can provide on preferred
behavior would be great.
I would say to force spellcheck before publication, which does not
seem to be the case currently. I think this would be enough - perhaps
a warning as well. I don't know about preventing publication, although
that might work too.
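
To make the options concrete, here is a rough sketch of the behavior being discussed: require that spell check has been run, warn about remaining errors, and optionally block above a threshold. Everything here (the function name, the threshold parameter, the messages) is hypothetical and only for illustration; it is not how Translator Toolkit works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    allowed: bool   # may the article be published?
    message: str    # text shown to the translator ("" if nothing to say)


def spelling_gate(error_count: int, spellcheck_run: bool,
                  block_threshold: Optional[int] = None) -> GateResult:
    """Decide whether publishing may proceed (hypothetical policy)."""
    # Forced spell check: nothing is published until it has been run.
    if not spellcheck_run:
        return GateResult(False, "Please run spell check before publishing.")
    # Optional hard block when there are too many errors.
    if block_threshold is not None and error_count > block_threshold:
        return GateResult(
            False, f"{error_count} spelling errors found; fix them before publishing.")
    # Otherwise publishing is allowed, with a warning if errors remain.
    if error_count > 0:
        return GateResult(True, f"Warning: {error_count} spelling errors remain.")
    return GateResult(True, "")
```

Leaving block_threshold unset gives the "force spell check plus warning" behavior; setting it adds the stricter "prevent publication" option.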
3) Have GTTK automatically remove broken templates and images, or require
users to translate any templates before a page may be posted.
Templates are a bit tricky. Sometimes, a template in one Wikipedia does not
exist in another Wikipedia. Other times, a template in one language maps to
a template in another language but the parameters are different.
Removing broken templates automatically may not work because some templates
come between words. If we remove them, the sentence or paragraph may
become invalid. We've also considered creating a custom interface for
localizing templates, but this requires a lot of work.
In the interim, the approach we've taken is to have translators fix the
templates in Wikipedia when they post the article from Translator Toolkit.
When a user clicks on Share > Publish to source page in Translator Toolkit,
the Wikipedia article is in preview mode --- it's not live. The idea is
that if there are any errors, the translator can fix them before saving the
article.
Well, many translators do fix such problems, but I was just thinking
of some of the problems that I've heard so far with people who do
"drive-by" translations, dropping it on a project and then
disappearing. If translators are careful and do all the work
themselves, templates are an annoyance rather than a real problem.
4) Include a list of most needed articles for people to create, rather than
random articles that will be of little use to local readers. Some articles,
such as those on local topics, have the added benefit of encouraging more
edits and community participation since, in my experience, they tend to
generate more interest from speakers of a language.
The articles we selected actually weren't really random. Here's how we
selected them:
1. we looked at the top Google searches in the region (e.g., for Tamil, we
looked at searches in India and I believe Sri Lanka, as well)
2. from the top Google searches in the region, we looked at the most-clicked
Wikipedia articles --- regardless of the language (so we wound up with
Wikipedia source articles in English, Hindi, and other languages)
3. from the most-clicked Wikipedia articles, we looked for articles that
were either stubs or unavailable in the local language - these are the
articles that we sent for translation
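
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the click data is assumed to have already been restricted to top searches in the region (step 1), and the record type, the status labels ("missing", "stub", "full"), and the function name are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class ClickedArticle(NamedTuple):
    title: str        # title of the Wikipedia article that was clicked
    source_lang: str  # language edition it was clicked in (any language)
    clicks: int       # clicks coming from top regional searches


def select_for_translation(clicked: List[ClickedArticle],
                           local_status: Dict[str, str],
                           top_n: int = 100) -> List[ClickedArticle]:
    """Pick the most-clicked articles that are stubs or missing locally."""
    # Step 2: rank by clicks, regardless of the source language.
    ranked = sorted(clicked, key=lambda a: a.clicks, reverse=True)[:top_n]
    # Step 3: keep articles that are stubs or unavailable in the local
    # language; a title absent from local_status is treated as missing.
    return [a for a in ranked
            if local_status.get(a.title, "missing") in ("missing", "stub")]
```

The sketch filters only on local availability and has no notion of local relevance.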
This selection isn't perfect. For example, it assumes that the most-clicked
Wikipedia articles among all users in India/Sri Lanka --- who may be searching
in English, Hindi, Tamil, or some other language --- are relevant to the
Tamil community. To address this, we met last month with members of the
Tamil and Telugu Wikipedias to improve the article selection. The main
changes that we agreed on were:
I'm not sure if this project was separate from the Swahili Wikipedia
Challenge, but I'm assuming it was after seeing articles such as
http://sw.wikipedia.org/wiki/Maduka_ya_United_Cigar_Stores (about a
defunct chain of cigar stores in the US) which I doubt were popular
searches in East Africa.
One more idea: Automatically add existing interwiki links to the new article.
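
As a rough sketch of what that could look like: assuming the source article's interwiki links are already available as a mapping from language code to title (how they are fetched is left out), the tool could append the inline [[lang:Title]] form to the translated wikitext. The function and parameter names are hypothetical.

```python
from typing import Dict


def interwiki_wikitext(source_lang: str, source_title: str,
                       source_langlinks: Dict[str, str],
                       target_lang: str) -> str:
    """Build interwiki link lines for a newly translated article."""
    links = dict(source_langlinks)
    # The new article should also link back to the article it came from.
    links[source_lang] = source_title
    # An article does not link to its own language edition.
    links.pop(target_lang, None)
    # One [[lang:Title]] link per line, sorted by language code.
    return "\n".join(f"[[{lang}:{title}]]"
                     for lang, title in sorted(links.items()))
```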
Also, as far as Indic languages go, I would ask if there's any chance
you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
is by far the most anemic of Indic-language Wikipedias, in spite of a
speaker population of 31 million.
-m.