- Implement spelling and punctuation check automatically within GTTK before
posting of the articles.
There is spell check in Translator Toolkit, although it's not available for all languages. We don't have any punctuation checks today and I doubt that we can release this anytime soon. (If it's not available in Google Docs or Gmail, then it's unlikely that we'll have it for Translator Toolkit, as well, since we use the same infrastructure.)
What's the proposal, though - would you like for us to prevent publishing of articles if they have too many spelling errors, or simply warn the user that there are X spelling errors? Any input you can provide on preferred behavior would be great.
I would say to force spellcheck before publication, which does not seem to be the case currently. I think this would be enough - perhaps a warning as well. I don't know about preventing publication, although that might work too.
- Have GTTK automatically remove broken templates and images, or require
users to translate any templates before a page may be posted.
Templates are a bit tricky. Sometimes, a template in one Wikipedia does not exist in another Wikipedia. Other times, a template in one language maps to a template in another language but the parameters are different.
Removing broken templates automatically may not work because some templates come between words. If we remove them, the surrounding sentence or paragraph may no longer read correctly. We've also considered creating a custom interface for localizing templates, but this requires a lot of work.
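To illustrate why blanket removal is risky, here is a minimal Python sketch. It is not how Translator Toolkit handles templates; the regular expression and the `strip_templates` helper are assumptions made up for this example, and they only handle simple, non-nested `{{...}}` templates.

```python
import re

# Assumption for this sketch: only simple, non-nested {{...}} templates are matched.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def strip_templates(wikitext: str) -> str:
    """Remove every non-nested template from the given wikitext."""
    return TEMPLATE_RE.sub("", wikitext)


# A template used inline, between words: removing it leaves a broken sentence.
inline = "The river is {{convert|120|km|mi}} long."
print(strip_templates(inline))  # prints "The river is  long."

# A template on its own line can be dropped without damaging the prose.
standalone = "{{Infobox river|name=Example}}\nThe river flows north."
print(strip_templates(standalone))
```

The inline case loses the measurement entirely, which is the kind of damage that automatic removal would introduce without anyone noticing.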
In the interim, the approach we've taken is to have translators fix the templates in Wikipedia when they post the article from Translator Toolkit. When a user clicks on Share > Publish to source page in Translator Toolkit, the Wikipedia article opens in preview mode; it is not yet live. The idea is that if there are any errors, the translator can fix them before saving the article.
Well, many translators do fix such problems, but I was thinking of some of the problems I've heard about so far with people who do "drive-by" translations, dropping an article on a project and then disappearing. If translators are careful and do all the work themselves, templates are an annoyance rather than a real problem.
- Include a list of most needed articles for people to create, rather than
random articles that will be of little use to local readers. Some articles, such as those on local topics, have the added benefit of encouraging more edits and community participation since, in my experience, they tend to generate more interest from speakers of the language.
The articles we selected actually weren't really random. Here's how we selected them:
1. we looked at the top Google searches in the region (e.g., for Tamil, we looked at searches in India and I believe Sri Lanka, as well)
2. from the top Google searches in the region, we looked at the top-clicked Wikipedia articles, regardless of the language (so we wound up with Wikipedia source articles in English, Hindi, and other languages)
3. from the top-clicked Wikipedia articles, we looked for articles that were either stubs or unavailable in the local language - these are the articles that we sent for translation
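The selection steps above can be sketched roughly as follows. This is only an illustration of the described filtering, not the actual pipeline: the `Article` record, its fields, the `select_for_translation` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    clicks: int        # clicks arriving from top regional searches (hypothetical data)
    local_status: str  # "missing", "stub", or "full" in the target-language Wikipedia


def select_for_translation(articles, top_n):
    """Keep the top_n most-clicked articles, then keep only stubs or missing ones."""
    top = sorted(articles, key=lambda a: a.clicks, reverse=True)[:top_n]
    return [a.title for a in top if a.local_status in ("missing", "stub")]


# Made-up sample data, purely for illustration.
sample = [
    Article("A", 900, "full"),
    Article("B", 700, "stub"),
    Article("C", 500, "missing"),
    Article("D", 10, "missing"),
]
print(select_for_translation(sample, top_n=3))  # prints ['B', 'C']
```

Note that article "D" is dropped even though it is missing locally, because it falls outside the top-clicked set. Nothing in this filter checks whether a topic is relevant to the local community.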
This selection isn't perfect. For example, it assumes that the top-clicked Wikipedia articles among all users in India/Sri Lanka (who may be searching in English, Hindi, Tamil, or some other language) are relevant to the Tamil community. To address this, we met with members of the Tamil and Telugu Wikipedias last month to improve the article selection. The main changes that we agreed on were:
I'm not sure if this project was separate from the Swahili Wikipedia Challenge, but I'm assuming it was after seeing articles such as http://sw.wikipedia.org/wiki/Maduka_ya_United_Cigar_Stores (about a defunct chain of cigar stores in the US) which I doubt were popular searches in East Africa.
One more idea: automatically add existing interwiki links to the new article.
Also, as far as Indic languages go, I would ask if there's any chance you have any Oriya speakers - with 637 articles, the Oriya Wikipedia is by far the most anemic of Indic-language Wikipedias, in spite of a speaker population of 31 million.
-m.