[Foundation-l] Push translation

Mark Williamson node.ue at gmail.com
Thu Aug 5 18:22:44 UTC 2010


> 2) Implement spelling and punctuation check automatically within GTTK before
> posting of the articles.
>
> There is spell check in Translator Toolkit, although it's not available for
> all languages.  We don't have any punctuation checks today and I doubt that
> we can release this anytime soon.  (If it's not available in Google Docs or
> Gmail, then it's unlikely that we'll have it for Translator Toolkit, as
> well, since we use the same infrastructure.)
>
> What's the proposal, though - would you like for us to prevent publishing of
> articles if they have too many spelling errors, or simply warn the user that
> there are X spelling errors?  Any input you can provide on preferred
> behavior would be great.

I would suggest forcing a spellcheck before publication, which does not
seem to happen currently. I think this would be enough, perhaps with a
warning as well. I'm not sure about preventing publication, although
that might work too.

> 3) Have GTTK automatically remove broken templates and images, or require
> users to translate any templates before a page may be posted.
>
> Templates are a bit tricky.  Sometimes, a template in one Wikipedia does not
> exist in another Wikipedia.  Other times, a template in one language maps to
> a template in another language but the parameters are different.
>
> Removing broken templates automatically may not work because some templates
> come between words.  If we remove them, the sentences or paragraph may
> become invalid.  We've also considered creating a custom interface for
> localizing templates, but this requires a lot of work.
>
> In the interim, the approach we've taken is to have translators fix the
> templates in Wikipedia when they post the article from Translator Toolkit.
> When a user clicks on Share > Publish to source page in Translator Toolkit,
> the Wikipedia article is in preview mode; it's not live.  The idea is
> that if there are any errors, the translator can fix them before saving the
> article.

Well, many translators do fix such problems, but I was thinking of
some of the problems I've heard about with people who do "drive-by"
translations, dropping them on a project and then disappearing. If
translators are careful and do all the work themselves, templates are
an annoyance rather than a real problem.

> 4) Include a list of most needed articles for people to create, rather than
> random articles that will be of little use to local readers. Some articles,
> such as those on local topics, have the added benefit of encouraging more
> edits and community participation since they tend to generate more interest
> from speakers of a language in my experience.
>
> The articles we selected actually weren't really random.  Here's how we
> selected them:
>
> 1. we looked at the top Google searches in the region (e.g., for Tamil, we
> looked at searches in India and I believe Sri Lanka, as well)
> 2. from the top Google searches in the region, we looked at the top-clicked
> Wikipedia articles, regardless of the language (so we wound up with
> Wikipedia source articles in English, Hindi, and other languages)
> 3. from the top-clicked Wikipedia articles, we looked for articles that
> were either stubs or unavailable in the local language; these are the
> articles that we sent for translation
>
> This selection isn't perfect.  For example, it assumes that the top-clicked
> Wikipedia articles by all users in India/Sri Lanka (who may be searching
> in English, Hindi, Tamil, or some other language) are relevant to the
> Tamil community.  To address this, last month we met with members of the
> Tamil and Telugu Wikipedias to refine the article selection.  The main
> changes that we agreed on were:

I'm not sure if this project was separate from the Swahili Wikipedia
Challenge, but I'm assuming it was after seeing articles such as
http://sw.wikipedia.org/wiki/Maduka_ya_United_Cigar_Stores (about a
defunct chain of cigar stores in the US) which I doubt were popular
searches in East Africa.

One more idea: automatically add existing interwiki links to the new article.

Also, as far as Indic languages go, I would ask if there's any chance
you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
is by far the most anemic of Indic-language Wikipedias, in spite of a
speaker population of 31 million.

-m.


