[Foundation-l] Push translation

Michael Galvez michaelcg at gmail.com
Fri Aug 6 17:53:20 UTC 2010


Hi Mark,

Responses inline.

Mike

On Thu, Aug 5, 2010 at 2:22 PM, Mark Williamson <node.ue at gmail.com> wrote:

> > 2) Implement spelling and punctuation check automatically within GTTK
> > before posting of the articles.
> >
> > There is spell check in Translator Toolkit, although it's not available
> > for all languages.  We don't have any punctuation checks today and I
> > doubt that we can release this anytime soon.  (If it's not available in
> > Google Docs or Gmail, then it's unlikely that we'll have it for
> > Translator Toolkit, as well, since we use the same infrastructure.)
> >
> > What's the proposal, though - would you like for us to prevent
> > publishing of articles if they have too many spelling errors, or simply
> > warn the user that there are X spelling errors?  Any input you can
> > provide on preferred behavior would be great.
>
> I would say to force spellcheck before publication, which does not
> seem to be the case currently. I think this would be enough - perhaps
> a warning as well. I don't know about preventing publication, although
> that might work too.
>

How about this: we pop up a window that says, "Your translation has
misspelled words: X.  Publish anyway?"

Does that work?
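
A minimal sketch of that behavior, purely illustrative: the function names
and the confirm callback below are hypothetical and are not part of
Translator Toolkit. It assumes the spell checker already yields a count of
misspelled words.

```python
def publish_prompt(misspelled_count):
    """Return the confirmation text, or None when no prompt is needed."""
    if misspelled_count <= 0:
        return None
    return ("Your translation has misspelled words: "
            f"{misspelled_count}.  Publish anyway?")


def may_publish(misspelled_count, confirm):
    """Decide whether publishing proceeds.

    `confirm` is a hypothetical callback that shows the message to the
    translator and returns True if they choose to publish anyway.
    """
    prompt = publish_prompt(misspelled_count)
    if prompt is None:
        # Nothing flagged: publish without interrupting the translator.
        return True
    return bool(confirm(prompt))
```

With this shape the translator is warned but never blocked outright; the
decision stays with them.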


> > 3) Have GTTK automatically remove broken templates and images, or require
> > users to translate any templates before a page may be posted.
> >
> > Templates are a bit tricky.  Sometimes, a template in one Wikipedia does
> > not exist in another Wikipedia.  Other times, a template in one language
> > maps to a template in another language but the parameters are different.
> >
> > Removing broken templates automatically may not work because some
> > templates come between words.  If we remove them, the sentences or
> > paragraph may become invalid.  We've also considered creating a custom
> > interface for localizing templates, but this requires a lot of work.
> >
> > In the interim, the approach we've taken is to have translators fix the
> > templates in Wikipedia when they post the article from Translator
> > Toolkit.  When a user clicks on Share > Publish to source page in
> > Translator Toolkit, the Wikipedia article is in preview mode --- it's
> > not live.  The idea is that if there are any errors, the translator can
> > fix them before saving the article.
>
> Well, many translators do fix such problems, but I was just thinking
> of some of the problems that I've heard so far with people who do
> "drive-by" translations, dropping it on a project and then
> disappearing. If translators are careful and do all the work
> themselves, templates are an annoyance rather than a real problem.
>
> > 4) Include a list of most needed articles for people to create, rather
> > than random articles that will be of little use to local readers. Some
> > articles, such as those on local topics, have the added benefit of
> > encouraging more edits and community participation since they tend to
> > generate more interest from speakers of a language in my experience.
> >
> > The articles we selected actually weren't really random.  Here's how we
> > selected them:
> >
> > 1. we looked at the top Google searches in the region (e.g., for Tamil,
> > we looked at searches in India and I believe Sri Lanka, as well)
> > 2. from the top Google searches in the region, we looked at the top,
> > clicked Wikipedia articles --- regardless of the language (so we wound
> > up with Wikipedia source articles in English, Hindi, and other
> > languages)
> > 3. from the top, clicked Wikipedia articles, we looked for articles that
> > were either stubs or unavailable in the local language - these are the
> > articles that we sent for translation
> >
> > This selection isn't perfect.  For example, it assumes that the top,
> > clicked Wikipedia articles by all users in India/Sri Lanka --- who may
> > be searching in English, Hindi, Tamil, or some other language --- are
> > relevant to the Tamil community.  To improve this, last month, we met
> > with members of the Tamil and Telugu Wikipedias to improve this article
> > selection.  The main changes that we agreed on were:
>
> I'm not sure if this project was separate from the Swahili Wikipedia
> Challenge, but I'm assuming it was after seeing articles such as
> http://sw.wikipedia.org/wiki/Maduka_ya_United_Cigar_Stores (about a
> defunct chain of cigar stores in the US) which I doubt were popular
> searches in East Africa.
>

It's the same set of projects, although at times there were some variations
in the approach.  For the Swahili project, for example, in addition to
translating content (selected from search data), the students also created
content from scratch.

Re: Cigar Stores, I'm actually not sure where this article comes from.
You're right that it's not terribly popular --- it doesn't show up as one
of the top, clicked articles from search data.  It may have been added to
the list later by a volunteer.
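
For reference, the selection steps quoted above (rank the clicked Wikipedia
articles from regional searches, then keep those that are stubs or missing
in the local language) can be sketched roughly as follows.  This is only an
illustration: the function name, the data shapes, and the status labels are
assumptions, not the actual pipeline we ran.

```python
def select_for_translation(clicked_articles, local_status, limit=None):
    """Pick translation candidates from click data.

    clicked_articles: list of (title, clicks) pairs taken from regional
        search data, in any source language (hypothetical shape).
    local_status: dict mapping title -> "full", "stub", or "missing" in
        the target-language Wikipedia; unknown titles count as "missing".
    limit: optional cap on how many top-clicked articles to consider.
    """
    # Step 2: rank by clicks, most clicked first.
    ranked = sorted(clicked_articles, key=lambda item: item[1], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    # Step 3: keep only stubs or articles missing in the local language.
    return [title for title, _clicks in ranked
            if local_status.get(title, "missing") in ("stub", "missing")]
```

An article that never appears in the click data, like the cigar stores one,
would not be produced by a filter of this kind, which is consistent with it
having been added by hand.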


>
> One more idea: Automatically add existing interwiki links to the new
> article.
>

We already include existing interwiki links in the new article.  If you
find a bug in this, please let us know and we'll fix it.


>
> Also, as far as Indic languages go, I would ask if there's any chance
> you have any Oriya speakers - with 637 articles, the Oriya Wikipedia
> is by far the most anemic of Indic-language Wikipedias, in spite of a
> speaker population of 31 million.
>
>
Oriya is one of the languages we'd love to work on.  We don't have any
activity on this today but if you have some Wikipedians who'd like to help
us get this off the ground, we'd love to get their contact info and we can
follow up from there.



>  -m.
>

