[Foundation-l] Frustration with the conversion engines issue

Tim Starling tstarling at wikimedia.org
Thu Apr 2 16:22:13 UTC 2009

Ziko van Dijk wrote:
> Dear Aryeh,
> Your idea of "converting on the fly" would not work in many cases. Take for
> example the ß in German WP. Swiss (registered) readers can decide via their
> Preferences to see only ss and never ß, because the Swiss do not use ß.
> That's ok. But vice versa, not every ss is to be converted to ß.
> The Germany-Germans write, for example, "Masse" (a mass, with a short "a")
> and "Maße" (measures, with a long "a"). The Swiss write "Masse" for both.
> Now imagine that a Swiss editor writes "Masse": the conversion engine would
> not know whether this should be converted to "Maße" or not. Only a person
> who knows German is capable of deciding.

There's no reason in principle why a computer can't be as good at
making that decision as a human. Such ambiguities are what make the
field of computational linguistics interesting; they're not a reason
to be dismissive. We need to find out what is possible with
state-of-the-art research systems, and then negotiate, or develop
software, to bring that technology to Wikipedia.
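
To make the asymmetry in the quoted example concrete, here is a minimal
sketch in Python. It is only an illustration: the function names and the
tiny word list are hypothetical and are not taken from any existing
conversion engine. It shows that ß -> ss is a plain substitution, while
ss -> ß produces a set of candidates that something (a person, or a
context-aware disambiguation model) still has to choose between.

```python
# Hypothetical illustration only: the names and the toy lexicon below are
# assumptions, not part of any real conversion engine.

def to_swiss(text: str) -> str:
    """Germany-German -> Swiss spelling: every ß simply becomes ss."""
    return text.replace("ß", "ss")

# Toy lexicon mapping a Swiss spelling to its possible Germany-German spellings.
CANDIDATES = {
    "Masse": ["Masse", "Maße"],  # ambiguous: "mass" vs. "measures"
    "Strasse": ["Straße"],       # unambiguous
}

def to_german_candidates(word: str) -> list[str]:
    """Swiss -> Germany-German: return every spelling the word could map to.

    Words missing from the toy lexicon are returned unchanged.
    """
    return CANDIDATES.get(word, [word])
```

A word with more than one candidate is exactly the case where the
surrounding sentence would have to be taken into account.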

-- Tim Starling

