[Foundation-l] [Wikitech-l] Primary account for single user login

Mark Williamson node.ue at gmail.com
Wed Apr 9 14:55:42 UTC 2008


Of course, a user on ja.wp is unlikely to expect their name to be
transliterated using pinyin.

However, a user from Hong Kong may be surprised to find their name
rendered in a completely alien way (most people from Hong Kong are not
fluent in spoken Mandarin, although many businesspeople are taking
classes, as has been discussed on this list before). This
problem can probably be overlooked, as most names would still be
recognizable, if a bit foreign-seeming (most Cantonese readings of
characters are at least similar enough in romanization to Mandarin for
a string to be identified as one's own username).

Also, computer transliteration for Japanese is a complex and
many-faceted problem. Readings of characters in Japanese depend very
heavily on context. Although not insurmountable, that is a huge
obstacle requiring many transliteration rules (and you can forget
about round-tripping, which often requires a larger context, perhaps
even a whole sentence, due to synonyms). Characters may have many
readings depending on whether they occur alone, whether they occur in a
compound, and even what character they are compounded with. Many more
readings are added by unusual readings used in personal names and place
names (for example, "iri" in the island name Iriomote is a reading unique
to place names in the Ryukyu Islands, coming from local speech).
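
To make the point concrete, here is a rough sketch in Python of what
"context-dependent" means in practice. It is not ICU and not anything
MediaWiki does; the RULES table and the transliterate() function are
made up for illustration, and the handful of entries are a toy
dictionary. It picks a rule set by language and prefers the longest
matching compound over single-character readings:

```python
# Hypothetical toy rule sets keyed by language code.
# The entries are illustrative only, not a real dictionary.
RULES = {
    "ja": {
        "日": "hi",          # stand-alone reading
        "日本": "nihon",     # compound reading differs from "hi"
        "西": "nishi",
        "表": "omote",
        "西表": "iriomote",  # place-name reading of the compound
    },
    "zh": {
        "日": "rì",
        "日本": "rìběn",
    },
}


def transliterate(text, lang):
    """Greedy longest-match transliteration using the rule set for lang."""
    rules = RULES[lang]
    longest = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate compound first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, transliterate("西表", "ja") uses the compound entry
rather than gluing together the single-character readings "nishi" and
"omote". A real rule set would need vastly more entries, plus some way
of telling a place name from an ordinary compound, which is exactly the
hard part.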

As for the character "日", it actually occurs alone relatively
frequently, in the sense that it can be a stand-alone lexical unit
meaning "day" (though one could argue that no character in Japanese
ever occurs alone, due to the near-absence of spaces to separate words).

Mark

On 09/04/2008, Tim Starling <tstarling at wikimedia.org> wrote:
> Simetrical wrote:
>  > On Wed, Apr 9, 2008 at 9:41 AM, Tim Starling <tstarling at wikimedia.org> wrote:
>  >>  It supports custom rule sets, and Wikipedians may some day submit such a
>  >>  rule set that can transliterate Japanese and Korean names.
>  >
>  > Surely the issue with Japanese and Korean is that they use the same
>  > code points as Chinese.  The character 日 is transliterated "hi" in
>  > Japanese, but "rì" in Chinese, in each case meaning "day".  It can't
>  > possibly guess which one to translate to, so presumably it must choose
>  > one of them arbitrarily.  A custom rule set won't help that.  Of
>  > course, in some cases it's presumably not ambiguous whether a string
>  > is Chinese/Japanese/Korean; in those cases it could hypothetically
>  > make a guess and follow it consistently, which apparently it doesn't
>  > (someone gave an example above of something that was transliterated
>  > partly to romaji and partly to pinyin).
>
>
> You seem to be implying that the transliterator cannot possibly know what
>  language the name is in. MediaWiki can supply that information. It can
>  select a rule set based on the current user language.
>
>  Rules are not limited to single-character context-free conversions. Please
>  read the "Designing Transliterators" section of:
>
>  http://www.icu-project.org/userguide/Transform.html
>
>  The character "日" rarely appears alone, rather it appears as a part of
>  larger lexical units which, when identified, usually have unambiguous
>  pronunciation.
>
>
>  -- Tim Starling
>
>
>
>  _______________________________________________
>  foundation-l mailing list
>  foundation-l at lists.wikimedia.org
>  Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>

