[WikiJA-l] title searching

Hatukanezumi hnezumi @ gmail.com
Wed Jan 27 13:48:40 UTC 2010


Sean,

On 27 January 2010 at 16:46, Sean Moss-Pultz <sean @ openmoko.com> wrote:
> Hi List!
>
> This is Sean from Openmoko. We developed a product called WikiReader and
> would like to add a Japanese interface and input method.
>
> We're not sure how to properly search over the (Japanese) Wikipedia titles.
> It seems like most (but not all?) articles have the hiragana tags. Like:
>
>  {{DEFAULTSORT:ひかしろまていこく}} for the article 東ローマ帝国
>
> We need these to be able to generate our search index. (We don't have any
> ideas of other methods). Are these complete? Or if not, is there another
> method that somebody recommends?

That is a ``sort key'', and it does not represent the proper reading of
東ローマ帝国: the title is read ひがしろーまていこく
/hi ga shi ro o ma te i ko ku/, whereas the key reads
/hi ka shi ro ma te i ko ku/. In addition, the key is deliberately
simplified to reduce confusion among editors over the multiple sorting
methods used for the Japanese language.
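
To illustrate why such a key loses information, here is a minimal Python
sketch. It is not Wikipedia's actual sorting rule: the function name
simplify_reading and its two simplifications (dropping voicing marks and
the long-vowel mark ー) are assumptions inferred only from the single
example above.

```python
import unicodedata

# Assumption: these two simplifications are inferred from the one example
# in this message; they are not the real Wikipedia sort-key rules.
VOICING_MARKS = {"\u3099", "\u309a"}  # combining dakuten and handakuten
LONG_VOWEL_MARK = "\u30fc"            # the prolonged sound mark "ー"


def simplify_reading(reading: str) -> str:
    """Hypothetical helper: reduce a kana reading to a simplified key."""
    # NFD splits a voiced kana such as "が" into "か" plus a combining mark.
    decomposed = unicodedata.normalize("NFD", reading)
    kept = [
        ch
        for ch in decomposed
        if ch not in VOICING_MARKS and ch != LONG_VOWEL_MARK
    ]
    return unicodedata.normalize("NFC", "".join(kept))


# The proper reading collapses to the key quoted in the question.
print(simplify_reading("ひがしろーまていこく"))

# Two different words (かぎ and かき) end up with the same key.
print(simplify_reading("かぎ") == simplify_reading("かき"))
```

Because distinct readings collapse to one key, a key of this kind cannot
be mapped back to the reading a user would actually type.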

I think it is insufficient as the basis for a search index, because it
does not represent the proper reading. Moreover, some Japanese words have
no obvious reading, owing to the nature of the kanzi (漢字) used in
Japanese text.

I recommend that you get in touch with developers working on Japanese
text processing and on mobile or palmtop environments. Since as early as
the 1970s, a number of efficient input methods and search techniques for
Japanese text have been proposed and developed (some of them are, of
course, open source!).

For more information, please visit one of these Japanese communities.

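[Summary, translated from Japanese] That is a sort key, and it does not
represent the reading accurately. Also, nothing can be done when the
reading of the kanzi is unknown. Please consult developers of Japanese
text processing, mobile devices, and palmtop devices. Many methods have
been proposed and developed over the years, and some are open source.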

Regards,

> Thanks in advance for the help!
>
>  -Sean
> _______________________________________________
> https://lists.wikimedia.org/mailman/listinfo/wikija-l
>

-- 
--- Hatukanezumi


