Hi List!
This is Sean from Openmoko. We developed a product called WikiReader and would like to add a Japanese interface and input method.
We're not sure how to properly search over the (Japanese) Wikipedia titles. It seems like most (but not all?) articles have the hiragana tags. Like:
{{DEFAULTSORT:ひかしろまていこく}} for the article 東ローマ帝国
We need these to be able to generate our search index. (We don't have any ideas of other methods). Are these complete? Or if not, is there another method that somebody recommends?
Thanks in advance for the help!
-Sean
Hi, Sean
I believe the hiragana tags on Japanese Wikipedia articles are complete, because I have not seen an article without one recently. So it should be fine to use them for your WikiReader. But please keep in mind that this is just my opinion, and others may disagree. I recommend that you hear from other Japanese users before making a decision.
Regards,
Ryota Tanaka
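Whether the tags are complete could also be checked empirically by counting how many articles carry one. The following is a minimal sketch, assuming the article wikitext is already available as (title, text) pairs; the function names `sort_key` and `coverage` are hypothetical, and localized aliases of the DEFAULTSORT magic word are not handled.

```python
import re

# Matches {{DEFAULTSORT:key}} and captures the key.
# Assumption: only the English magic word is handled here.
DEFAULTSORT = re.compile(r"\{\{\s*DEFAULTSORT\s*:\s*([^}|]+?)\s*\}\}")


def sort_key(wikitext):
    """Return the DEFAULTSORT key found in the wikitext, or None."""
    match = DEFAULTSORT.search(wikitext)
    return match.group(1) if match else None


def coverage(pages):
    """Return the fraction of (title, wikitext) pairs that carry a sort key."""
    pages = list(pages)
    if not pages:
        return 0.0
    tagged = sum(1 for _title, text in pages if sort_key(text) is not None)
    return tagged / len(pages)
```

Running `coverage` over a full dump would give a measured answer to the "are these complete?" question rather than an impression.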
_______________________________________________ https://lists.wikimedia.org/mailman/listinfo/wikija-l
Ryota Tanaka, third-year student, School of Sport Sciences, Waseda University E-mail:gimme_the_microphone@yahoo.co.jp Tel:090-5820-7385
"Transforming Japan's sports culture!" Sports Of Japan→ http://soj-net.com/
Sean,
On 27 January 2010 at 16:46, Sean Moss-Pultz sean@openmoko.com wrote:
> We're not sure how to properly search over the (Japanese) Wikipedia titles. It seems like most (but not all?) articles have the hiragana tags. Like:
> {{DEFAULTSORT:ひかしろまていこく}} for the article 東ローマ帝国
> We need these to be able to generate our search index. (We don't have any ideas of other methods). Are these complete? Or if not, is there another method that somebody recommends?
That is a sort key, not the proper reading of 東ローマ帝国. The title is read ひがしろーまていこく /hi ga shi ro o ma te i ko ku/, while the key would be read /hi ka shi ro ma te i ko ku/. In addition, the key is deliberately simplified to spare editors confusion over the multiple sorting methods used for Japanese.
So I think it is insufficient as a search index: it does not represent the proper reading. Moreover, Japanese words sometimes have no obvious reading, owing to the nature of the kanji (漢字) used in Japanese text.
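The loss of information can be illustrated with a small sketch. It assumes, based only on the single example in this thread, that the simplification drops voicing marks and the long-vowel mark and folds katakana into hiragana; the actual ja-wp sort key rules may differ, and the function name `to_sort_key_like` is hypothetical.

```python
import unicodedata


def to_sort_key_like(reading):
    """Simplify a kana reading in the way the thread's example suggests.

    Assumption: the rules here are inferred from one example and are
    not the official ja-wp sort key rules.
    """
    out = []
    # NFD splits voiced kana into a base kana plus a combining mark.
    for ch in unicodedata.normalize("NFD", reading):
        if ch in "\u3099\u309A":  # combining dakuten / handakuten
            continue
        if ch == "ー":  # long-vowel mark
            continue
        if "\u30A1" <= ch <= "\u30F6":  # katakana -> hiragana
            ch = chr(ord(ch) - 0x60)
        out.append(ch)
    return "".join(out)
```

Under these assumptions, `to_sort_key_like("ひがしローマていこく")` gives the key quoted above, and distinct readings such as かぎ and かき collapse to the same key, so the original reading cannot be recovered from it.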
I recommend that you talk with developers of Japanese text processing and of mobile or palmtop environments. Since as early as the 1970s, a number of efficient input methods and search techniques for Japanese text have been proposed and developed (and of course some of them are open source!).
For more information, please visit one of those Japanese communities.
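[Summary, translated from Japanese] That is a sort key and does not accurately represent the reading. Also, nothing can be done if the reading of the kanji is not known. Please consult developers of Japanese text processing or of mobile and palmtop devices. Many methods have been proposed and developed over the years, and some of them are open source.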
Regards,
Hi Sean,
First of all, let me second what Hatukanezumi has said, as a general reminder. This issue is really involved, and I cannot come up with any easy automatic solution. That being said, most non-stub articles in Japanese Wikipedia give the pronunciation of the title (when it contains kanji) right after the title is introduced in the body, which is the style recommended by the ja-wp MOS. For example, 「東ローマ帝国」 starts with
'''東ローマ帝国'''(ひがしローマていこく、[[395年]] - [[1453年]])は、 ...
i.e. the title in bold, then its pronunciation in hiragana and katakana, followed by supplementary information, all enclosed in parentheses. You still need to handle cases where the hiragana/katakana part is abbreviated, such as "(-すう、英: Achilles number)" in 「アキレス数」, among others.
cheers, Makoto Yamashita
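The extraction described above could be sketched roughly as follows. This is only a heuristic under stated assumptions: the lead is taken to have the form bold title, opening parenthesis, kana reading, and a leading hyphen in the reading is taken to stand for the kana prefix of the title. The names `LEAD` and `extract_reading` are hypothetical, and real articles will contain many leads this pattern misses.

```python
import re

# Hiragana, katakana, and the long-vowel mark.
KANA = r"[\u3041-\u3096\u30A1-\u30FAー]"

# Bold title, then an ASCII or full-width opening parenthesis, then the reading.
# The reading must be followed by a comma or a closing parenthesis.
LEAD = re.compile(
    r"'''(?P<title>[^']+)'''\s*[(（]\s*"
    r"(?P<reading>[-‐－]?" + KANA + r"+)\s*(?=[、,)）])"
)


def extract_reading(wikitext):
    """Return (title, reading) from the lead sentence, or None if not found."""
    match = LEAD.search(wikitext)
    if match is None:
        return None
    title, reading = match.group("title"), match.group("reading")
    if reading[0] in "-‐－":
        # Abbreviated reading: assume the hyphen stands for the leading
        # kana run of the title (e.g. アキレス in アキレス数).
        prefix = re.match(KANA + "+", title)
        if prefix is None:
            return None
        reading = prefix.group(0) + reading[1:]
    return title, reading
```

For the two examples in this message, the sketch yields ひがしローマていこく for 東ローマ帝国 and アキレスすう for アキレス数. Titles whose abbreviated part is not a leading kana run are simply skipped here rather than guessed.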