On 19/09/2017 at 23:47, Trey Jones wrote:
We recently got a suggestion via Phabricator[1] to automatically map
between hiragana and katakana when searching on English Wikipedia and other
wiki projects. As an always-on feature, this isn't difficult to implement,
but major commercial search engines (Google.jp, Bing, Yahoo Japan,
DuckDuckGo, Goo) don't do that. They give different results when searching
for hiragana/katakana forms (for example, オオカミ/おおかみ "wolf"). They also give
different *numbers* of results, seeming to indicate that it's not just
re-ordering the same results (say, so that results in the same script are
ranked higher).[2] I want to know what they know that I don't!
Does anyone have any thoughts on whether this would be useful (seems that
it would) and whether it would cause any problems (it must, or otherwise
all the other search engines would do it, right?).
Well, maybe. Or not. Look at how DuckDuckGo still offers only a "country"
option for filtering *languages*. Now both might be complementary, but
personally I'm generally more interested in the latter, all the more when
I'm using a language that no country has as an official language. :)
Anyway, would it be a big deal to show the transliterated results with less
weight in ranking? Actually, add an option button in advanced search in any
case, and just limit the discussion to whether it should be opt-in or opt-out.
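To make that concrete, here is a rough sketch of what I have in mind. It is
not how CirrusSearch or any of the engines above work; the function names
(fold_kana, score, rank) and the weights 1.0 and 0.5 are made up for
illustration. It relies only on the fact that the katakana block
U+30A1..U+30F6 sits exactly 0x60 above the hiragana block U+3041..U+3096 in
Unicode.

```python
# Hypothetical sketch: fold katakana to hiragana, then rank same-script
# matches above matches that only succeed after folding.

# Katakana U+30A1..U+30F6 map onto hiragana U+3041..U+3096 (offset 0x60).
# Characters outside that range (e.g. the long-vowel mark U+30FC) are untouched.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def fold_kana(text: str) -> str:
    """Return text with katakana replaced by the corresponding hiragana."""
    return text.translate(KATAKANA_TO_HIRAGANA)


def score(query: str, title: str,
          exact_weight: float = 1.0, folded_weight: float = 0.5) -> float:
    """Score a title: full weight for a same-script match, reduced weight
    for a match that only appears after kana folding, zero otherwise.
    The weights are arbitrary placeholders."""
    if query in title:
        return exact_weight
    if fold_kana(query) in fold_kana(title):
        return folded_weight
    return 0.0


def rank(query: str, titles: list[str]) -> list[str]:
    """Return matching titles, best score first (ties keep input order)."""
    scored = [(score(query, t), t) for t in titles]
    scored.sort(key=lambda pair: -pair[0])
    return [t for s, t in scored if s > 0]


if __name__ == "__main__":
    print(rank("おおかみ", ["オオカミ", "おおかみ", "ネコ"]))
```

An opt-in or opt-out switch would then amount to setting folded_weight to
zero or not.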
Any idea why it might be different between a Japanese-language wiki and a
non-Japanese-language wiki? We often are more aggressive in matching
between characters that are not native to a given language--for example,
accents on Latin characters are generally ignored on English-language
wikis. So it might make sense to merge hiragana and katakana on
English-language wikis but not Japanese-language wikis.
Thanks very much for any suggestions or information!
—Trey
どういたしまして。 (You're welcome.)
[1] https://phabricator.wikimedia.org/T176197
[2] Details of my tests at
https://phabricator.wikimedia.org/T173650#3580309
Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation