A problem familiar to people who use non-Latin alphabets on computers is that they sometimes forget to switch the keyboard layout and type a whole word or even a sentence of gibberish before they notice. For example, people who use a Cyrillic keyboard may search Google for "цшлшзувшф" when they actually meant to search for "wikipedia", and vice versa: "dbrbgtlbz" when they meant "википедия" (that's "wikipedia" in Russian). The Google search engine has been aware of this for a few years now and often automatically searches for the desired term in a DWIM ("do what I mean") manner.
Wikipedia's own search engine is not aware of it yet. A user on the Hebrew Wikipedia had this idea: maybe LanguageConverter can be used for it? Common keyboard layouts can be mapped to each other, just as the two Serbian alphabets are mapped to each other today, and Special:Search is already aware of LanguageConverter.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore
strtr() is sufficient. The current LC (LanguageConverter) is based on it.
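For illustration, here is a minimal sketch of the kind of per-character mapping that a strtr()-style approach implies. It is written in Python, with str.translate standing in for PHP's strtr(), and is not taken from LanguageConverter. It assumes US QWERTY and the standard Russian layout, covers only lowercase letter keys, and every name in it is hypothetical.

```python
# Hypothetical sketch: remap text typed with the wrong keyboard layout.
# Assumption: US QWERTY vs. the standard Russian layout, lowercase keys only.
# Each position in QWERTY is the same physical key as that position in RUSSIAN.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RUSSIAN = "йцукенгшщзхъфывапролджэячсмитьбю"

# One translation table per direction, analogous to the pair array
# that PHP's strtr() accepts.
LATIN_TO_CYRILLIC = str.maketrans(QWERTY, RUSSIAN)
CYRILLIC_TO_LATIN = str.maketrans(RUSSIAN, QWERTY)


def fix_layout(text, table):
    """Return text with every mapped character replaced; others pass through."""
    return text.translate(table)


if __name__ == "__main__":
    print(fix_layout("цшлшзувшф", CYRILLIC_TO_LATIN))
    print(fix_layout("dbrbgtlbz", LATIN_TO_CYRILLIC))
```

A table like this cannot tell on its own whether a query was mistyped, so a caller would still have to decide when to try the remapped form, for example only when the original query returns no results.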
2011/6/10 Amir E. Aharoni amir.aharoni@mail.huji.ac.il
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
OK, but if I understand correctly, LanguageConverter is already integrated with the search engine.
2011/6/10 Philip Tzou philip.npc@gmail.com:
strtr() is sufficient. The current LC (LanguageConverter) is based on it.
Google is not aware of this either. It works for certain queries like wikipedia (probably because many people misspell it in Cyrillic for fun), but try a more general query (e.g. университз оф оџфорд), and it won't return any results.
r.
On 10/06/11 06:46, Amir E. Aharoni wrote:
On Fri, Jun 10, 2011 at 2:16 PM, Robert Stojnic rainmansr@gmail.com wrote:
Google is not aware of this either. It works for certain queries like wikipedia (probably because many people misspell it in Cyrillic for fun), but try a more general query (e.g. университз оф оџфорд), and it won't return any results.
He was referring to search terms typed in the wrong keyboard layout, not transliteration, i.e. "гтшмукышешуы ща щчащкв" rather than what you wrote for "universities of oxford". Google doesn't really handle this query either, by the way. But yes, I have personally felt the need for such a feature and was thinking of implementing it myself. Ideally, this should be a backend-independent extension that hooks into Special:Search directly.
My query wasn't a transliteration, but typed with a Serbian Cyrillic layout.
r.
On 10/06/11 11:58, Max Semenik wrote:
He was referring to search terms typed in the wrong keyboard layout, not transliteration, i.e. "гтшмукышешуы ща щчащкв" rather than what you wrote for "universities of oxford". Google doesn't really handle this query either, by the way. But yes, I have personally felt the need for such a feature and was thinking of implementing it myself. Ideally, this should be a backend-independent extension that hooks into Special:Search directly.
On Fri, Jun 10, 2011 at 3:07 PM, Robert Stojnic rainmansr@gmail.com wrote:
My query wasn't a transliteration, but typed with a Serbian Cyrillic layout.
So it's phonetically equivalent to QWERTY? Neat.
On Fri, Jun 10, 2011 at 6:16 AM, Robert Stojnic rainmansr@gmail.com wrote:
Google is not aware of this either. It works for certain queries like wikipedia (probably because many people misspell it in Cyrillic for fun), but try a more general query (e.g. университз оф оџфорд), and it won't return any results.
It seems to work consistently if you type English using a Hebrew layout, even for rather uncommon search terms:
http://www.google.com/search?q=%D7%A8%D7%9D%D7%A0%D7%A7%D7%A8%D7%90+%D7%93%D...
And the reverse:
http://www.google.com/search?q=tnhr+tvrubh
Maybe it just doesn't know about Serbian Cyrillic keyboard layouts, but does know about Hebrew keyboard layouts.
Aryeh Gregor Simetrical+wikilist@gmail.com writes:
Maybe it just doesn't know about Serbian Cyrillic keyboard layouts, but does know about Hebrew keyboard layouts.
My understanding of how this works (probably wrong) is that Google looks at which searches people perform back-to-back. So perhaps there are more people using Hebrew keyboard layouts than Serbian Cyrillic layouts?
Mark.