Hi,
I wondered about some things around the Chinese variant conversion:
* When a person uses a search engine, do the links in the results point directly to one of the variants? That is, do they point to https://zh.wikipedia.org/zh-cn/Article_name , etc., or simply to zh.wikipedia.org/wiki/Article_name ? I guess that among Chinese-speaking people Google is not necessarily as ubiquitous as elsewhere, so there is probably a separate answer for each search engine.
* If for any search engine the answer above is "yes", does anybody have an idea how that search engine guesses the preferred variant? Usage of simplified / traditional characters in the search query? Geolocation? Preferred language settings in the browser ("Accept-Language")? Preferences in the search engine itself? A combination of all of the above? Something else?
* Do any of the search engines show direct links to country-based variants (zh-cn, zh-hk, zh-tw, zh-sg, zh-mo)? Or to the more generic zh-hans and zh-hant?
* For users who didn't log in, is the variant selection remembered in a cookie or in localStorage?
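To make the questions above concrete, here is a minimal Python sketch of two of the things being asked about: telling from a URL whether it selects a variant, and picking a variant from an "Accept-Language" header. The function names variant_from_url and preferred_variant are hypothetical, the set of variant codes is just the list from this message, and the header parsing is deliberately simplified. Whether any search engine or MediaWiki itself behaves this way is exactly what is unknown here.

```python
from urllib.parse import urlsplit

# Variant codes named in this thread; treating the set as complete is an assumption.
VARIANTS = {"zh-cn", "zh-hk", "zh-tw", "zh-sg", "zh-mo", "zh-hans", "zh-hant"}


def variant_from_url(url):
    """Return the variant code in the URL path, or None for the generic /wiki/ form."""
    if "//" not in url:
        # Allow scheme-less input such as "zh.wikipedia.org/wiki/Article_name".
        url = "//" + url
    path = urlsplit(url).path
    first_segment = path.lstrip("/").split("/", 1)[0].lower()
    return first_segment if first_segment in VARIANTS else None


def preferred_variant(accept_language):
    """Return the highest-weighted known variant in an Accept-Language value, or None."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        q = 1.0  # a missing q parameter means full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort by descending weight, then by original position in the header.
        candidates.append((-q, position, tag))
    for _, _, tag in sorted(candidates):
        if tag in VARIANTS:
            return tag
    return None


print(variant_from_url("https://zh.wikipedia.org/zh-cn/Article_name"))  # zh-cn
print(variant_from_url("zh.wikipedia.org/wiki/Article_name"))           # None
print(preferred_variant("zh-TW,zh;q=0.9,en;q=0.8"))                     # zh-tw
```

Note that a bare "zh" tag is ignored here, since it does not say which script the reader prefers.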
I cannot easily test any of these things myself, because I don't speak Chinese, I'm not familiar with Chinese search engines, and I don't live in a Chinese-speaking country (and geolocation matters). But since I care about language, I'm very curious about this.
Thanks!
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
I think the point of all those <link rel=alternate hreflang=foo> tags was so that Google links to the right variant, but I am unsure.
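For readers unfamiliar with these tags, here is a minimal sketch (Python, standard library only) of how a crawler could read them. The sample markup and the class name HreflangCollector are made up for illustration; which hreflang values MediaWiki actually emits, and what any search engine does with them, is not established here.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect hreflang -> href pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.alternates[hreflang.lower()] = href


# Hypothetical page head written for this example, not copied from a real page.
SAMPLE_HEAD = """
<head>
<link rel="alternate" hreflang="zh-hans" href="https://zh.wikipedia.org/zh-hans/Article_name">
<link rel="alternate" hreflang="zh-hant" href="https://zh.wikipedia.org/zh-hant/Article_name">
<link rel="stylesheet" href="/style.css">
</head>
"""

collector = HreflangCollector()
collector.feed(SAMPLE_HEAD)
for code, href in sorted(collector.alternates.items()):
    print(code, href)
```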
-- brian
On Wednesday, May 17, 2017, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
I wondered about some things around the Chinese variant conversion: [...]
_______________________________________________
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Quite possible, but does it actually happen?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2017-05-17 23:25 GMT+03:00 Brian Wolff bawolff@gmail.com:
I think the point of all those <link rel=alternate hreflang=foo> tags was so google linked to right variant, but i am unsure.
-- brian
Supposedly Google listens: https://support.google.com/webmasters/answer/189077?hl=en
On Wednesday, May 17, 2017, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Quite possible, but does it actually happen?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Yeah... but I would love to hear actual experiences of actual Chinese speakers. (Although I certainly do appreciate other relevant replies, like yours.)
For every search engine, and often for every search engine _user_, the actual results are likely different.
If it works well for Chinese speakers—good to know.
If it doesn't, then I'd like to know it, and to think whether we can do anything about it in our software.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2017-05-18 2:35 GMT+03:00 Brian Wolff bawolff@gmail.com:
Supposedly google listens: https://support.google.com/webmasters/answer/189077?hl=en
- For users who didn't log in, is the variant selection remembered in a
cookie or in localStorage?
A quick test while not logged in indicates that variant selection isn't remembered between searches, or even when following links! The default "/wiki/" version of the page can contain a mix of Traditional and Simplified characters (which seems maximally unhelpful).
If it doesn't, then I'd like to know it, and to think whether we can do
anything about it in our software.
Discovery is almost ready to deploy changes to improve searching on Chinese wikis. We are using an Elastic plugin to convert Traditional Chinese characters to Simplified, and another plugin that does a better job of segmenting Simplified text into words (the plugin only works on Simplified text—which is why we haven't used it in the past: it's pretty bad on Traditional text).
So, searching for Simplified or Traditional text will find the other (modulo the imperfections in the software, but it's definitely a big improvement). "Exact" matches still carry some extra weight, so Traditional and Simplified queries won't always return exactly the same results, but the results will be much more similar than in the current situation, where they don't necessarily overlap at all.
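The general idea can be sketched in a few lines of Python. This is not the Elastic plugin and not our actual configuration: the five-character conversion table, the function names normalize and search, and the exact_bonus value are all made up for illustration, and word segmentation is skipped entirely (plain substring matching stands in for it).

```python
# Tiny hand-written Traditional -> Simplified table, for illustration only.
# A real converter covers thousands of characters and context-dependent cases.
T2S = str.maketrans({"國": "国", "語": "语", "漢": "汉", "學": "学", "體": "体"})


def normalize(text):
    """Map the Traditional characters in the table to Simplified ones."""
    return text.translate(T2S)


def search(query, documents, exact_bonus=0.5):
    """Rank documents that contain the query after normalization.

    A document that also contains the query exactly as typed gets a bonus,
    so the two scripts find the same documents but may order them differently.
    """
    normalized_query = normalize(query)
    results = []
    for doc in documents:
        if normalized_query in normalize(doc):
            score = 1.0 + (exact_bonus if query in doc else 0.0)
            results.append((score, doc))
    results.sort(key=lambda pair: -pair[0])  # highest score first
    return results


docs = ["漢語是一種語言", "汉语是一种语言", "英語"]

for query in ("漢語", "汉语"):
    print(query, [doc for _, doc in search(query, docs)])
```

Both queries find the same two documents, but each ranks the document written in its own script first.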
The changes should go out after the Hackathon (and possibly some vacations tacked on to the Hackathon), and then we'll have to re-index Chinese wikis to make them live. So, not yet—but soon!
—Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Thu, May 18, 2017 at 1:41 AM, Amir E. Aharoni < amir.aharoni@mail.huji.ac.il> wrote:
Yeah... but I would love to hear actual experiences of actual Chinese speakers. (Although I certainly do appreciate other relevant replies, like yours.)
For every search engine, and often for every search engine _user_ the actual results are likely different.
If it works well for Chinese speakers—good to know.
If it doesn't, then I'd like to know it, and to think whether we can do anything about it in our software.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
David or Liangent might be able to answer your questions. --scott
On May 17, 2017 12:01 PM, "Amir E. Aharoni" amir.aharoni@mail.huji.ac.il wrote:
I wondered about some things around the Chinese variant conversion: [...]
Indexing is a known issue, tracked at:
* https://phabricator.wikimedia.org/T93213
* https://phabricator.wikimedia.org/T54429
(Tilman, Kaldari or others with Google Search Console access may quickly provide an update on https://phabricator.wikimedia.org/T93213#2518417 .)
Nemo
wikitech-l@lists.wikimedia.org