+discovery list
On Wed, Jan 25, 2017 at 10:15 AM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
On Wed, Jan 25, 2017 at 2:09 AM, byeh@yahoo-inc.com wrote:
While developing some services based on API:Opensearch, I found that the response to the same URL request can be in either Simplified Chinese or Traditional Chinese. More specifically, I would like to know how I can determine, at the API layer, which script the response will use (or what other factors may have an impact), since the API:Opensearch documentation doesn't seem to take language into consideration.
The OpenSearch Suggestions extension specification[1] does not allow for returning additional metadata, such as language, with the response. You may want to look at the prefixsearch query module[2] instead, which can return the same results in a different format, although I don't know the details of how language variants are handled in the search output.
[1]: …/Extensions/Suggestions/1.1
[2]: https://www.mediawiki.org/wiki/API:Prefixsearch
-- Brad Jorsch (Anomie) Senior Software Engineer Wikimedia Foundation
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
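For comparing the two modules Brad mentions, here is a minimal sketch of the equivalent request URLs, assuming zh.wikipedia.org as the target wiki and only Python's standard library. The parameter names (search, limit, list=prefixsearch, pssearch, pslimit) are the documented ones on the linked API pages; the helper names opensearch_url and prefixsearch_url are hypothetical. Whether either module lets you pick a variant is exactly the open question in this thread, so this only shows how to issue comparable requests.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Chinese Wikipedia API discussed in this thread.
API = "https://zh.wikipedia.org/w/api.php"


def opensearch_url(term, limit=10):
    """Hypothetical helper: URL for the OpenSearch suggestions module."""
    params = {"action": "opensearch", "search": term, "limit": limit, "format": "json"}
    return API + "?" + urlencode(params)


def prefixsearch_url(term, limit=10):
    """Hypothetical helper: URL for the same lookup via list=prefixsearch."""
    params = {
        "action": "query",
        "list": "prefixsearch",
        "pssearch": term,
        "pslimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


print(opensearch_url("2007年"))
print(prefixsearch_url("2007年"))
```

Non-ASCII search terms are percent-encoded as UTF-8 by urlencode, so the same helper works for Chinese queries.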
Let's see if I can help, either directly, or indirectly via Cunningham's Law.[1]
I'm reading this as you are searching a Chinese-language project (like zh.wikipedia.org), and getting results that are mixed Traditional and Simplified Chinese. If that's not the case, please elaborate!
My understanding, which is admittedly incomplete, is that the text for Chinese-language projects is stored however it was entered (Traditional or Simplified), and is converted at display time. If you look at the main page of zh.wikipedia.org[2] today without being logged in (or in a private browsing window), the featured article link has this text: "2007年欧洲冠军联赛決賽", which uses both 赛 and 賽, with 赛 being the Simplified version of Traditional 賽.[3] If you request the zh-cn version of the page,[4] the text is "2007年欧洲冠军联赛决赛", and both are Simplified "赛". If you request the zh-tw version of the page[5], the text is "2007年歐洲冠軍聯賽決賽", and both are Traditional "賽". So, I believe that explains why you are seeing mixed Traditional and Simplified results.
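To make the display-time conversion concrete, here is a toy sketch that reproduces the three versions of the featured-article title quoted above, plus a helper that builds the per-variant page URLs linked in [4] and [5]. The character tables are an assumption covering only the five characters that differ in this one example; MediaWiki's real converter uses much larger tables and phrase-level rules, and the function names are hypothetical.

```python
from urllib.parse import quote

# Toy Simplified -> Traditional table, covering ONLY the characters that
# differ in the example title. Not MediaWiki's real conversion table.
S2T = {"欧": "歐", "军": "軍", "联": "聯", "赛": "賽", "决": "決"}
# Inverse table: Traditional -> Simplified.
T2S = {trad: simp for simp, trad in S2T.items()}


def convert(text, table):
    """Map each character through the table, leaving unknown ones unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def variant_url(title, variant):
    """Hypothetical helper: page URL under a variant path such as zh-cn or zh-tw."""
    path = quote(title.replace(" ", "_"), safe=":/")
    return f"https://zh.wikipedia.org/{variant}/{path}"


stored = "2007年欧洲冠军联赛決賽"  # mixed form, as entered
print(convert(stored, T2S))  # 2007年欧洲冠军联赛决赛 (zh-cn)
print(convert(stored, S2T))  # 2007年歐洲冠軍聯賽決賽 (zh-tw)
print(variant_url("Wikipedia:首页", "zh-tw"))
```

The point of the toy is that the stored text is neither purely Simplified nor purely Traditional; only a request for a specific variant yields a consistent script.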
What to do about it? I can't get the Opensearch API to do the conversion in place, but there is a separate API that does the conversion: API:Parsing wikitext (action=parse).[6] Unfortunately, I can only get it to do the conversion (which is based on the uselang parameter) when I submit the text as wikitext,[7][8] which adds some additional HTML tags and a long comment to the results. Input with \u-escaped characters doesn't work, and I can't get the conversion to work for JSON input (i.e., the result of the Opensearch call). That doesn't mean it isn't possible, just that I haven't figured it out.
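One way to combine the two steps described above is sketched below: take the titles out of an Opensearch response, join them into plain wikitext, and build an action=parse request with uselang set to the desired variant. This is a sketch under assumptions, not a tested workflow: the OpenSearch body is assumed to have the standard [query, [titles], [descriptions], [urls]] shape, joining titles with blank lines is an arbitrary choice, the disablelimitreport flag is assumed to suppress the long "NewPP limit report" comment, and parse_convert_url is a hypothetical name.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Chinese Wikipedia API discussed in this thread.
API = "https://zh.wikipedia.org/w/api.php"


def parse_convert_url(opensearch_body, variant="zh-tw"):
    """Hypothetical helper: build an action=parse URL that asks for the
    Opensearch titles to be rendered in the given variant."""
    # OpenSearch JSON is [query, [titles], [descriptions], [urls]].
    titles = json.loads(opensearch_body)[1]
    params = {
        "action": "parse",
        "format": "json",
        "prop": "text",
        "contentmodel": "wikitext",
        # One paragraph per title (an arbitrary choice for this sketch).
        "text": "\n\n".join(titles),
        # Per this thread, the conversion follows uselang.
        "uselang": variant,
        # Assumption: omits the "NewPP limit report" comment from the output.
        "disablelimitreport": 1,
    }
    return API + "?" + urlencode(params)


sample = '["2007", ["2007年欧洲冠军联赛決賽"], [""], [""]]'  # hypothetical response
print(parse_convert_url(sample, "zh-cn"))
```

For many titles, sending the same parameters as a POST body avoids overly long URLs, and the returned HTML still has to be stripped of tags to recover the converted titles.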
I hope that points you in the right direction, and maybe inspires someone who knows this stuff better than me to help out.
—Trey
[1] https://meta.wikimedia.org/wiki/Cunningham%27s_Law
[2] https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5
[3] https://en.wiktionary.org/wiki/%E8%B5%9B
[4] https://zh.wikipedia.org/zh-cn/Wikipedia:%E9%A6%96%E9%A1%B5
[5] https://zh.wikipedia.org/zh-tw/Wikipedia:%E9%A6%96%E9%A1%B5
[6] https://www.mediawiki.org/wiki/API:Parsing_wikitext
[7] https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=tex...
[8] https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=tex...
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Wed, Jan 25, 2017 at 11:22 AM, Adam Baso <abaso@wikimedia.org> wrote:
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery