Thanks again for the thorough explanation and practical suggestion.

I would like to know whether there are any traffic figures or response-time expectations for the Parsing wikitext API,[1] which, per the suggested use case,[2][3] can handle the conversion. To be more specific, I am expecting 900 rps at peak and 15 rps on average. Please let me know if this traffic would cause any issues, or if another API would be more suitable for the purpose.

Thanks again for your time and support.

-- Ben Yeh

On Monday, February 6, 2017, 4:33:23 PM [Taipei], byeh@yahoo-inc.com <byeh@yahoo-inc.com> wrote:
Thank you so much for your kind help and clear explanation. 

The use case is indeed a Chinese-language project, and the examples provided nicely illustrate how the three possible outcomes appear in the different versions. Here I would like to add another scenario, in the hope of understanding Wikimedia's language conversion better. If you search for "川普" (Donald Trump in Traditional Chinese) through the OpenSearch API with redirects=resolve,[1] the first description should be "唐納·約翰·川普(英語:Donald John Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生於紐約市皇后區,為川普集團前任董事長兼總裁及川普娛樂公司的創辦人,他在全世界經營房地產、賭場和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。", which is Traditional Chinese; however, if the profile parameter is set to restrict,[2] the first description should become "唐纳德·约翰·特朗普(英语:Donald John Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生于紐約市皇后区,为特朗普集團前任董事长兼總裁及特朗普娱乐公司的創辦人,他在全世界经营房地产、赌场和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。", which is Simplified Chinese.
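
For anyone who wants to reproduce the two requests described above, here is a minimal sketch in Python that only builds the request URLs. The endpoint (zh.wikipedia.org) and the helper name opensearch_url are assumptions for illustration, not taken from the thread. The profile value is passed through unchanged, since the set of values the wiki accepts is not confirmed here.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; the thread does not name the exact host.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def opensearch_url(term, profile=None, limit=10):
    """Build an action=opensearch URL with redirects=resolve.

    `profile` is forwarded as given; which values are accepted
    depends on the wiki and is not verified by this sketch.
    """
    params = {
        "action": "opensearch",
        "search": term,
        "redirects": "resolve",
        "limit": limit,
        "format": "json",
    }
    if profile is not None:
        params["profile"] = profile
    # urlencode percent-encodes the non-ASCII search term as UTF-8.
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(opensearch_url("川普"))
```

Fetching both URLs from the same machine and comparing the first description would be one way to check whether the profile parameter alone accounts for the difference.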

This scenario indicates that language conversion happens not just at display time, but also at the API layer. To add another interesting point, the API calls mentioned above[1][2] can even return different results on different machines.

The ultimate question now is how such language conversion happens at the API layer, and how it can be controlled.

-- Ben Yeh


On Thursday, January 26, 2017, 4:24:05 AM [Taipei], Trey Jones <tjones@wikimedia.org> wrote:
Let's see if I can help, either directly, or indirectly via Cunningham's Law.[1]

I'm reading this as you are searching a Chinese-language project (like zh.wikipedia.org), and getting results that are mixed Traditional and Simplified Chinese. If that's not the case, please elaborate!

My understanding, which is admittedly incomplete, is that the text for Chinese-language projects is stored however it was entered (Traditional or Simplified), and is converted at display time. If you look at the main page of zh.wikipedia.org[2] today without being logged in (or in a private browsing window), the featured article link has this text: "2007年欧洲冠军联赛決賽", which uses both 赛 and 賽, with 赛 being the Simplified version of Traditional 賽.[3] If you request the zh-cn version of the page,[4] the text is "2007年欧洲冠军联赛决赛", and both are Simplified "赛". If you request the zh-tw version of the page[5], the text is "2007年歐洲冠軍聯賽決賽", and both are Traditional "賽". So, I believe that explains why you are seeing mixed Traditional and Simplified results.

What to do about it? I can't get the Opensearch API to do the conversion in place, but there is a separate API that does the conversion: Parsing wikitext.[6] Unfortunately, I can only get the API to do the conversion (which is based on the uselang parameter) when I submit the text as wikitext,[7][8] which adds some additional tags and a long comment to the results. \u-formatted input doesn't work, and I can't get the conversion to work for json input (i.e., the result of the Opensearch call). That doesn't mean it isn't possible, just that I haven't figured it out.
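
As a rough illustration of the approach above, the following Python sketch builds an action=parse request that submits the text as wikitext with a uselang value, and strips the extra tags and the comment from the returned HTML. The endpoint, the function names parse_url and strip_markup, and the regex-based cleanup are all assumptions for illustration; the cleanup in particular is a crude approximation and not a real HTML parser.

```python
import html as html_lib
import re
from urllib.parse import urlencode

# Assumed endpoint for illustration.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def parse_url(text, variant_code):
    """Build an action=parse URL that submits `text` as wikitext.

    `variant_code` is sent as uselang (for example "zh-tw" or "zh-cn").
    Long inputs would be better sent as a POST body than in the URL.
    """
    params = {
        "action": "parse",
        "text": text,
        "contentmodel": "wikitext",
        "uselang": variant_code,
        "prop": "text",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def strip_markup(fragment):
    """Crudely reduce a returned HTML fragment to plain text."""
    no_comments = re.sub(r"<!--.*?-->", "", fragment, flags=re.DOTALL)
    no_tags = re.sub(r"<[^>]+>", "", no_comments)
    return html_lib.unescape(no_tags).strip()
```

Whether uselang triggers the conversion for a given input is exactly the open question in this thread, so this is a starting point for experiments, not a confirmed recipe.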

I hope that points you in the right direction, and maybe inspires someone who knows this stuff better than me to help out.

—Trey



Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Wed, Jan 25, 2017 at 11:22 AM, Adam Baso <abaso@wikimedia.org> wrote:
+discovery list

On Wed, Jan 25, 2017 at 10:15 AM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
On Wed, Jan 25, 2017 at 2:09 AM, <byeh@yahoo-inc.com> wrote:
While I was developing some services based on API:Opensearch, I found that the response to the same URL request can be in either Simplified Chinese or Traditional Chinese. To be more specific, I would love to know how I can determine the language form of the response at the API layer (or what other factors may have an impact), since the documentation of API:Opensearch doesn't seem to take language into consideration.

The OpenSearch Suggestions extension specification[1] does not allow for returning additional metadata such as language with the response. You may want to look at the prefixsearch query module[2] instead which allows for returning the same results in a different format, although I don't know the details of how language variants are handled in the search output.


 [1]: http://www.opensearch.org/Specifications/OpenSearch/Extensions/Suggestions/1.1
 [2]: https://www.mediawiki.org/wiki/API:Prefixsearch
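
To make the prefixsearch suggestion concrete, here is a small Python sketch that builds a list=prefixsearch query and pulls the titles out of a decoded JSON response. The endpoint and the helper names prefixsearch_url and titles_from_response are assumptions for illustration. How language variants show up in the returned titles is not addressed by this sketch.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def prefixsearch_url(prefix, limit=10):
    """Build an action=query URL using the prefixsearch list module."""
    params = {
        "action": "query",
        "list": "prefixsearch",
        "pssearch": prefix,
        "pslimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_response(data):
    """Return page titles from a decoded prefixsearch JSON response.

    Missing keys yield an empty list instead of raising.
    """
    results = data.get("query", {}).get("prefixsearch", [])
    return [item["title"] for item in results]
```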

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api



_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery