Thanks again for the thorough explanation and practical suggestion.
I would like to know whether there are any traffic numbers or response-time expectations
for the Parsing wikitext API,[1] given the suggested use case,[2][3] which can handle the
conversion. To be more specific, I am expecting 900 rps at peak and 15 rps on average.
Please let me know if this traffic would cause any issues, or if another API is better
suited to the purpose.
Thanks again for your time and support.
-- Ben Yeh
[1]
https://www.mediawiki.org/wiki/API:Parsing_wikitext
[2]
https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=te…
[3]
https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=te…
On Monday, February 6, 2017, at 4:33:23 PM [Taipei], byeh@yahoo-inc.com <byeh@yahoo-inc.com> wrote:
Thank you so much for your kind help and clear explanation.
The use case is indeed a Chinese-language project, and the examples you provided nicely
illustrate how the three possible outcomes appear in the different versions. Here I would
like to add another scenario, in the hope of better understanding Wikimedia's language
conversion. If you search for "川普" (Donald Trump in Traditional Chinese) through the
OpenSearch API with redirects=resolve,[1] the first description should be "唐納·約翰·川普(英語:Donald John
Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生於紐約市皇后區,為川普集團前任董事長兼總裁及川普娛樂公司的創辦人,他在全世界經營房地產、賭場和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。",
which is Traditional Chinese; however, if the profile parameter is set to restrict,[2] the
first description should become "唐纳德·约翰·特朗普(英语:Donald John
Trump,1946年6月14日-),第45任美國總統、著名企業家、作家和節目主持人。他生于紐約市皇后区,为特朗普集團前任董事长兼總裁及特朗普娱乐公司的創辦人,他在全世界经营房地产、赌场和酒店,但在就任美國總統後把集團交給他兩名兒子小唐納·川普及艾瑞克·川普管理。",
which is Simplified Chinese.
This scenario indicates that language conversion happens not just at display time, but
also at the API layer. Another interesting point is that the API calls mentioned above
[1][2] can even return different results on different machines.
The remaining question is how such language conversion happens at the API layer, and how
it can be controlled.
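
For anyone who wants to reproduce the comparison, the two requests can be rebuilt roughly
as follows. This is only a sketch: the original URLs [1][2] are truncated in the archive,
so the exact parameter set is an assumption, the helper name build_opensearch_url is
hypothetical, and the profile value is passed through exactly as written above without
being verified against the API:Opensearch documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in this thread.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def build_opensearch_url(search, redirects="resolve", profile=None):
    """Build an action=opensearch request URL (hypothetical helper).

    `profile` is forwarded unchanged; check the accepted values in the
    API:Opensearch documentation before relying on it.
    """
    params = {
        "action": "opensearch",
        "format": "json",
        "search": search,
        "redirects": redirects,
    }
    if profile is not None:
        params["profile"] = profile
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII values.
    return API_ENDPOINT + "?" + urlencode(params)


# Request [1]: redirects=resolve only.
url_resolve = build_opensearch_url("川普")
# Request [2]: same search with the profile value named in the message above.
url_profile = build_opensearch_url("川普", profile="restrict")
```

Fetching both URLs from the same machine and comparing the second element of each JSON
response should show whether the difference comes from the profile parameter alone.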
-- Ben Yeh
[1]
https://zh.wikipedia.org/w/api.php?action=opensearch&search=%E5%B7%9D%E…
[2]
https://zh.wikipedia.org/w/api.php?action=opensearch&search=%E5%B7%9D%E…
On Thursday, January 26, 2017, at 4:24:05 AM [Taipei], Trey Jones <tjones(a)wikimedia.org> wrote:
Let's see if I can help, either directly, or indirectly via Cunningham's Law.[1]
I'm reading this as you are searching a Chinese-language project (like
zh.wikipedia.org), and getting results that are mixed Traditional and Simplified Chinese.
If that's not the case, please elaborate!
My understanding, which is admittedly incomplete, is that the text for Chinese-language
projects is stored however it was entered (Traditional or Simplified), and is converted at
display time. If you look at the main page of zh.wikipedia.org[2] today without being
logged in (or in a private browsing window), the featured article link has this text:
"2007年欧洲冠军联赛決賽", which uses both 赛 and 賽, with 赛 being the Simplified version of
Traditional 賽.[3] If you request the zh-cn version of the page,[4] the text is
"2007年欧洲冠军联赛决赛", and both are Simplified "赛". If you request the zh-tw
version of the page[5], the text is "2007年歐洲冠軍聯賽決賽", and both are Traditional
"賽". So, I believe that explains why you are seeing mixed Traditional and
Simplified results.
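
The variant-specific page URLs above [4][5] follow a simple pattern, with the variant code
taking the place of "wiki" in the path. A minimal sketch of that pattern, assuming it
holds for other titles as well; the function name variant_page_url is hypothetical:

```python
from urllib.parse import quote


def variant_page_url(title, variant, host="zh.wikipedia.org"):
    """Return the URL of `title` rendered in a given variant (e.g. zh-cn, zh-tw).

    Mirrors the shape of links [4] and [5]; ':' and '/' are left unescaped
    so that namespace prefixes such as 'Wikipedia:' stay readable.
    """
    return f"https://{host}/{variant}/{quote(title, safe=':/')}"


main_page_cn = variant_page_url("Wikipedia:首页", "zh-cn")
main_page_tw = variant_page_url("Wikipedia:首页", "zh-tw")
```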
What to do about it? I can't get the Opensearch API to do the conversion in place, but
there is a separate API that does the conversion: Parsing wikitext.[6] Unfortunately, I
can only get the API to do the conversion (which is based on the uselang parameter) when I
submit the text as wikitext,[7][8] which adds some additional tags and a long comment to
the results. \u-formatted input doesn't work, and I can't get the conversion to
work for json input (i.e., the result of the Opensearch call). That doesn't mean it
isn't possible, just that I haven't figured it out.
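
A rough sketch of the request described above, for illustration only. Links [7][8] are
truncated in the archive, so prop=text and contentmodel=wikitext are assumptions about
what they contained, and build_parse_url is a hypothetical helper name. The uselang
parameter is the one the conversion appeared to depend on.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def build_parse_url(wikitext, uselang):
    """Build an action=parse request that submits raw text as wikitext.

    Assumed parameters: prop=text and contentmodel=wikitext (the original
    links are truncated, so these are a best guess).
    """
    params = {
        "action": "parse",
        "format": "json",
        "prop": "text",
        "contentmodel": "wikitext",
        "text": wikitext,
        "uselang": uselang,
    }
    return API_ENDPOINT + "?" + urlencode(params)


# The mixed-script featured-article text from the main page example.
mixed_text = "2007年欧洲冠军联赛決賽"
url_to_traditional = build_parse_url(mixed_text, "zh-tw")
url_to_simplified = build_parse_url(mixed_text, "zh-cn")
```

As noted, the response wraps the converted text in extra tags and a long comment, so the
caller would still need to strip the markup.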
I hope that points you in the right direction, and maybe inspires someone who knows this
stuff better than me to help out.
—Trey
[1]
https://meta.wikimedia.org/wiki/Cunningham's_Law
[2]
https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5
[3]
https://en.wiktionary.org/wiki/%E8%B5%9B
[4]
https://zh.wikipedia.org/zh-cn/Wikipedia:%E9%A6%96%E9%A1%B5
[5]
https://zh.wikipedia.org/zh-tw/Wikipedia:%E9%A6%96%E9%A1%B5
[6]
https://www.mediawiki.org/wiki/API:Parsing_wikitext
[7]
https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=te…
[8]
https://zh.wikipedia.org/w/api.php?action=parse&format=json&prop=te…
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Wed, Jan 25, 2017 at 11:22 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
+discovery list
On Wed, Jan 25, 2017 at 10:15 AM, Brad Jorsch (Anomie) <bjorsch(a)wikimedia.org>
wrote:
On Wed, Jan 25, 2017 at 2:09 AM, <byeh(a)yahoo-inc.com> wrote:
While developing some services based on API:Opensearch, I found that the response to
the same URL request can be in either Simplified Chinese or Traditional Chinese. To be more
specific, I would love to know how I can determine the language form of the response at the
API layer (or what other factors may have an impact). Since the documentation for
API:Opensearch doesn't seem to take language into consideration,
The OpenSearch Suggestions extension specification[1] does not allow for returning
additional metadata such as language with the response. You may want to look at the
prefixsearch query module[2] instead which allows for returning the same results in a
different format, although I don't know the details of how language variants are
handled in the search output.
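
As an illustration of the prefixsearch suggestion, a request could be built along these
lines. This is a sketch under assumptions: build_prefixsearch_url is a hypothetical helper,
the zh.wikipedia.org endpoint is taken from elsewhere in this thread, and nothing here
addresses how language variants are handled in the results.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def build_prefixsearch_url(prefix, limit=10):
    """Build an action=query&list=prefixsearch request URL."""
    params = {
        "action": "query",
        "list": "prefixsearch",
        "format": "json",
        "pssearch": prefix,   # the prefix to search for
        "pslimit": limit,     # maximum number of results to return
    }
    return API_ENDPOINT + "?" + urlencode(params)


prefix_url = build_prefixsearch_url("川普", limit=5)
```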
[1]:
http://www.opensearch.org/Specifications/OpenSearch/Extensions/Suggestions/1.1
[2]:
https://www.mediawiki.org/wiki/API:Prefixsearch
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery