Re: [Wikitech-ambassadors] Lua script that needs to look up a big table (phonetic guide automation)

16 Mar 2020


bawolff - Would you be able to point me to an example of mw.loadData?
Also, I've subscribed to https://phabricator.wikimedia.org/T46667 .
Huji - I was inspired by Japanese Wikipedia's approach to sorting - they
have a {{DEFAULTSORT:[article name in hiragana]}} on all articles. Since
Cantonese pronunciation is even more predictable than Japanese, we could
potentially have a template that automatically adds {{DEFAULTSORT:[article
title in Jyutping]}} using a Lua lookup table of all common Chinese
characters. Exceptional pronunciations should then be coded individually.
The Pinyin implementation of this would be equivalent, though it would
depend on the zh.wp community agreeing on sorting things by Pinyin.
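A minimal sketch of that lookup idea, in plain Lua so it can be tried outside the wiki. The names jyutping, exceptions and sortKey are hypothetical, the tables hold only a handful of entries for illustration, and a real module would still need to pass the result to {{DEFAULTSORT:}} on the wiki side.

```lua
-- Hypothetical per-character table; a real one would hold ~10k entries.
local jyutping = {
    ["香"] = "hoeng1",
    ["港"] = "gong2",
    ["大"] = "daai6",
    ["學"] = "hok6",
    ["銀"] = "ngan4",
    ["行"] = "hang4", -- default reading ("to walk")
}

-- Hypothetical whole-title overrides for exceptional pronunciations.
local exceptions = {
    ["銀行"] = "ngan4 hong4", -- 行 reads hong4 in "bank"
}

-- Build a sort key for a title: use an override if one exists,
-- otherwise transcribe character by character.
local function sortKey(title)
    if exceptions[title] then
        return exceptions[title]
    end
    local parts = {}
    -- Match one UTF-8 encoded character at a time (byte-based pattern,
    -- so it does not depend on the utf8 or mw.ustring libraries).
    for ch in title:gmatch("[\1-\127\194-\244][\128-\191]*") do
        -- Characters missing from the table pass through unchanged.
        parts[#parts + 1] = jyutping[ch] or ch
    end
    return table.concat(parts, " ")
end

print(sortKey("香港大學"))
print(sortKey("銀行"))
```

Checking the exception table before the per-character table keeps the common case data-driven while leaving room for individually coded readings.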
In terms of storing the data, Wikidata is not a good answer. First, the
Wikidata property creators community has rejected the notion of creating
separate properties for each common phonetic transcription system of CJK
languages, so retrieving Jyutping transcriptions from Wikidata
would be unnecessarily complicated. Second, Wikidata items refer to
concepts, not titles. We could theoretically ask the script to go to
Lexemes to fetch the phonetic transcription, but that would involve untangling
the multiple Lexemes that refer to the same Chinese character. In general,
the way Wikidata is structured makes it a bad fit for the problem at hand.
Liangent's formulation of the problem is more general than the one I
described, because T46667 aims to allow multiple ways of sorting Chinese
characters within the same interface. That would be very welcome too.
On Sun, 15 Mar 2020 at 19:55, bawolff <bawolff+wn@gmail.com> wrote:
...
Consider using
https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#mw.l...
, keeping in mind that Lua isn't really designed with the use case of huge
data tables in mind, so you might run into limits if your data is
really big.
--
Bawolff
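For reference, a minimal sketch of the mw.loadData pattern suggested above. In Scribunto the data sits on its own module page whose body is a single `return { ... }` table containing only plain values; mw.loadData loads it once per page render and returns a read-only view. The page name "Module:Jyutping/data" and the function p.lookup are hypothetical, and the package.preload stanza is only a stand-in so the sketch also runs in plain Lua.

```lua
-- Stand-in for the hypothetical data page "Module:Jyutping/data".
-- On a wiki, drop this stanza and put the table on that page instead.
package.preload["Module:Jyutping/data"] = function()
    return {
        ["香"] = "hoeng1",
        ["港"] = "gong2",
    }
end

-- Outside MediaWiki there is no `mw` global, so fall back to require().
local loadData = (mw and mw.loadData) or require

local p = {}

-- Return the Jyutping reading of one character, or nil if unknown.
function p.lookup(ch)
    local data = loadData("Module:Jyutping/data")
    return data[ch]
end

print(p.lookup("香"))

-- A real Scribunto module would end with `return p`.
```

Keeping the table on a separate data page means editors can update readings without touching the lookup code.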
On Sun, Mar 15, 2020 at 2:13 PM Deryck Chan <deryckchan@gmail.com> wrote:
...
Hello Ambassadors - This technical question may be relevant to multiple
(particularly CJK) language communities so I'm asking it here.
What is the advice for writing a Lua script that needs to look up data
from a big table (~10k rows at first deployment, potentially increasing in
the future)? Does one hard-code the data into a Lua script, or is there a
recommended data structure for storing it?
The design problem at hand is that the Cantonese Wikipedia wants to
re-sort articles by Jyutping rather than Unicode. This will probably
involve automating the generation of Jyutping phonetic guides by looking up
the Jyutping transcription of common Chinese characters using a Lua module.
Where do we store the data?
If another wiki has done similar things, we'd be interested in sharing
the infrastructure.
Deryck
On behalf of the Cantonese Wikipedia community

Wikitech-ambassadors mailing list
Wikitech-ambassadors@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors
