[Sorry if this isn't the right place to ask; please forward it to wherever it belongs.]
Hi all,
There is a long-frozen idea: to make a transliterator for the Crimean Tatar Wikipedia. Native speakers of Crimean Tatar (crh) use either the Cyrillic or the Latin script, depending on the country they used to live in. One example of a similar feature already in use is https://kk.wikipedia.org, where readers can choose the script in which they see the content.
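Because several Crimean Tatar Cyrillic digraphs map to single Latin letters (for example къ → q, гъ → ğ, нъ → ñ, дж → c), a converter has to try the longest match first. Below is a minimal Python sketch of that idea. It assumes a tiny, hand-picked subset of the Cyrillic-to-Latin table, not the full, validated one. `CYR_TO_LAT` and `transliterate` are hypothetical names and are not part of any existing tool:

```python
import re

# Hypothetical, deliberately partial Cyrillic -> Latin table (lowercase keys).
# It is NOT the validated crh table; it only illustrates the technique.
CYR_TO_LAT = {
    "къ": "q", "гъ": "ğ", "нъ": "ñ", "дж": "c",  # digraphs -> single letters
    "г": "g", "д": "d", "ж": "j", "к": "k", "м": "m",
    "н": "n", "р": "r", "ч": "ç", "ш": "ş", "ы": "ı",
}

# Longer keys come first in the alternation, so "къ" wins over "к".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CYR_TO_LAT, key=len, reverse=True)),
    re.IGNORECASE,
)


def _replace(match: re.Match) -> str:
    source = match.group(0)
    target = CYR_TO_LAT[source.lower()]
    # Keep an initial capital: "Къ" -> "Q".
    if source[0].isupper():
        return target[0].upper() + target[1:]
    return target


def transliterate(text: str) -> str:
    """Greedy longest-match conversion; characters not in the table pass through."""
    return _PATTERN.sub(_replace, text)


print(transliterate("Къырым"))  # Qırım
```

A plain character-by-character loop would turn "къ" into "k" followed by an unmapped "ъ". Ordering the alternatives by length avoids that, and this is the same reason longest-match replacement is the usual choice for such tables.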
There is an old task on Phabricator, and there were attempts to write a tool in PHP, but the effort stalled: https://phabricator.wikimedia.org/T23582
Could someone help with this tool, or create one from scratch? Or maybe you know where else I can find help?
Thanks!

--
Vira Motorko
project manager, Wikimedia Ukraine https://ua.wikimedia.org/
non-profit organisation
m: +380667740499 | f: vira.motorko https://www.facebook.com/vira.motorko | w: Ата https://meta.wikimedia.org/wiki/User:Ата
Are you saving your documents in free formats? ;) Help save natural resources – please think twice before printing this e-mail or any attachments.
It looks like a lot of the pieces needed to make this happen are out there.
Unfortunately, based on the description in English Wikipedia,[1] it doesn't look like a one-to-one transliteration. But when is language ever straightforward?
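Here is one concrete way a rule can depend on context. Going by the English Wikipedia description (an assumption a speaker should confirm), Cyrillic е is written "ye" at the start of a word or after a vowel, and "e" elsewhere. A flat lookup table cannot express that, but a replacement callback that inspects the preceding character can. Below is a minimal Python sketch. The vowel set and the tiny consonant table are illustrative assumptions. Cyrillic letters are written as escapes because е and р look identical to Latin e and p:

```python
import re

# Cyrillic vowels а е ё и о у ы э ю я (assumed set for this sketch).
VOWELS = set("\u0430\u0435\u0451\u0438\u043e\u0443\u044b\u044d\u044e\u044f")

# Tiny illustrative one-to-one table (lowercase keys), escaped to avoid homoglyphs.
SIMPLE = {
    "\u0431": "b",  # б
    "\u043c": "m",  # м
    "\u043d": "n",  # н
    "\u0440": "r",  # р
}

# Cyrillic е (U+0435); IGNORECASE also matches capital Е (U+0415).
CYRILLIC_E = re.compile("\u0435", re.IGNORECASE)


def _e_rule(match: re.Match) -> str:
    start = match.start()
    # match.string is the original input, so this is the original neighbour.
    prev = match.string[start - 1] if start > 0 else ""
    # Assumed rule: "ye" at word start (no letter before) or after a vowel.
    if not prev.isalpha() or prev.lower() in VOWELS:
        out = "ye"
    else:
        out = "e"
    return out.capitalize() if match.group(0).isupper() else out


def _simple(ch: str) -> str:
    latin = SIMPLE.get(ch.lower())
    if latin is None:
        return ch  # unknown characters (and already-converted Latin) pass through
    return latin.upper() if ch.isupper() else latin


def transliterate_with_context(text: str) -> str:
    # Contextual rule first, while the neighbouring letters are still Cyrillic...
    text = CYRILLIC_E.sub(_e_rule, text)
    # ...then the plain one-to-one letters.
    return "".join(_simple(ch) for ch in text)
```

Exceptions (loanwords, for instance) would typically sit in front of such rules as whole-word overrides.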
It looks like much of the work to deal with all the contextual variation and the exceptions to the transliteration has been attempted at least twice. There's a zip file of code attached to the Phabricator ticket[2] and a link to some code on-wiki.[5] From the comments, it looks like that code never quite worked, but it seems possible to harvest the conversion data from one or both and put it into the same format as the other existing language converters, like Kazakh.[3] It *might* be easier this time, since it's been 6.5 years and the LanguageConverter code is probably more mature now.
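Harvesting the old conversion data mostly means getting it into a clean two-column list and checking which entries can run in both directions. Below is a minimal Python sketch. It assumes the data has first been exported as tab-separated `cyrillic<TAB>latin` lines, which is a hypothetical intermediate format and not what the zip file or the on-wiki code actually uses. `to_php_array` is likewise a hypothetical helper that emits a PHP array literal as a starting point only. The structure a LanguageConverter subclass actually expects should be copied from KkConverter:[3]

```python
from __future__ import annotations

from collections import defaultdict


def parse_table(tsv: str) -> list[tuple[str, str]]:
    """Parse 'cyrillic<TAB>latin' lines; blank lines and '#' comments are skipped."""
    pairs = []
    for line in tsv.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cyrillic, latin = line.split("\t")
        pairs.append((cyrillic, latin))
    return pairs


def build_tables(pairs: list[tuple[str, str]]):
    """Return (forward, reverse, ambiguous) mappings.

    `reverse` only holds Latin strings produced by exactly one Cyrillic source;
    the rest go to `ambiguous` and need contextual rules or exception lists.
    """
    forward: dict[str, str] = {}
    sources: dict[str, set[str]] = defaultdict(set)
    for cyrillic, latin in pairs:
        forward[cyrillic] = latin
        sources[latin].add(cyrillic)
    reverse = {lat: next(iter(cyr)) for lat, cyr in sources.items() if len(cyr) == 1}
    ambiguous = {lat: sorted(cyr) for lat, cyr in sources.items() if len(cyr) > 1}
    return forward, reverse, ambiguous


def _php_quote(s: str) -> str:
    # In PHP single-quoted strings only backslash and single quote need escaping.
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_php_array(mapping: dict[str, str]) -> str:
    """Render a mapping as a PHP short-array literal (hypothetical helper)."""
    body = "".join(f"\t{_php_quote(k)} => {_php_quote(v)},\n" for k, v in mapping.items())
    return "[\n" + body + "]"
```

In a table where both Cyrillic е and э produce Latin "e", `build_tables` reports "e" as ambiguous. Those ambiguous entries are exactly the spots where Latin-to-Cyrillic conversion needs context.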
It would be even better if someone could create an Elasticsearch plugin to do the same kind of conversion. That would allow cross-alphabet searching, too. I've been working with a plugin[4] that does that kind of thing for Traditional and Simplified Chinese.
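Short of a plugin, a rough approximation is to fold both scripts into one at index time and at query time. Elasticsearch's built-in `mapping` character filter does greedy longest-match replacement, so it can handle digraphs like къ → q. It cannot express contextual rules, though, which is presumably where a dedicated plugin would come in. Below is a Python sketch that builds such index settings from a hypothetical, partial mapping table. The filter and analyzer names are made up:

```python
import json

# Hypothetical partial Cyrillic -> Latin pairs; the real table needs validation.
PAIRS = [("къ", "q"), ("гъ", "ğ"), ("нъ", "ñ"), ("дж", "c"), ("ы", "ı")]


def mapping_rules(pairs):
    """Emit 'src => dst' rules for lower, Title and UPPER case sources.

    Char filters run before the tokenizer and the lowercase token filter,
    so each case variant of the source needs its own rule.
    """
    rules, seen = [], set()
    for src, dst in pairs:
        for variant in (src, src.capitalize(), src.upper()):
            if variant not in seen:
                seen.add(variant)
                rules.append(f"{variant} => {dst}")
    return rules


settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "crh_cyrl_to_latn": {"type": "mapping", "mappings": mapping_rules(PAIRS)}
            },
            "analyzer": {
                "crh_folded": {
                    "type": "custom",
                    "char_filter": ["crh_cyrl_to_latn"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

print(json.dumps(settings, ensure_ascii=False, indent=2))
```

With the same analyzer on the indexed text and on the query, a Cyrillic query and a Latin query can reach the same tokens. Dotless ı is a likely trap with the default `lowercase` filter (Latin "I" becomes "i"), so its `language: turkish` option may be worth testing. This has not been tried here.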
—Trey
[1] https://en.wikipedia.org/wiki/Crimean_Tatar_alphabet#Cyrillic_to_Latin_transliteration
[2] https://phabricator.wikimedia.org/T23582#247642
[3] https://doc.wikimedia.org/mediawiki-core/master/php/classKkConverter.html
[4] https://github.com/medcl/elasticsearch-analysis-stconvert
[5] https://phabricator.wikimedia.org/T23582#247634
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Fri, Mar 24, 2017 at 5:40 AM, Vira Motorko vira.motorko@gmail.com wrote:
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thank you, Trey, for this analysis!
There is also this page with a transliteration table (in case it helps rather than confusing things further): https://crh.wikipedia.org/wiki/Vikipediya:%C4%B0ml%C3%A2/Latin_elifbesi
I asked User:Bunyk (a Ukrainian Wikimedian) for help, but he doesn't work with PHP. He is attending the Vienna Hackathon, though, so anyone who feels like working on this can reach him there.
I feel uncomfortable because I want this transliteration tool to exist but can't contribute to the code myself :( So please, my volunteer hero, appear!

--
Vira Motorko
2017-03-24 16:05 GMT+02:00 Trey Jones tjones@wikimedia.org:
I might work on this at the Hackathon (and maybe find Bunyk there if he's interested). I would need someone who knows Crimean Tatar to help validate the results, though that wouldn't have to happen during the Hackathon.
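Validation by a speaker could be as simple as a list of word pairs they confirm, run against whatever converter gets built. Below is a minimal Python sketch of such a check. `check_against_gold` and `report` are hypothetical helpers, and the converter is passed in as any `str -> str` function:

```python
from typing import Callable, Iterable, List, Tuple


def check_against_gold(
    convert: Callable[[str], str],
    gold: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every pair the converter gets wrong."""
    failures = []
    for source, expected in gold:
        actual = convert(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


def report(failures: List[Tuple[str, str, str]], total: int) -> str:
    """Summarise the check as text a reviewer can read through."""
    lines = [f"{total - len(failures)}/{total} pairs match"]
    lines += [f"  {src!r}: expected {exp!r}, got {act!r}" for src, exp, act in failures]
    return "\n".join(lines)


def toy_converter(text: str) -> str:
    # Stand-in for a real converter: only handles one digraph.
    return text.replace("къ", "q")


gold = [("къ", "q"), ("ер", "yer")]  # pairs a speaker would supply
print(report(check_against_gold(toy_converter, gold), len(gold)))
```

Keeping the gold list as plain data means a speaker can review it without touching the converter code, and the same list can be reused as a regression test.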
On Sun, Mar 26, 2017 at 1:23 AM, Vira Motorko vira.motorko@gmail.com wrote: