TL;DR: Parsoid isn't i18n-friendly: it uses English keywords instead of localized ones.[1] Is it a bug or a feature? Please voice your opinion!
Longer version: For some funny reason, Parsoid reads the alias arrays "right to left"[1]; that is, it uses the LAST alias of each magic word rather than the first one[2]. One reason for this is that in English the shorter "thumb" is preferred over the longer "thumbnail". However, instead of fixing MessagesEn.php to define "thumb" as the first option, Parsoid uses the last option. As a result, all other wikis end up using the English alias (which appears last in the magic word list) rather than the localized one, so Parsoid isn't i18n-friendly.
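To make the difference concrete, here is a minimal Python sketch of the two selection rules. The alias lists are illustrative assumptions (not copied from the real Messages*.php files), and pick_last/pick_first are hypothetical helper names, not Parsoid or MediaWiki functions.

```python
# Illustrative alias lists only; NOT copied from the real Messages*.php files.
# Assumption: localized aliases come first, English fallbacks are appended last.
MAGIC_WORD_ALIASES = {
    "en": ["thumbnail", "thumb"],
    "de": ["mini", "thumbnail", "thumb"],
}


def pick_last(aliases):
    """Rule described in this message: serialize with the LAST alias."""
    return aliases[-1]


def pick_first(aliases):
    """Alternative rule: serialize with the FIRST alias."""
    return aliases[0]


for lang, aliases in MAGIC_WORD_ALIASES.items():
    print(lang, "last:", pick_last(aliases), "| first:", pick_first(aliases))
```

With these assumed lists, the "last alias" rule yields the English keyword for both languages, while the "first alias" rule yields the localized keyword for German but the longer "thumbnail" for English, which is why the order in MessagesEn.php matters.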
However, there are different POVs regarding the correct solution:
1. Use English aliases in all projects. These are the most used aliases [one of the reasons being that people copy code from enwiki or use biased tools such as Parsoid].
2. Use localized aliases, keeping the article content and syntax in the same language. This is especially important for non-Latin languages with a different alphabet.
There is a consensus that English is a bad choice for RTL languages, as it causes mixed-directional content, which should be avoided. So if we go with choice 1, RTL languages should be an exception.
I believe there is a cultural point of view here, and I would like to hear what you think (especially non-RTL and non-English speakers): Do you prefer mini (German), vignette (French), miniaturadeimagen (Spanish), мини (Russian) instead of thumb (for example)?
I did some dump mining to get the usage statistics: https://phab.wmfusercontent.org/file/data/bskxfupspqo64dnnkdr7/PHID-FILE-v4r... Based on this, I wrote a Python script that suggests a reordering of the aliases by usage[3], so if choice 2 is selected, we can merge[2] and all languages will use the preferred choice.
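The reordering idea can be sketched roughly as follows. This is not the actual script from [3]; the function name reorder_by_usage and the usage counts are made up for illustration.

```python
def reorder_by_usage(aliases, usage_counts):
    """Return the aliases sorted by descending usage count.

    sorted() is stable, so aliases with equal (or missing) counts
    keep their original relative order.
    """
    return sorted(aliases, key=lambda alias: -usage_counts.get(alias, 0))


# Made-up counts, for illustration only (not taken from the dump statistics).
aliases = ["thumbnail", "thumb"]
counts = {"thumbnail": 120, "thumb": 4500}

print(reorder_by_usage(aliases, counts))
```

Placing the most used alias first means a "first alias preferred" rule would pick whatever each wiki's editors already use most.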
[1] https://phabricator.wikimedia.org/T53852
[2] https://gerrit.wikimedia.org/r/#/c/244254/3/lib/wts.LinkHandler.js
[3] https://gerrit.wikimedia.org/r/#/c/247914
The usage statistics link is broken. The correct one is: https://phabricator.wikimedia.org/T116020#1738654
On Thu, Mar 31, 2016 at 10:42 PM, Eran Rosenthal eranroz89@gmail.com wrote:
TL;DR: Parsoid isn't i18n friendly and uses English keywords instead of localized.[1] Is it a bug or feature? Please voice your opinion!
There is a consensus that English is a bad choice for RTL languages, as it causes mixed-directional content, which should be avoided. So if we go with choice 1, RTL languages should be an exception.
Let's get that patch to core +2'ed. Establishing a consistent preference order in core is clearly (IMNSHO) the right thing to do.
Siebrand, could you take a second look at https://gerrit.wikimedia.org/r/247914 ? --scott
On Thu, Mar 31, 2016 at 8:32 PM, Arlo Breault abreault@wikimedia.org wrote:
There is a consensus that English is a bad choice for RTL languages, as it causes mixed-directional content, which should be avoided. So if we go with choice 1, RTL languages should be an exception.
See https://gerrit.wikimedia.org/r/#/c/280792/
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I'm prepared to Be Bold and C+2 https://gerrit.wikimedia.org/r/247914. In my view, establishing consistent semantics (first alias preferred) empowers the local wikis. If a particular wiki decides it prefers to use the English alias in markup, it just has to submit a patch to reorder the aliases for its particular language. This is much better than some ad hoc mechanism where the choices are made arbitrarily by the Parsoid maintainers or whoever.
However, the patch needs to be rebased. (Sigh.) And the new semantics/ordering should really be documented in the code or release notes somewhere, not just in the gerrit/git history.
Eran, if you can do these two things (and Siebrand doesn't scream "stop"), I'm happy to C+2. Arlo is already working on a patch on the Parsoid side which would switch Parsoid to using the first alias everywhere once the change to core is deployed. --scott
On Fri, Apr 1, 2016 at 2:50 PM, C. Scott Ananian cananian@wikimedia.org wrote:
Let's get that patch to core +2'ed. Establishing a consistent preference order in core is clearly (IMNSHO) the right thing to do.
Siebrand, could you take a second look at https://gerrit.wikimedia.org/r/247914 ? --scott
-- (http://cscott.net)
Please give me some time. It's way past beer o'clock at the Hackathon. Tomorrow's another day (JST - Jerusalem Standard Time).
-- Siebrand Mazeland Kitano ICT
On 1 Apr 2016, at 22:25, C. Scott Ananian cananian@wikimedia.org wrote:
I'm prepared to Be Bold and C+2 https://gerrit.wikimedia.org/r/247914. In my view, by establishing a consistent semantics (first alias preferred) this empowers the local wikis. If a particular wiki decides they prefer to use the English alias in markup, they just have to submit a patch to reorder the aliases for their particular language. This is much better than using some ad hoc mechanism where the choices are arbitrarily made by the Parsoid maintainers or whoever.
However, the patch needs to be rebased. (Sigh.) And the new semantics/ordering should really be documented in the code or release notes somewhere, not just in the gerrit/git history.
Eran, if you can do these two things (and Siebrand doesn't scream "stop"), I'm happy to C+2. Arlo is already working on a patch on the Parsoid side which would switch Parsoid to using the first alias everywhere once the change to core is deployed. --scott
On Fri, Apr 1, 2016 at 3:31 PM, Siebrand Mazeland siebrand@kitano.nl wrote:
Please give me some time. It's way past beer o'clock at the Hackathon. Tomorrow's another day (JST - Jerusalem Standard Time).
Sure, no worries. I'm waiting for Eran to rebase and document anyway. ;)
Enjoy your beer! --scott
I rebased https://gerrit.wikimedia.org/r/#/c/247914/ and added some documentation. It seems that Jenkins got drunk :)
On Fri, Apr 1, 2016 at 10:34 PM, C. Scott Ananian cananian@wikimedia.org wrote:
Sure, no worries. I'm waiting for Eran to rebase and document anyway. ;)
Enjoy your beer! --scott
On Sat, Apr 2, 2016 at 2:02 AM (+0300), Eran Rosenthal wrote:
It seems that jenkins got drunk :)
That was a flaky unit test: https://phabricator.wikimedia.org/T131549
Greg
On Sat, Apr 2, 2016 at 2:02 AM, Eran Rosenthal eranroz89@gmail.com wrote:
I rebased https://gerrit.wikimedia.org/r/#/c/247914/ and added some documentation. It seems that jenkins got drunk :)
I made an image macro/command for this, a while ago... say /jenkins https://i.imgur.com/zuGQrYX.png ;-)