Hello everybody,
Following the suggestion of Andre Klapper https://phabricator.wikimedia.org/T150933#2886290, I'm turning to this set of lists to see whether it can attract more feedback on the topic of internationalized programming facilities within the WM environment https://phabricator.wikimedia.org/T150933.
As described more extensively in the ticket, the idea is to implement internationalization facilities (if they don't exist yet) in the compilers used in the WM infrastructure, to enable contributors to localize them (possibly through Translatewiki), and then to let them use localized versions if they wish.
Please let me know if you need more details or if you have any questions. You can answer on the list or on Phabricator, as you wish.
Kind regards, Mathieu
On 21/12/2016 at 10:09, mathieu stumpf guntz wrote:
Hello everybody,
Following the suggestion of Andre Klapper https://phabricator.wikimedia.org/T150933#2886290, I'm turning to this set of lists to see whether it can attract more feedback on the topic of internationalized programming facilities within the WM environment https://phabricator.wikimedia.org/T150933.
As described more extensively in the ticket, the idea is to implement internationalization facilities (if they don't exist yet) in the compilers used in the WM infrastructure, to enable contributors to localize them (possibly through Translatewiki), and then to let them use localized versions if they wish.
Please let me know if you need more details or if you have any questions. You can answer on the list or on Phabricator, as you wish.
Hello,
The summary of the task is to provide some wrapper so one can code in one's native language. The main examples would be the Lua language used for templating modules on wiki and JavaScript for gadgets.
Thus a French developer could, instead of writing Lua code:
local p = {}
function p.hello( frame ) return "Hello, world!" end
return p
be able to write in their local language:
locale p = {}
fonction p.bonjour( cadre ) retourne "Bonjour à tous!" fin
retourne p
The task mentions http://www.babylscript.com/, for which Mathieu is listed as the Esperanto maintainer. The system has a mapping of translations.
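A minimal sketch of what such a keyword mapping could look like, written in JavaScript. The keyword table and the `relexicalise` function are hypothetical illustrations, not Babylscript's actual implementation. The sketch only swaps whole words outside of string literals and does not handle comments or fields that happen to share a keyword's name.

```javascript
// Hypothetical French -> Lua keyword table (illustrative only, not taken from Babylscript).
const FR_TO_LUA = new Map([
  ['locale', 'local'],
  ['fonction', 'function'],
  ['retourne', 'return'],
  ['fin', 'end'],
]);

// Replace localized keywords with canonical ones, leaving string literals untouched.
function relexicalise(source, table) {
  // Match either a quoted string literal or an identifier-like word.
  const token = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[\p{L}_][\p{L}\p{N}_]*/gu;
  return source.replace(token, (match) => {
    if (match[0] === '"' || match[0] === "'") return match; // keep strings as-is
    return table.has(match) ? table.get(match) : match;
  });
}

const french = [
  'locale p = {}',
  'fonction p.bonjour( cadre ) retourne "Bonjour à tous!" fin',
  'retourne p',
].join('\n');

// Prints Lua code using the canonical keywords; identifiers such as
// `bonjour` and `cadre` are left alone because they are not in the table.
console.log(relexicalise(french, FR_TO_LUA));
```

Note that only the keywords are mapped here; translating API names such as `mw.loader` is a separate question.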
I like the idea, but it looks like a huge effort which does not really fit the Wikimedia movement's goal (sharing knowledge). I do have sympathy, though, with the underlying idea of making software development accessible to non-English speakers.
A few issues:
For both Lua modules and JavaScript gadgets, we would have to get the related MediaWiki components to have their whole API translated. So that in a gadget, instead of writing:
mw.loader.using( ['mediawiki.util', 'mediawiki.notify'] ).then( function () {
    function liveClock() {
        mw.util.addCSS( '#utcdate a { font-size: 120%; }' );
    }
    $( liveClock );
} );
A French developer would write:
mw.chargeur.utilisant( ['mediawiki.outil', 'mediawiki.annonce'] ).alors( fonction () {
    fonction horloge() {
        mw.outil.ajoutFSC( '#dateTUC e { fontes-taille: 120%; }' );
    }
    $( horloge );
} );
(Example taken from mediawiki.org Gadget-UTCLiveClock.js)
You can see a few renames; some could be done, others are quite challenging:
* mediawiki.util gets renamed to mediawiki.outil
* then changed to alors
* CSS translated to FSC which makes absolutely no sense :D
* the HTML element A (anchor) renamed to E (encre)
* A CSS identifier that is changing from #utcdate to #dateTUC
* The invalid CSS 'fontes-taille'
etc...
Again, I appreciate the idea, but I don't think it is technically doable or worth pursuing. Not to mention that it will be very challenging to debug whenever some code has a bug and the issue is referred to more knowledgeable developers who happen not to know French or Esperanto.
I sympathize with the goal but accessibility benefits would be far outweighed by maintenance costs. We regularly use grep to find code which is about to be deprecated; wikis copy gadgets from each other; more experienced developers are sometimes asked to help a wiki where the local maintainers have less experience. We have a whole global user group for people who go from wiki to wiki and fix things.
And that's not even taking into account the implementation costs, which would probably be massive if it includes stuff like localizing Lua/JavaScript language keywords. The Wikimedia community collectively has a lot of experience with web development but not a whole lot of experience with programming language compiler development.
See also https://phabricator.wikimedia.org/T150417
On 21 December 2016 at 21:42, Gergo Tisza gtisza@wikimedia.org wrote:
I sympathize with the goal but accessibility benefits would be far outweighed by maintenance costs. We regularly use grep to find code which is about to be deprecated; wikis copy gadgets from each other; more experienced developers are sometimes asked to help a wiki where the local maintainers have less experience. We have a whole global user group for people who go from wiki to wiki and fix things.
And that's not even taking into account the implementation costs, which would probably be massive if it includes stuff like localizing Lua/JavaScript language keywords. The Wikimedia community collectively has a lot of experience with web development but not a whole lot of experience with programming language compiler development.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 21/12/2016 at 22:42, Gergo Tisza wrote:
I sympathize with the goal but accessibility benefits would be far outweighed by maintenance costs.
Maybe. Or maybe not. I can't judge very objectively without metrics, can I?
We regularly use grep to find code which is about to be deprecated;
Well, that's fine, and surely having localized versions of code would fall into your grep process, wouldn't it?
wikis copy gadgets from each other;
Yeah, sure, that's a problem, and having a centralized gadget repository is a saner way to go, which happens to be number 1 in the 2016 Community Wishlist Survey/Results https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Results :)
more experienced developers are sometimes asked to help a wiki where the local maintainers have less experience. We have a whole global user group for people who go from wiki to wiki and fix things.
Well, I share your concerns, and I don't pretend to have a perfect out-of-the-box solution which strikes the best balance between technical maintainability, technical skill dissemination, linguistic diversity/accessibility and so on.
And that's not even taking into account the implementation costs, which would probably be massive if it includes stuff like localizing Lua/Javascript language keywords.
Well, that's why the thread is about internationalization, and not localization. Just like there are tools to translate all the non-executable content hosted by Wikimedia. Or, more generally, the way Wikimedia provides the technical facilities, but not the human resources, for the wide-ranging projects built on this infrastructure.
The Wikimedia community collectively has a lot of experience with web development but not a whole lot of experience with programming language compiler development.
Once again, I don't have metrics about that (but I admit that to my mind your statement seems relevant).
On 21 December 2016 at 16:42, Gergo Tisza gtisza@wikimedia.org wrote:
I sympathize with the goal but accessibility benefits would be far outweighed by maintenance costs.
I agree. This is an example of an absolutely excellent idea with a clearly defined goal that, when you work out the details of what would be needed to make it work, quickly becomes infeasible.
Dan
Hi Antoine, thank you for sharing your feedback and clarifying the whole topic.
On 21/12/2016 at 15:13, Antoine Musso wrote:
A few issues:
For both Lua modules and JavaScript gadgets, we would have to get the related MediaWiki components to have their whole API translated. So that in a gadget, instead of writing:
Well, API translation is a separate issue from programming language translation. And actually it's a far easier topic, especially for scripting languages, which rely mainly on references. Anyone can easily propose a Scribunto module which provides a relexicalised wrapper of "mw", right now, as long as it doesn't use non-ASCII characters. You could actually provide translations right in the modules, with something like `mw.lang.fr.chargeur = mw.loader`, to build on your example below. To my mind, the biggest issue with that kind of approach is that the added reference layer can make the script harder to debug.
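A minimal sketch of this aliasing idea, in JavaScript for consistency with the gadget example (the same pattern applies to a Lua table in Scribunto). The `mw` object below is a stand-in so that the sketch runs outside MediaWiki, and the `alias` helper and the French names are hypothetical.

```javascript
// Stand-in for the real `mw` object so the sketch runs outside MediaWiki (assumption).
const mw = {
  loader: {
    using(modules) {
      // The stand-in simply resolves with the requested module names.
      return Promise.resolve(modules);
    },
  },
  lang: {},
};

// Build a localized wrapper: each localized name forwards to the canonical
// member of `target`, so no behaviour is duplicated.
function alias(target, names) {
  const wrapper = {};
  for (const [localized, canonical] of Object.entries(names)) {
    const value = target[canonical];
    wrapper[localized] = typeof value === 'function' ? value.bind(target) : value;
  }
  return wrapper;
}

// Hypothetical French names, echoing `mw.lang.fr.chargeur = mw.loader`.
mw.lang.fr = { chargeur: alias(mw.loader, { utilisant: 'using' }) };

mw.lang.fr.chargeur.utilisant(['mediawiki.util']).then((loaded) => {
  console.log(loaded);
});
```

The canonical `mw.loader.using` stays available next to the alias; the cost is that a maintainer reading the localized call has to map each name back to the canonical API.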
mw.chargeur.utilisant( ['mediawiki.outil', 'mediawiki.annonce'] ).alors( fonction () {
    fonction horloge() {
        mw.outil.ajoutFSC( '#dateTUC e { fontes-taille: 120%; }' );
    }
    $( horloge );
} );
(Example taken from mediawiki.org Gadget-UTCLiveClock.js)
You can see a few renames, some could be done, others are quite challenging:
- mediawiki.util gets renamed to mediawiki.outil
- then changed to alors
- CSS translated to FSC which makes absolutely no sense :D
- the HTML element A (anchor) renamed to E (encre)
- A CSS identifier that is changing from #utcdate to #dateTUC
- The invalid CSS 'fontes-taille'
As said, the idea is to provide facilities for internationalization, not a localization of everything. Whether some community would like to go into a more or less deep localization process should be up to them. Moreover, to my mind, using the default keywords should always remain possible.
As for each of the above points:
* mediawiki.util gets renamed to mediawiki.outil: there is no problem with that, is there?
* then changed to alors: there is no problem with that, is there?
* CSS translated to FSC which makes absolutely no sense :D: well, it is just as meaningful as CSS, whether you know the meaning of the acronym or not, isn't it?
* the HTML element A (anchor) renamed to E (encre):
Actually, this is definitely not the way you would localize that. First, an anchor is translated as "ancre" in French; "encre" means "ink". But this is really the smallest problem with such a way of localizing. You should assume that your API users either a) know how CSS identifiers work, in which case you won't translate such a parameter, or b) don't know a thing about CSS identifiers, in which case you provide an API which abstracts them away so that they never provide such a parameter.
* A CSS identifier that is changing from #utcdate to #dateTUC
The same applies here. And while, to my mind, it doesn't make sense to provide that kind of identifier translation facility, you would translate that as #dateUTC (see Temps universel coordonné https://fr.wikipedia.org/wiki/Temps_universel_coordonn%C3%A9).
* The invalid CSS 'fontes-taille'
Indeed, you should rather have something like
mon_ancre = méthode_de_récupération_de_mon_ancre();
mon_ancre.taille_fonte_de_caractère("120%");
and your users shouldn't care whether this will produce HTML/CSS, PostScript, PDF, or whatever digital layout format is out there. :)
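A minimal sketch of such an abstraction, assuming a hypothetical `Ancre` class with a CSS backend. None of these names exist in MediaWiki; the point is only that the caller never types a CSS selector or property name.

```javascript
// Hypothetical localized wrapper: the caller never sees a CSS selector or property.
class Ancre {
  constructor(selector) {
    this.selector = selector; // canonical CSS selector, kept internal
    this.declarations = [];
  }

  // "Font size" in French; maps to the canonical CSS property.
  taille_fonte_de_caractère(valeur) {
    this.declarations.push(`font-size: ${valeur};`);
    return this; // allow chaining
  }

  // Render to CSS; another backend could render the same data differently.
  versCSS() {
    return `${this.selector} { ${this.declarations.join(' ')} }`;
  }
}

// Hypothetical accessor; '#utcdate a' is the selector from the gadget example.
function méthode_de_récupération_de_mon_ancre() {
  return new Ancre('#utcdate a');
}

const mon_ancre = méthode_de_récupération_de_mon_ancre();
mon_ancre.taille_fonte_de_caractère('120%');
console.log(mon_ancre.versCSS());
```

Here the translated names live in the API layer, while the generated CSS stays canonical and therefore valid.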
etc...
Again, I appreciate the idea, but I don't think it is technically doable or worth pursuing. Not to mention that it will be very challenging to debug whenever some code has a bug and the issue is referred to more knowledgeable developers who happen not to know French or Esperanto.
Well, I do agree that it potentially makes debugging more difficult, but to my mind the problem is the added layer.
As for the achievability of such a project, it's definitely doable. You already pointed to Babylscript, and I have yet to make progress on this project, but you can already play with mallupa https://github.com/psychoslave/mallupa and lua-i18n https://github.com/psychoslave/lua-i18n (see the relexicalisation https://github.com/psychoslave/lua-i18n/tree/relexicalisation branch).
On Thu, Dec 22, 2016 at 10:42 AM mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:
- mediawiki.util gets renamed to mediawiki.outil: there is no problem with that, is there?
- then changed to alors: there is no problem with that, is there?
Yes, that is a problem. As pointed out by others: it makes grepping for code when doing refactoring basically impossible as I'd have to know a list of every possible translation of "util." That's not a good use of developer time at all.
I mean these ideas have merit on the face of them, but I'm totally in the "this is nice but probably not worth the maintenance burden" camp along with Gergo.
T150417 was declined, and for good reasons I think.
I think ideas like Babylscript are more useful for a project where you've got a lot of developers in a *single* non-English language. For example, a team of Italians who would rather work in Italian than English. In our case, we've got *many* languages and being able to use things across wikis/languages is important.
-Chad
On 22/12/2016 at 19:30, Chad wrote:
On Thu, Dec 22, 2016 at 10:42 AM mathieu stumpf guntz <psychoslave@culture-libre.org> wrote:
- mediawiki.util gets renamed to mediawiki.outil: there is no problem with that, is there?
- then changed to alors: there is no problem with that, is there?
Yes, that is a problem. As pointed out by others: it makes grepping for code when doing refactoring basically impossible as I'd have to know a list of every possible translation of "util." That's not a good use of developer time at all.
Well, then I would say that it's more like projecting a tool that is useful in a given situation onto another situation, however similar, where it is no longer that useful. What you want to find isn't the lexeme (or even any accidentally matching string), but the seme (or sememe). In most programming languages the correlation between lexeme and seme(me) is probably strong enough to make simple regular expression matching useful in many cases. But even there, a grep approach can quickly show its limits (especially with false positives, in my experience). Having a tool which lets you perform transformations based on an AST is often far more accurate and flexible.
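A minimal sketch of searching by meaning rather than by spelling. It is not a real AST-based tool: to stay self-contained it uses a simple tokenizer as a stand-in, which is enough to skip string literals and to resolve localized names. The alias table and the `findIdentifier` function are hypothetical.

```javascript
// Hypothetical table mapping localized identifiers to their canonical name.
const CANONICAL = new Map([
  ['outil', 'util'], // French, from the example in this thread
  ['ilo', 'util'],   // Esperanto, illustrative
]);

// Return the 1-based line numbers where an identifier resolving to `name`
// is used as code. String literals are skipped; comments are not handled.
function findIdentifier(source, name) {
  const token = /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[\p{L}_$][\p{L}\p{N}_$]*/gu;
  const hits = [];
  source.split('\n').forEach((line, index) => {
    for (const [match] of line.matchAll(token)) {
      if (match[0] === '"' || match[0] === "'") continue; // literal, not code
      const resolved = CANONICAL.has(match) ? CANONICAL.get(match) : match;
      if (resolved === name) hits.push(index + 1);
    }
  });
  return hits;
}

const code = [
  "mw.util.addCSS('a { color: red; }');",
  "mw.outil.ajoutFSC('b { color: blue; }');",
  "console.log('util is only mentioned here');",
].join('\n');

// Lines 1 and 2 use the identifier; line 3 only mentions it inside a string,
// which a plain grep for "util" would also report.
console.log(findIdentifier(code, 'util'));
```

The search still depends on someone maintaining the alias table, which is part of the maintenance cost discussed in this thread.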
I mean these ideas have merit on the face of them, but I'm totally in the "this is nice but probably not worth the maintenance burden" camp along with Gergo.
Well, I do understand the argument, and it does sound reasonable to me. It's more that I wouldn't place the cursor of maintenance burden tolerance at the same point, especially when I feel it might have a large impact on (language) diversity.
Also, my understanding is that the main concern here is about letting developers with advanced skills easily go and help on various wikis. And as I understand it, this shouldn't be such a big deal with a central repository where the most common problems are (hopefully) already factored in a way which eases maintenance. To my mind, it's compatible with having more locally specific problems solved, and possibly some local wrappers of centralized code, in a way which pleases the local community (which might just as well prefer not to use code localization after all).
T150417 was declined, and for good reasons I think.
Well, as said, I'm not in fundamental disagreement with that.
I think ideas like Babylscript are more useful for a project where you've got a lot of developers in a *single* non-English language. For example, a team of Italians who would rather work in Italian than English. In our case, we've got *many* languages and being able to use things across wikis/languages is important.
To my mind, having a lot of developers with many languages doesn't exclude also having, among them, developers who share a single non-English language, does it?
-Chad
I think it is easier to create a new localizable programming environment from scratch than to try to localize existing tools and APIs.
For example, block-based programming languages (like Scratch and eToys) tend to be fairly easy to translate -- there are some issues regarding fixed size labels, the meaning of concatenating labels in various languages, etc, but these problems proved surmountable. We used these extensively at One Laptop per Child.
At OLPC I worked on a block-based JavaScript-subset, which allowed complete translation between "text based" and "block based" views of the code: http://turtlescript.github.cscott.net/ You could localize the labels in the block-based version and still "compile down" to the legacy/English APIs.
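A minimal sketch of that idea: one program tree, rendered either with localized labels or as the canonical code. This is not TurtleScript's implementation; the node shapes, the label tables, and the `render` function are hypothetical.

```javascript
// A tiny program tree for: if (x > 0) { return x; }  (hypothetical node shapes)
const program = {
  type: 'if',
  test: 'x > 0',
  body: { type: 'return', value: 'x' },
};

// Label tables: `en` doubles as the canonical JavaScript keywords.
const LABELS = {
  en: { if: 'if', return: 'return' },
  fr: { if: 'si', return: 'retourne' }, // illustrative translations
};

// Render the same tree with the labels of the requested language.
function render(node, labels) {
  switch (node.type) {
    case 'if':
      return `${labels.if} (${node.test}) { ${render(node.body, labels)} }`;
    case 'return':
      return `${labels.return} ${node.value};`;
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
}

console.log(render(program, LABELS.fr)); // localized view for the author
console.log(render(program, LABELS.en)); // canonical code that actually runs
```

Because only the labels are translated, the stored program stays the same for every reader.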
As some folks know I've been a persistent advocate for JavaScript in Scribunto, and I've contributed to v8js to this end.
Another option is to move away from textual scripting languages on wiki entirely. For example, https://phabricator.wikimedia.org/T114454 proposes using Visual Editor to perform more of the "template" tasks, with a stronger separation of code, "layout", and data. Template edits which affect the visual presentation or the data ("arguments") but not the underlying code should be possible in Visual Editor in a fully localized manner. The textual "code" portion would be de-emphasized for most tasks. For tasks which do require "code" to be written, you'd use one of the techniques described above.
--scott