I've been trying to figure out if it's possible to slightly change how message strings are processed, for the particular case of sending them to a JavaScript-heavy application.
The WMF has been working on a lot of interface improvements, and these necessarily lean heavily on JavaScript.
There are many cases where we want to do some late message parsing on the client, like
'[$1 Click here to advance to the next item]'
'You have uploaded $1 {{PLURAL:$1|file|files}}'
However, there are many other kinds of parsing that could and should be done on the server, like
'From {{SITENAME}}'
'[[Special:SpecialPages|{{int:specialpages}}]]'
So we have two options, as far as I can tell:
OPTION 1:
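Include a wikitext parser in JavaScript that also knows how to fetch int: strings and other such values when it needs them. This is undesirable because it duplicates parser functionality that already exists on the server and requires shipping server state to the client.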
Michael Dale actually did a lot of this already for his JS2 project, and his solution works. But I'm taking another look at the problem.
OPTION 2:
Figure out some way to alter parsing on the server for JavaScript messages, such that things like {{SITENAME}} are parsed, but {{PLURAL}} isn't (or the parse result emits an identical {{PLURAL}} template call).
Then a much simpler parser -- probably based on regular expressions -- can be deployed to the client.
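For illustration only, here is a minimal sketch of what such a regex-based client parser might look like. The function names are hypothetical, the plural rule is English-only, and it only handles non-nested {{PLURAL}} calls, so it is an assumption-laden sketch rather than anything that exists in MediaWiki.

```javascript
// Hypothetical helpers; not part of any existing MediaWiki API.

// Replace $1, $2, ... with the corresponding parameter (1-based).
function substituteParams(message, params) {
  return message.replace(/\$(\d+)/g, function (match, index) {
    var value = params[Number(index) - 1];
    // Leave the placeholder alone if no parameter was supplied.
    return value === undefined ? match : String(value);
  });
}

// Expand only flat (non-nested) calls of the form
// {{PLURAL:<count>|<singular>|<plural>}}, using an English-only rule.
function expandFlatPlural(message) {
  return message.replace(
    /\{\{PLURAL:([^|{}]*)\|([^|{}]*)\|([^|{}]*)\}\}/gi,
    function (match, count, singular, plural) {
      return Number(count) === 1 ? singular : plural;
    }
  );
}

// Parameters are substituted first so that PLURAL sees a literal number.
function renderMessage(message, params) {
  return expandFlatPlural(substituteParams(message, params));
}

console.log(renderMessage('You have uploaded $1 {{PLURAL:$1|file|files}}', [3]));
// prints "You have uploaded 3 files"
```

Substituting parameters before expanding PLURAL keeps the regex simple, but it would misbehave if a parameter value itself contained "|" or "}}".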
Roan Kattouw and Trevor Parscal are working on a new Resource Loader that may be able to do some of the above. Also, Niklas Laxström is working on a new Message class that looks like it will be far more amenable to this sort of thing.
In the meantime...
I've been messing around with the current system, and it seems that there isn't any good way to do this that doesn't involve evil dabbling with global parser hooks. I tried creating a special Language-like object that I could pass into message parsing, which essentially wrapped all calls to another Language object except for convertPlural and mCode. But it seems that this won't work either since so many of the ParserOptions are retrieved from global settings in strange ways.
Any thoughts?
On 08/10/2010 07:01 PM, Neil Kandalgaonkar wrote:
OPTION 2:
Figure out some way to alter parsing on the server for JavaScript messages, such that things like {{SITENAME}} are parsed, but {{PLURAL}} isn't (or the parse result emits an identical {{PLURAL}} template call).
Then a much simpler parser -- probably based on regular expressions -- can be deployed to the client.
Roan Kattouw and Trevor Parscal are working on a new Resource Loader that may be able to do some of the above. Also, Niklas Laxström is working on a new Message class that looks like it will be far more amenable to this sort of thing.
One potential problem with a simple regex is message values with nested templates, like:
'category-subcat-count' => '{{PLURAL:$2|This category has only the following subcategory.|This category has the following {{PLURAL:$1|subcategory|$1 subcategories}}, out of $2 total.}}'
If you can write a regex that works for such substitutions, that would be great. Alternatively, the server could do some transform that represents the templates in JSON so that template parsing could be avoided on the client (if that's the goal); that could be done relatively easily. The JSON representation the server creates ends up looking a lot like the intermediate object the client-side template parser creates (more verbose than the wikitext) ... or ... we can just do the template parsing in JavaScript.
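To make the nesting problem and the "intermediate object" concrete, here is a rough sketch of a tiny recursive parser that turns a message into a JSON-serializable tree and then renders it. It is not the JS2 parser; the function names and tree shape are assumptions made up for this example, and the plural rule is English-only.

```javascript
// Parse "light" wikitext into a tree of strings and
// { name: 'PLURAL', args: [ [nodes], [nodes], ... ] } objects.
// Hypothetical sketch; handles only {{NAME}} and {{NAME:arg|arg|...}}.
function parseMessage(text) {
  var pos = 0;

  function parseNodes(stopAtDelimiter) {
    var nodes = [];
    var buffer = '';
    while (pos < text.length) {
      if (text.startsWith('{{', pos)) {
        if (buffer) { nodes.push(buffer); buffer = ''; }
        nodes.push(parseTemplate());
      } else if (stopAtDelimiter &&
          (text[pos] === '|' || text.startsWith('}}', pos))) {
        break; // end of the current template argument
      } else {
        buffer += text[pos];
        pos += 1;
      }
    }
    if (buffer) { nodes.push(buffer); }
    return nodes;
  }

  function parseTemplate() {
    pos += 2; // skip "{{"
    var start = pos;
    while (pos < text.length && /[A-Za-z_]/.test(text[pos])) { pos += 1; }
    var name = text.slice(start, pos).toUpperCase();
    var args = [];
    if (text[pos] === ':') {
      pos += 1;
      args.push(parseNodes(true));
      while (text[pos] === '|') {
        pos += 1;
        args.push(parseNodes(true));
      }
    }
    pos += 2; // skip "}}"
    return { name: name, args: args };
  }

  return parseNodes(false);
}

// Render a parsed tree, substituting $N and resolving PLURAL inside-out.
function renderTree(nodes, params) {
  return nodes.map(function (node) {
    if (typeof node === 'string') {
      return node.replace(/\$(\d+)/g, function (match, index) {
        var value = params[Number(index) - 1];
        return value === undefined ? match : String(value);
      });
    }
    var args = node.args.map(function (arg) { return renderTree(arg, params); });
    if (node.name === 'PLURAL') {
      var forms = args.slice(1);
      // English-only assumption: first form for 1, last form otherwise.
      return Number(args[0]) === 1 ? forms[0] : forms[forms.length - 1];
    }
    // Unknown template: emit it again as wikitext.
    return '{{' + node.name + (args.length ? ':' + args.join('|') : '') + '}}';
  }).join('');
}

var subcatMessage = '{{PLURAL:$2|This category has only the following subcategory.|' +
  'This category has the following {{PLURAL:$1|subcategory|$1 subcategories}}, out of $2 total.}}';
console.log(renderTree(parseMessage(subcatMessage), [3, 10]));
// prints "This category has the following 3 subcategories, out of 10 total."
```

Because the tree contains only strings, arrays and plain objects, JSON.stringify(parseMessage(...)) gives one possible server-to-client JSON form of the same message.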
I went with the client-side template parsing route with the idea that we will eventually find other uses for a client-side wikitext template parser, and that we could refine and build relatively robust components for handling wikitext on the client. While it's not fun to maintain "two parsers", it's also not fun to maintain two data representation formats. I.e., I don't know how well suited the existing PHP-based parser "infrastructure" is for an intermediate JSON representation, and we would have to maintain a JSON transform library in PHP plus the associated JSON -> HTML / interface(s) and JSON -> wikitext code in JavaScript. Depending on the type of applications built, handling some of the wikitext parsing in JavaScript could be more server-side resource friendly.
I liked the idea that we briefly discussed in person that javascript modules can include an additional mw.wiktextLangauge module in their dependency list if they use message template substitution, while the base javascript library just includes vanilla substitution.
The mw.parser would be a required dependency of the wiktextLangauge module.
The basic rule for server-side template expansion versus client-side parsing is whether the argument is a variable. If you have {{PLURAL:{{NUMBEROFEDITS}}|edit|edits}}, there is no reason not to swap in "edits" on the server for that message. Of course, managing expiration of magic-word substitutions is already 'no fun' in wikitext, so people should be discouraged from going crazy with these things inside message values.
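A rough sketch of that rule follows, written in JavaScript purely for illustration (the real server side is PHP). The function name and the magicWords map are hypothetical: known no-argument magic words are replaced, a PLURAL whose count has become a literal number is resolved with an English-only rule, and anything still depending on $N is passed through untouched for the client.

```javascript
// Hypothetical illustration of "expand what is static, keep what is variable".
function expandStaticParts(message, magicWords) {
  // Step 1: replace known no-argument magic words such as {{SITENAME}}.
  var expanded = message.replace(/\{\{([A-Z]+)\}\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(magicWords, name)
      ? String(magicWords[name])
      : match;
  });
  // Step 2: resolve PLURAL only when its count is now a literal number.
  // A count of "$1" does not match \d+, so that call is left for the client.
  return expanded.replace(
    /\{\{PLURAL:(\d+)\|([^|{}]*)\|([^|{}]*)\}\}/g,
    function (match, count, singular, plural) {
      return Number(count) === 1 ? singular : plural; // English-only assumption
    }
  );
}

var words = { SITENAME: 'Wikipedia', NUMBEROFEDITS: 42 };
console.log(expandStaticParts('{{PLURAL:{{NUMBEROFEDITS}}|edit|edits}}', words));
// prints "edits"
console.log(expandStaticParts('From {{SITENAME}}: $1 {{PLURAL:$1|file|files}}', words));
// prints "From Wikipedia: $1 {{PLURAL:$1|file|files}}"
```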
In general direct substitutions like {{SITENAME}} should move towards normal $1 substitution with a uniform way to package php variables into javascript. What presently is
<script type="text/javascript">/*<![CDATA[*/
var wgArticlePath = "/wiki/$1";
var wgScriptPath = "/w";
.. etc
should instead be a set of variables exported via the resource loader for specific modules. I.e., the MakeGlobalVariablesScript hook should be replaced with something a bit more intelligent that allows the resource loader manifests to specify which variables they need.
Ideally these variables get packaged via mw.setConfig / mw.getConfig rather than plopped into the global namespace.
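As a sketch of the idea only: the setConfig / getConfig names come from the proposal above, but the signatures, the fallback argument, and the mwSketch object are assumptions for this example and not an existing API.

```javascript
// Keep exported server variables in a private store instead of globals.
var mwSketch = (function () {
  var config = {};

  return {
    // Accepts either (name, value) or an object of name/value pairs.
    setConfig: function (nameOrValues, value) {
      if (typeof nameOrValues === 'object' && nameOrValues !== null) {
        Object.keys(nameOrValues).forEach(function (name) {
          config[name] = nameOrValues[name];
        });
      } else {
        config[nameOrValues] = value;
      }
    },
    // Returns the stored value, or the fallback if the name was never set.
    getConfig: function (name, fallback) {
      return Object.prototype.hasOwnProperty.call(config, name)
        ? config[name]
        : fallback;
    }
  };
}());

// What the resource loader might emit for a module that asked for these two:
mwSketch.setConfig({ wgArticlePath: '/wiki/$1', wgScriptPath: '/w' });

console.log(mwSketch.getConfig('wgScriptPath')); // prints "/w"
console.log(typeof wgScriptPath);                // prints "undefined"
```

Since nothing is written to the global namespace, the per-user and per-site values could be delivered by different requests without either one touching the cached page.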
Since the resource loader will be pre-processing all the JavaScript, there will (eventually) be no reason to define all these variables globally, which presently makes it impossible to send logged-in users the same cached page as anonymous users (i.e., user-specific variables should be served from a different URL than site- or page-specific exported variables). This also lets you change configuration without purging the JavaScript code cache, if that's a desirable feature.
In the dynamic loading context it's a bit trickier not to pollute the cache, but the dynamic-loading resource loader code could just issue a separate parallel request for the configuration / exported variables associated with a JavaScript resource, where needed. They could possibly be packaged with the messages if it turns out that messages and configuration change more frequently than source code and you don't want one invalidating the cache of the other.
--michael
On 11 August 2010 11:39, Michael Dale mdale@wikimedia.org wrote:
In general direct substitutions like {{SITENAME}} should move towards normal $1 substitution with a uniform way to package php variables into javascript.
Why? We can and should just replace those in the server side.
On 08/11/2010 06:22 AM, Niklas Laxström wrote:
On 11 August 2010 11:39, Michael Dale mdale@wikimedia.org wrote:
In general direct substitutions like {{SITENAME}} should move towards normal $1 substitution with a uniform way to package php variables into javascript.
Why? We can and should just replace those in the server side.
Because we should avoid lots of wikitext in messages, and variable substitution gives us better cache expiry management. {{SITENAME}} is not such a good example since it's pretty static .. but in general it's my understanding that lots of wikitext template calls in interface messages should be avoided if possible.
--michael
There are a lot of apparent benefits to doing it all in Javascript, but it still bothers me because it's a huge violation of Don't Repeat Yourself.
You're probably right that we will want a wikitext parser on the client someday, especially for the editor. So perhaps I'm struggling for no good reason.
But, duplicating all the parser functionality *and* transferring the entire state of the MediaWiki server to the client still seems like the wrong thing.
I have some more radical ideas like making the parser emit JS closures when appropriate, but those are probably even more unrealistic until Niklas' new Message class lands (and maybe not even then).
Anyway, after denting my head on my desk a few times I'm starting to think that wfMsg* is just unworkable for this idea.
So unless anyone else has any bright ideas I'll just give up and go with Michael Dale's approach of sending a parser along, at least for now.
it seems that this won't work either since so many of the ParserOptions are retrieved from global settings in strange ways.
Strange? They are loaded from either initialiseFromUser() or one of its setters...
If you can do a regEx that will work for such substitution that would be great, or the server could do some transform that represented the transforms in JSON so that the template parsing could be avoided ..
I remember some time ago we discussed in wikitech-l the possibility of doing a late replace of plural: and {{sitename}} to have the rest of the message parsed in a cache.
It may be worth bringing back that idea if such a light parser is going to be written in JS too.
For transforming the templates into a JSON tree, the output of the DOM preprocessor may be easy to convert.
On 08/11/2010 02:51 PM, Platonides wrote:
I remember some time ago we discussed in wikitech-l the possibility of doing a late replace of plural: and {{sitename}} to have the rest of the message parsed in a cache.
It may be worth bringing back that idea if such a light parser is going to be written in JS too.
For transforming the templates into a JSON tree, the output of the DOM preprocessor may be easy to convert.
I think we should do a server-side message parse and cache for everything we can, with templates that take variable substitution arguments, like {{PLURAL:$1|value 1| value 2}}, being delivered to the client in a way that lets it manage the relevant in-line substitutions. I am open to that template being represented as a wikitext string or as a JSON tree. I lean towards a wikitext string, since it will be helpful for JavaScript to be able to deal with 'light' wikitext strings.
--michael
On 12 August 2010 10:12, Michael Dale mdale@wikimedia.org wrote:
I think we should do a server-side message parse and cache for everything we can, with templates that take variable substitution arguments, like {{PLURAL:$1|value 1| value 2}}, being delivered to the client in a way that lets it manage the relevant in-line substitutions.
I might be stating the obvious, but please do not forget that {{PLURAL}} handling (and number of its parameters) is language-dependent. The client-side JavaScript code handling this would have to be defined in LanguageXx.php.
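To illustrate the point, a client-side plural function would need one rule per language, roughly along these lines. The rules below are simplified assumptions written for this example (they are not copied from any LanguageXx.php or LanguageXx.js), and the names pluralRules and convertPluralSketch are hypothetical.

```javascript
// Simplified per-language plural rules; each picks one of the given forms.
var pluralRules = {
  // English: two forms, singular only for exactly 1.
  en: function (count, forms) {
    return count === 1 ? forms[0] : forms[1];
  },
  // French: two forms, singular for 0 and 1.
  fr: function (count, forms) {
    return count <= 1 ? forms[0] : forms[1];
  },
  // Czech: three forms (1; 2 to 4; everything else).
  cs: function (count, forms) {
    if (count === 1) { return forms[0]; }
    if (count >= 2 && count <= 4) { return forms[1]; }
    return forms[2];
  }
};

function convertPluralSketch(lang, count, forms) {
  var rule = Object.prototype.hasOwnProperty.call(pluralRules, lang)
    ? pluralRules[lang]
    : pluralRules.en; // fall back to the English rule
  var form = rule(count, forms);
  // If the message supplied fewer forms than the rule expects, use the last one.
  return form === undefined ? forms[forms.length - 1] : form;
}

console.log(convertPluralSketch('cs', 3, ['soubor', 'soubory', 'souborů'])); // prints "soubory"
console.log(convertPluralSketch('fr', 0, ['fichier', 'fichiers']));          // prints "fichier"
console.log(convertPluralSketch('en', 0, ['file', 'files']));                // prints "files"
```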
-- [[cs:User:Mormegil | Petr Kadlec]]
On 8/12/10 4:30 AM, Petr Kadlec wrote:
I might be stating the obvious, but please do not forget that {{PLURAL}} handling (and number of its parameters) is language-dependent. The client-side JavaScript code handling this would have to be defined in LanguageXx.php.
Michael created partial analogs of each LanguageXx.php as LanguageXx.js.
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/JS2Support/mwEmb...