-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Platonides
Sent: 17 December 2008 00:20
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Future of Javascript and mediaWiki
Jared Williams wrote:
SDCHing MediaWiki HTML would take some effort, as the page output is split between the skin classes, OutputPage, etc.
Also would want the translation text from languages/messages/Messages*.php in there too, I think.
Handling the $1-style placeholders is easy; it's just determining which message goes through which wfMsg*() function, and whether the WikiText translations can be pre-converted to HTML.
But most of the HTML comes from article wikitext, so I wonder whether it'd beat gzip by anything significant.
Jared
Note that SDCH output is expected to then be gzipped, as the two fulfil different needs; they aren't incompatible. You would use a dictionary for common skin bits, perhaps also adding some common page features, like the TOC code, '&amp;action=edit&redlink=1" class="new"'...
Having a second dictionary for language-dependent output could also be interesting, but not all messages should be provided.
Unfortunately, whilst the user agent can announce that it has multiple dictionaries, the SDCH response can only indicate that it used a single dictionary.
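That single-dictionary limitation shapes how a server would have to negotiate. A minimal sketch, assuming the Avail-Dictionary request header from the SDCH draft (the client lists the IDs of dictionaries it already holds); pickDictionary() and the dictionary IDs are invented for illustration, not MediaWiki code:

```php
<?php
// The client advertises the dictionaries it holds in an Avail-Dictionary
// header, but a response can be encoded against at most ONE of them, so
// the server must pick a single usable dictionary or fall back to gzip.
function pickDictionary(string $availHeader, array $serverDictIds): ?string
{
    foreach (array_map('trim', explode(',', $availHeader)) as $clientId) {
        if ($clientId !== '' && in_array($clientId, $serverDictIds, true)) {
            return $clientId;   // first dictionary both sides know wins
        }
    }
    return null;                // no shared dictionary: serve gzip-only
}
```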
Simetrical wrote:
What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
/Usually/, you don't create the dictionary output by hand, but pass the page to a "dictionary compressor" (or so is expected; this is all still quite experimental). If a parser function changed the text completely, it would just be emitted as literal data. If you have a parametrized block, the vcdiff would see: "this piece up to Foo matches this dictionary section before $1, and this other piece matches the text following Foo..."
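That COPY/ADD intuition can be made concrete with a toy delta encoder; this is nothing like a real vcdiff implementation, and naiveDelta() is invented for this sketch:

```php
<?php
// Toy illustration of dictionary compression: greedily match substrings
// of the target against the dictionary. Matched runs become COPY
// (address, length) opcodes; everything else (e.g. the value substituted
// for $1) becomes literal ADD data.
function naiveDelta(string $dict, string $target, int $minMatch = 4): array
{
    $ops = [];
    $pending = '';   // literal bytes not found in the dictionary
    $i = 0;
    $n = strlen($target);
    while ($i < $n) {
        $matchLen = 0;
        $matchPos = 0;
        // Longest substring of $target starting at $i found in $dict.
        for ($len = $n - $i; $len >= $minMatch; $len--) {
            $pos = strpos($dict, substr($target, $i, $len));
            if ($pos !== false) {
                $matchLen = $len;
                $matchPos = $pos;
                break;
            }
        }
        if ($matchLen > 0) {
            if ($pending !== '') {
                $ops[] = ['ADD', $pending];
                $pending = '';
            }
            $ops[] = ['COPY', $matchPos, $matchLen];
            $i += $matchLen;
        } else {
            $pending .= $target[$i++];
        }
    }
    if ($pending !== '') {
        $ops[] = ['ADD', $pending];
    }
    return $ops;
}
```

Against a dictionary holding the literal parts of a message, a target like "You have 5 new messages" collapses to COPY, ADD('5'), COPY — the parametrized-block case described above.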
What I have at the moment just traverses a directory of templates, using PHP's built-in tokenizer to extract T_INLINE_HTML tokens into the dictionary (if greater than 3 bytes long), replacing them with a call to output the vcdiff copy opcodes.
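The tokenizer pass described above might look roughly like this; extractInlineHtml() is a hypothetical name, but token_get_all() and T_INLINE_HTML are the real PHP tokenizer API, and the "> 3 bytes" threshold matches the one in the mail:

```php
<?php
// Run a PHP template through the built-in tokenizer and collect the
// T_INLINE_HTML chunks (the static markup between the embedded PHP code
// sections) as dictionary entries.
function extractInlineHtml(string $templateSource): array
{
    $chunks = [];
    foreach (token_get_all($templateSource) as $token) {
        if (is_array($token)
            && $token[0] === T_INLINE_HTML
            && strlen($token[1]) > 3
        ) {
            $chunks[] = $token[1];
        }
    }
    return $chunks;
}
```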
So

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php $e($this->lang); ?>">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title><?php $e($this->title); ?>

becomes

<?php $this->copy(0, 53); $e($this->lang); $this->copy(53, 91); $e($this->title);
PHP's output buffering captures the output from the PHP code within the template, which essentially becomes the data section of the vcdiff.
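A guess at the shape of the runtime implied by that copy()/output-buffering scheme — VcdiffTemplate is a sketch under assumptions, not Jared's actual code:

```php
<?php
// copy() records a vcdiff COPY opcode against the shared dictionary;
// whatever the template echoes in between is captured with output
// buffering and recorded as ADD data, i.e. the vcdiff data section.
class VcdiffTemplate
{
    private array $ops = [];

    public function copy(int $addr, int $len): void
    {
        $this->flushEchoed();
        $this->ops[] = ['COPY', $addr, $len];
    }

    private function flushEchoed(): void
    {
        $data = ob_get_clean();   // everything echoed since the last flush
        if ($data !== false && $data !== '') {
            $this->ops[] = ['ADD', $data];
        }
        ob_start();
    }

    /** Run a template callable and return the opcode stream. */
    public function render(callable $template): array
    {
        $this->ops = [];
        ob_start();
        $template($this);
        $this->flushEchoed();
        ob_end_clean();           // discard the final empty buffer
        return $this->ops;
    }
}

// Usage, mirroring the rewritten template above:
$ops = (new VcdiffTemplate())->render(function ($tpl) {
    $tpl->copy(0, 53);
    echo 'en';                    // dynamic value for xml:lang
    $tpl->copy(53, 91);
    echo 'Main Page';             // dynamic title
});
// $ops: COPY(0,53), ADD('en'), COPY(53,91), ADD('Main Page')
```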
Jared wrote:
I do have working PHP code that can parse PHP templates & language strings to generate the dictionary, and a new set of templates rewritten to output the vcdiff efficiently.
Please share?
Intend to, I probably should document/add some comments first :)
Jared