On Mon, Nov 9, 2015 at 1:37 PM, Petr Bena benapetr@gmail.com wrote:
Do you really want to say that reading from disk is faster than processing the text using the CPU? I don't know how complex the syntax of MW actually is, but C++ compilers are probably much faster than Parsoid, if that's true. And those are very slow.
What takes so much CPU time in turning wikitext into HTML? Sounds like JS wasn't the best choice here.
More fundamentally, the parsing task involves recursive expansion of templates and image-information queries, and popular Wikipedia articles can involve hundreds of templates and image queries. Caching the result of parsing lets us avoid repeating these nested queries, which are a major contributor to parse time.
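To make the caching point concrete, here is a minimal sketch (in TypeScript, and emphatically not Parsoid's actual code) of why an uncached parse is expensive: every template expansion is a query plus a recursive expansion of whatever that template transcludes, and a cache lets the whole subtree be skipped. All names here (fetchTemplateSource, expandWikitext, findTemplateInvocations) are hypothetical stand-ins.

const templateCache = new Map<string, string>();

// Stand-in for the API/DB query that fetches a template's wikitext.
async function fetchTemplateSource(title: string): Promise<string> {
  return `source of ${title}`;
}

// Find {{...}} invocations in a chunk of wikitext (grossly simplified).
function findTemplateInvocations(wikitext: string): string[] {
  return [...wikitext.matchAll(/\{\{([^{}]+)\}\}/g)].map(m => m[1]);
}

async function expandTemplate(title: string): Promise<string> {
  const cached = templateCache.get(title);
  if (cached !== undefined) return cached; // cache hit: no nested queries at all

  const source = await fetchTemplateSource(title); // one query...
  const expanded = await expandWikitext(source);   // ...plus recursive expansion
  templateCache.set(title, expanded);
  return expanded;
}

async function expandWikitext(wikitext: string): Promise<string> {
  let out = wikitext;
  for (const title of findTemplateInvocations(wikitext)) {
    out = out.replace(`{{${title}}}`, await expandTemplate(title));
  }
  return out;
}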
One of the benefits of the Parsoid DOM representation[*] is that it will allow in-place update of templates and image information, so that updating pages after a change can be done by simple substitution, *without* repeating the actual "parse wikitext" step.
 --scott

[*] This actually requires some tweaks to the wikitext of some popular templates; https://phabricator.wikimedia.org/T114445 is a decent summary of the work (although be sure to read to the end of the comments; there's significant stuff there which I haven't yet edited into the top-level task description).
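To sketch what that in-place substitution could look like: Parsoid's HTML marks template-generated content with typeof="mw:Transclusion", groups the generated siblings with a shared about id, and records the originating template in data-mw, so a page can in principle be patched by swapping just those nodes. The following is a rough TypeScript illustration of the idea, not working Parsoid code; the renderTemplate callback is a hypothetical re-renderer for a single transclusion.

function updateTransclusions(
  doc: Document,
  changedTemplate: string,
  renderTemplate: (dataMw: string) => string  // hypothetical re-renderer
): void {
  const nodes = doc.querySelectorAll('[typeof~="mw:Transclusion"]');
  for (const first of Array.from(nodes)) {
    const dataMw = first.getAttribute("data-mw") ?? "";
    if (!dataMw.includes(changedTemplate)) continue; // generated by some other template

    // Drop the other sibling nodes produced by the same transclusion.
    const about = first.getAttribute("about");
    if (about) {
      doc.querySelectorAll(`[about="${about}"]`).forEach(n => {
        if (n !== first) n.remove();
      });
    }
    // Substitute the freshly rendered output for the stale one.
    first.outerHTML = renderTemplate(dataMw);
  }
}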