On Mon, Nov 9, 2015 at 1:37 PM, Petr Bena <benapetr(a)gmail.com> wrote:
> Do you really want to say that reading from disk is faster than
> processing the text using the CPU? I don't know how complex the syntax
> of MW actually is, but C++ compilers are probably much faster than
> Parsoid, if that's true, and those are very slow.
> What takes so much CPU time in turning wikitext into HTML? Sounds like
> JS wasn't the best choice here.
More fundamentally, the parsing task involves recursive expansion of
templates and image information queries, and popular Wikipedia articles can
involve hundreds of templates and image queries. Caching the result of
parsing allows us to avoid repeating these nested queries, which are a
major contributor to parse time.
One of the benefits of the Parsoid DOM representation[*] is that it will
allow in-place update of templates and image information, so that updating
pages after a change can be done by simple substitution, *without* repeating
the actual "parse wikitext" step.
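The idea can be sketched as follows (a minimal illustration, not Parsoid's actual data model; the `template` attribute and `update_template` helper are invented stand-ins for the way Parsoid annotates transclusion output in its DOM):

```python
# Toy "DOM": a flat list of nodes, where each node remembers which
# template (if any) produced it. A template edit can then be applied
# by swapping just those subtrees, skipping a full re-parse of the page.
page_dom = [
    {"html": "<p>Intro paragraph</p>", "template": None},
    {"html": "<table>old infobox</table>", "template": "Infobox"},
    {"html": "<p>Body text</p>", "template": None},
]

def update_template(dom, name, new_html):
    """Replace the output of one template without re-parsing the page."""
    for node in dom:
        if node["template"] == name:
            node["html"] = new_html  # in-place substitution
    return dom
```

The substitution only touches nodes attributed to the changed template; the untemplated paragraphs pass through unmodified, which is what makes the update cheap compared to a full parse.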
--scott
[*] This actually requires some tweaks to the wikitext of some popular
templates; https://phabricator.wikimedia.org/T114445 is a decent summary of
the work (although be sure to read to the end of the comments; there's
significant stuff there which I haven't edited into the top-level task
description yet).
--
(http://cscott.net)