On 11/09/2015 12:37 PM, Petr Bena wrote:
Do you really want to say that reading from disk is faster than
processing the text with the CPU? I don't know how complex MediaWiki's
syntax actually is, but C++ compilers are probably much faster than
Parsoid, if that's true. And those are very slow.
What takes so much CPU time in turning wikitext into HTML? Sounds like
JS wasn't the best choice here.
The problem is not turning wikitext into HTML; it is turning it into
HTML in such a way that it can be converted back into wikitext after an
edit without introducing dirty diffs.
That requires keeping state around, tracking the wikitext closely, and
doing a lot more analysis.
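To make the dirty-diff problem concrete, here is a toy sketch in Python (hypothetical names and data structures, not Parsoid's actual code): each parsed node remembers the source offsets of the wikitext it came from, and on conversion back, unmodified nodes emit their original source verbatim instead of a normalized form.

```python
# A wikitext list whose second item is missing the space after '*'.
# Authors write markup like this all the time, and it still renders.
WIKITEXT = "* item one\n*item two\n"

def parse(src):
    """Toy parser: one node per list item, tracking source byte ranges."""
    nodes, pos = [], 0
    for line in src.splitlines(keepends=True):
        text = line.lstrip("* ").rstrip("\n")
        nodes.append({"text": text, "src": (pos, pos + len(line)),
                      "modified": False})
        pos += len(line)
    return nodes

def normalize_serialize(node):
    """Naive serializer: always re-emits canonical '* item' markup."""
    return "* " + node["text"] + "\n"

def selective_serialize(nodes, src):
    """Emit original source for unmodified nodes; normalize only edits."""
    out = []
    for n in nodes:
        if n["modified"]:
            out.append(normalize_serialize(n))
        else:
            start, end = n["src"]
            out.append(src[start:end])
    return "".join(out)

nodes = parse(WIKITEXT)

# A naive round trip rewrites "*item two" as "* item two" even though
# the user edited nothing: that spurious change is a dirty diff.
naive = "".join(normalize_serialize(n) for n in nodes)
assert naive != WIKITEXT

# Selective serialization is byte-identical when nothing was edited.
assert selective_serialize(nodes, WIKITEXT) == WIKITEXT
```

This is of course a vast simplification; the point is only that you cannot get clean diffs without carrying source-position state through the whole pipeline.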
That means detecting markup errors and retaining error-recovery
information, both so you can account for it during analysis and so you
can reintroduce those markup errors when you convert the HTML back to
wikitext. This is why we proposed
https://phabricator.wikimedia.org/T48705: we already have all the
information about broken wikitext usage.
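The error-recovery point can be sketched the same way (again a hypothetical toy, not Parsoid's implementation): the parser detects an unclosed ''' (bold) marker, auto-closes it for rendering, but records the error so that serialization can reproduce the original broken markup instead of silently "fixing" the page.

```python
def parse_bold(src):
    """Toy recovery: auto-close an unclosed ''' and record the error."""
    errors = []
    if src.count("'''") % 2 == 1:
        errors.append("unclosed-bold")
        src = src + "'''"  # recovery, so the page can still render
    return src, errors

def serialize(text, errors):
    """Reintroduce the recorded error so the round trip is clean."""
    if "unclosed-bold" in errors and text.endswith("'''"):
        return text[: -len("'''")]
    return text

original = "'''bold with no close tag"
fixed, errs = parse_bold(original)

# Rendering sees well-formed markup; serialization restores the
# author's original broken markup, so the diff stays clean.
assert fixed == "'''bold with no close tag'''"
assert serialize(fixed, errs) == original
```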
If you are interested in more details, either show up in
#mediawiki-parsoid, or watch this April 2014 tech talk: A preliminary
look at Parsoid internals [ Slides
<https://commons.wikimedia.org/wiki/File:Parsoid.techtalk.apr15.2014.pdf>,
Video <https://www.youtube.com/watch?v=Eb5Ri0xqEzk> ].
So, TL;DR: Parsoid is a *bi-directional* wikitext <-> HTML bridge, and
doing that is non-trivial.
Subbu.