On 11/09/2015 12:37 PM, Petr Bena wrote:
Do you really want to say that reading from disk is faster than
processing the text with the CPU? I don't know how complex MediaWiki's
syntax actually is, but C++ compilers are probably much faster than
Parsoid, if that's true. And those are very slow.
What takes so much CPU time in turning wikitext into HTML? Sounds like
JS wasn't the best choice here.
The problem is not turning wikitext into HTML; it is turning it into
HTML in such a way that it can be converted back into wikitext after an
edit without introducing dirty diffs.
That requires keeping state around, tracking the wikitext closely, and
doing a lot more analysis.
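To make the dirty-diff problem concrete, here is a toy sketch in Python (hypothetical names and data structures, not Parsoid's actual code): each parsed node remembers the source offsets of the wikitext it came from, and on conversion back, unmodified nodes emit their original source verbatim instead of a normalized form.

```python
# A wikitext list whose second item is missing the space after '*'.
# Authors write markup like this all the time, and it still renders.
WIKITEXT = "* item one\n*item two\n"

def parse(src):
    """Toy parser: one node per list item, tracking source byte ranges."""
    nodes, pos = [], 0
    for line in src.splitlines(keepends=True):
        text = line.lstrip("* ").rstrip("\n")
        nodes.append({"text": text, "src": (pos, pos + len(line)),
                      "modified": False})
        pos += len(line)
    return nodes

def normalize_serialize(node):
    """Naive serializer: always re-emits canonical '* item' markup."""
    return "* " + node["text"] + "\n"

def selective_serialize(nodes, src):
    """Emit original source for unmodified nodes; normalize only edits."""
    out = []
    for n in nodes:
        if n["modified"]:
            out.append(normalize_serialize(n))
        else:
            start, end = n["src"]
            out.append(src[start:end])
    return "".join(out)

nodes = parse(WIKITEXT)

# A naive round trip rewrites "*item two" as "* item two" even though
# the user edited nothing: that spurious change is a dirty diff.
naive = "".join(normalize_serialize(n) for n in nodes)
assert naive != WIKITEXT

# Selective serialization is byte-identical when nothing was edited.
assert selective_serialize(nodes, WIKITEXT) == WIKITEXT
```

This is of course a vast simplification; the point is only that you cannot get clean diffs without carrying source-position state through the whole pipeline.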
That means detecting markup errors and retaining error-recovery
information, both so you can account for it during analysis and so you
can reintroduce those markup errors when you convert the HTML back to
wikitext. This is why we proposed
https://phabricator.wikimedia.org/T48705: we already have all the
information about broken wikitext usage.
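The error-recovery point can be sketched the same way (again a hypothetical toy, not Parsoid's implementation): the parser detects an unclosed ''' (bold) marker, auto-closes it for rendering, but records the error so that serialization can reproduce the original broken markup instead of silently "fixing" the page.

```python
def parse_bold(src):
    """Toy recovery: auto-close an unclosed ''' and record the error."""
    errors = []
    if src.count("'''") % 2 == 1:
        errors.append("unclosed-bold")
        src = src + "'''"  # recovery, so the page can still render
    return src, errors

def serialize(text, errors):
    """Reintroduce the recorded error so the round trip is clean."""
    if "unclosed-bold" in errors and text.endswith("'''"):
        return text[: -len("'''")]
    return text

original = "'''bold with no close tag"
fixed, errs = parse_bold(original)

# Rendering sees well-formed markup; serialization restores the
# author's original broken markup, so the diff stays clean.
assert fixed == "'''bold with no close tag'''"
assert serialize(fixed, errs) == original
```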
If you are interested in more details, either show up in
#mediawiki-parsoid, or watch this April 2014 tech talk: A preliminary
look at Parsoid internals [ Slides
<https://commons.wikimedia.org/wiki/File:Parsoid.techtalk.apr15.2014.pdf>,
Video <https://www.youtube.com/watch?v=Eb5Ri0xqEzk> ].
So, TL;DR: Parsoid is a *bi-directional* wikitext <-> HTML bridge, and
doing that is non-trivial.
Subbu.