On 01/30/2013 12:36 AM, Ariel T. Glenn wrote:
On 23-01-2013, Wed, at 15:10 -0800, Gabriel Wicke wrote:
Fellow MediaWiki hackers!
After the pretty successful December release and some more clean-up work following up on that, we are now considering the next steps for Parsoid. To this end, we have put together a rough roadmap for the Parsoid project at
One thing that jumped out at me is this:
"We have also decided to narrow our focus a bit by continuing to use the PHP preprocessor to perform our template expansion."
While I understand the reasoning and even sympathize with it, I had hoped that Parsoid, when complete, would facilitate the implementation of wikitext parsers apart from the canonical parser (i.e. MediaWiki), with clearly defined behavior for the language including templates. Is that idea dead then?
As it exists, Parsoid can tackle full template expansion -- but, since it does not support all parser functions natively, this is still incomplete. We can largely sidestep that gap by relying on the PHP preprocessor to give us fully expanded wikitext, which we then process further.
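To make that concrete, here is a rough sketch of what "relying on the PHP preprocessor" amounts to: the MediaWiki API's expandtemplates action does the template and parser-function expansion in PHP, and Parsoid only sees the already-expanded wikitext. The endpoint, the parameter handling, and the parsoidParse() placeholder below are illustrative, not our actual code:

    // Sketch: let the PHP preprocessor (via the MediaWiki API) expand all
    // templates/parser functions, then hand the expanded wikitext to a
    // Parsoid-style tokenizer/DOM builder. Names here are illustrative.

    const API = 'https://en.wikipedia.org/w/api.php';

    async function expandWithPhpPreprocessor(wikitext: string, title: string): Promise<string> {
      const params = new URLSearchParams({
        action: 'expandtemplates',
        format: 'json',
        title,                 // page context for {{PAGENAME}} and friends
        text: wikitext,
        prop: 'wikitext',      // older MediaWiki versions return the text under "*" instead
      });
      const res = await fetch(`${API}?${params}`);
      const body = await res.json();
      return body.expandtemplates.wikitext;
    }

    // Expansion happens in PHP; everything after that stays in Parsoid.
    async function parseWithLegacyExpansion(wikitext: string, title: string) {
      const expanded = await expandWithPhpPreprocessor(wikitext, title);
      return parsoidParse(expanded);   // placeholder for Parsoid's own pipeline
    }

    declare function parsoidParse(expandedWikitext: string): Document;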
We are refocusing our efforts towards exploring HTML-based templating -- while supporting existing templates. Lua-based templates already clean up a lot of template logic by having access to full conditional logic. By relying more on DOM-based templates (which would also be editable in a VisualEditor-like client), the expectation is that direct wikitext use itself will progressively diminish. Since most wikitext templates, and probably most Lua templates, already return well-formed DOM (not all do), simply adding a parse layer on top of them lets them be supported in a DOM-only templating framework. So, the first outcome of this effort would be to require templates to always return DOM fragments.
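A rough sketch of that "parse layer" idea follows; expandTemplate, hasUnclosedTags and wikitextToDom are made-up stand-ins, not real Parsoid APIs:

    // Hypothetical sketch of the "templates always return DOM fragments" rule:
    // whatever a template produces (wikitext, Lua output, ...) is parsed in
    // isolation into a fragment, instead of being spliced into the caller's
    // token stream. Helper names are illustrative only.

    function templateToDomFragment(name: string, args: Map<string, string>): DocumentFragment {
      const output = expandTemplate(name, args);   // wikitext or Lua-generated text

      // Under this model a template whose output only makes sense mid-table or
      // mid-<div> cannot leak structure into the surrounding page; flag it
      // instead of silently "fixing" it.
      if (hasUnclosedTags(output)) {
        throw new Error(`{{${name}}} does not return a well-formed DOM fragment`);
      }

      // The added parse layer: the output becomes a self-contained fragment.
      return wikitextToDom(output);
    }

    declare function expandTemplate(name: string, args: Map<string, string>): string;
    declare function hasUnclosedTags(wikitext: string): boolean;
    declare function wikitextToDom(wikitext: string): DocumentFragment;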
In such a diminished-use scenario, we do not see the need to focus a lot of energy and effort on attaining full compatibility entirely within Parsoid. We see Parsoid plus the PHP parser as providing legacy wikitext support while a large chunk of editing and storage happens in the HTML world. We can then take it from there based on how far this strategy takes us. If a full replacement wikitext evaluation system still turns out to be needed (because of the continuing popularity of wikitext, for performance reasons, or whatever else), that option remains open and is not closed off at this time.
Even so, there is still the possibility of identifying "erroneous" or "undefined behavior" wikitext markup within Parsoid (in quotes, because anything thrown at the PHP parser and Parsoid always needs to be rendered). We can detect, for example, missing opening/closing HTML tags (we currently have to do that anyway to round-trip them properly without introducing dirty diffs), and we can detect unbalanced tags in certain contexts by treating them as balanced-DOM contexts (image captions, extensions), among other such scenarios. We have also been adding a number of parser tests that try to specify edge-case behaviors and make a call as to whether each is legitimate or undefined behavior. All of this could be used to issue warnings in a lint-like mode, which could then serve as a de facto definition of legitimate wikitext, since a grammar-based definition of wikitext is not possible.
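Purely as an illustration (the token shape and function names below are made up, not Parsoid internals), a lint pass over one of those balanced-DOM contexts could be as simple as:

    // Minimal sketch of a lint-like pass, assuming a tokenizer that yields
    // HTML-ish tag tokens for a given context (image caption, extension body).
    // It flags the kind of markup mentioned above: stray closers and tags that
    // are opened but never closed inside that context.

    interface TagToken { kind: 'open' | 'close'; name: string; offset: number; }

    function lintBalancedContext(tokens: TagToken[], context: string): string[] {
      const warnings: string[] = [];
      const stack: TagToken[] = [];
      for (const t of tokens) {
        if (t.kind === 'open') {
          stack.push(t);
        } else if (stack.length && stack[stack.length - 1].name === t.name) {
          stack.pop();
        } else {
          warnings.push(`${context}: stray </${t.name}> at offset ${t.offset}`);
        }
      }
      for (const t of stack) {
        warnings.push(`${context}: <${t.name}> at offset ${t.offset} is never closed in this context`);
      }
      return warnings;
    }

    // e.g. lintBalancedContext(tokenizeTags(captionWikitext), 'image caption')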
So, while we are not focusing on attaining full replacement capability in Parsoid, our new directions do not entirely do away with the idea you alluded to: (1) we are attempting to move towards templates (DOM/Lua/wikitext) that can only return DOM fragments, and (2) we retain the ability to provide some kind of linting capability in Parsoid (but this functionality is not at the top of our to-do list, since we are focused on reducing the scope of wikitext use over the long term while providing full compatibility in the immediate and short term).
Does that answer your question?
Subbu.
PS: The other primary reason for going with a new wikitext evaluator/runtime (a more accurate term than "parser"), possibly in C++, was performance -- but we are already going at it in a different way, based on the notion that most edits on wiki pages are going to be "minor" edits (relative to the size of the page). If so, there is no sense in fully serializing and fully reparsing the page on every such minor edit -- it is a waste of server resources. Since we now have a fully RT-able HTML representation of wikitext, selective serialization (HTML->wikitext), selective reparsing (of wikitext-based edits that happen outside the VE), and caching of DOM fragments (transclusions, etc.) should take care of the performance issue -- these are addressed in the RFC.
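For the curious, the core of the selective-serialization idea is roughly the following: nodes untouched by the edit keep their original wikitext (looked up via recorded source offsets), and only modified subtrees go through the full HTML->wikitext serializer. This is a simplified sketch; wasModified, rangeOf, fullSerialize and wrapInOwnMarkup are placeholders for per-node information Parsoid tracks, not its actual API.

    interface SourceRange { start: number; end: number; }   // offsets into the original wikitext

    function selectiveSerialize(node: Node, origWikitext: string,
                                wasModified: (n: Node) => boolean,
                                rangeOf: (n: Node) => SourceRange | null): string {
      const range = rangeOf(node);
      if (range && !wasModified(node)) {
        // Untouched subtree: emit the original source verbatim -- no dirty
        // diffs, no serializer work.
        return origWikitext.slice(range.start, range.end);
      }
      if (node.nodeType !== Node.ELEMENT_NODE) {
        return fullSerialize(node);
      }
      // Modified subtree: recurse, so unmodified children can still reuse
      // their original source text.
      let out = '';
      for (const child of Array.from(node.childNodes)) {
        out += selectiveSerialize(child, origWikitext, wasModified, rangeOf);
      }
      return wrapInOwnMarkup(node, out);   // serialize only this node's own markup
    }

    declare function fullSerialize(n: Node): string;
    declare function wrapInOwnMarkup(el: Node, childWikitext: string): string;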