On 07/23/2015 01:07 PM, Ricordisamoa wrote:
On 23/07/2015 15:28, Antoine Musso wrote:
On 23/07/2015 08:15, Ricordisamoa wrote:
Are there any stable APIs for an application to get a parse tree in
machine-readable format, manipulate it and send the result back without
touching HTML?
I'm sorry if this question doesn't make any sense.
You might want to explain what you are trying to do and which wall you
have hit when attempting to use Parsoid :-)
For example, adding a template transclusion as a new parameter in
another template.
XHTML5+RDFa is the wall :-(
Can't Parsoid's deserialization be caught at some point to get a
higher-level structure like mwparserfromhell
<https://github.com/earwig/mwparserfromhell>'s?
Parsoid and mwparserfromhell have different design goals and hence do
things differently.
Parsoid is meant to support HTML editing and hence provides semantic
information as annotations over the HTML document. It effectively
maintains a bidirectional/reversible mapping between segments of
wikitext and DOM trees. You can manipulate the DOM trees and get back
wikitext that represents the edited tree. As for useless information
and duplicate information -- I think if you looked at the Parsoid DOM
spec [1], you will know what to look for and what to manipulate. The
information on the DOM is meant to (a) render accurately, (b) support
the various bots / clients / gadgets that look for specific kinds of
information, and (c) be editable easily. If that spec has holes or
needs updates or fixing, we are happy to do that. Do let us know.
mwparserfromhell is an entirely wikitext-centric library as far as I
can tell. It is meant to manipulate wikitext directly. It is a neat
library which provides a lot of utilities and makes it easy to do
wikitext transformations. It doesn't know about or care about HTML
because it doesn't need to. It also seems to effectively give you
some kind of wikitext-centric AST. These are all impressions based on
a quick scan of its docs -- so pardon any misunderstandings.
Parsoid does not provide you a wikitext AST directly since it doesn't
construct one. All wikitext information shows up indirectly as DOM
annotations (either attributes or JSON information in attributes). As
Scott showed, you can still do document ("wikitext") manipulations
using DOM libraries, CSS-style queries, or directly by walking the
DOM. There are lots of ways you can edit MediaWiki pages without
knowing about wikitext, using the vast array of HTML libraries.
That happens to be our tagline: "we deal with wikitext so you don't
have to".
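To make the "JSON information in attributes" point concrete, here is a minimal sketch in Python using only the standard library. The HTML fragment is hand-written to imitate what Parsoid emits for {{foo|bar}}; the attribute layout (typeof="mw:Transclusion", a data-mw attribute holding "parts", "target" and "params") is an assumption based on a reading of the DOM spec [1], and list_transclusions is a hypothetical helper, not part of any library.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written fragment imitating Parsoid output for "{{foo|bar}}".
# The data-mw layout is an assumption taken from the DOM spec, not real output.
FRAGMENT = (
    '<body>'
    '<p>Intro text.</p>'
    '<span about="#mwt1" typeof="mw:Transclusion" '
    'data-mw=\'{"parts":[{"template":{"target":{"wt":"foo","href":"./Template:Foo"},'
    '"params":{"1":{"wt":"bar"}},"i":0}}]}\'>rendered output</span>'
    '</body>'
)


def list_transclusions(html):
    """Return (template name, {param: wikitext}) pairs found in the fragment."""
    root = ET.fromstring(html)
    found = []
    for elem in root.iter():
        # typeof can hold several space-separated values
        if "mw:Transclusion" not in elem.get("typeof", "").split():
            continue
        data_mw = json.loads(elem.get("data-mw", "{}"))
        for part in data_mw.get("parts", []):
            # Skip parts that are not template objects (e.g. plain strings)
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                params = {k: v["wt"] for k, v in tpl["params"].items()}
                found.append((tpl["target"]["wt"], params))
    return found


print(list_transclusions(FRAGMENT))  # → [('foo', {'1': 'bar'})]
```

The fragment here happens to be well-formed XML, which is why xml.etree is enough; for real Parsoid output a proper HTML5 parser would be the safer choice.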
But, you are right. It can indeed seem cumbersome if you want to
directly manipulate wikitext without the DOM getting in between or
having to deal with DOM libraries. But that is not the use case we
target. Far more libraries in all kinds of languages (and far more
developers) know about HTML and can render, handle, and manipulate it
easily than know how to (or want to) manipulate wikitext
programmatically. It is a bit like the difference between the wikitext
editor and the visual editor. They each have their constituencies and
roles.
All that said, as Scott noted, it is possible to develop a
mwparserfromhell-like layer on top of the Parsoid DOM annotations if
you want a wikitext-centric view (as opposed to a DOM-centric view
that most editing clients seem to want). But, since that is not a use
case that we target, that hasn't been on our radar. If someone does
want to take that on, and thinks it would be useful, we are happy to
provide assistance. It should not be too difficult.
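As a rough idea of what such a layer could look like, the sketch below wraps one transclusion element in a small wikitext-centric object and uses it to add a template transclusion as a new parameter of another template. Everything here is hypothetical: TemplateView, templates() and the sample fragment are invented for illustration, the data-mw layout is assumed from the DOM spec [1], and it is also an assumption that a nested "{{baz}}" can be passed as plain wikitext in a parameter's "wt" field. Sending the edited HTML back to Parsoid for serialization is not shown.

```python
import json
import xml.etree.ElementTree as ET


class TemplateView:
    """Hypothetical wikitext-centric view over one Parsoid transclusion element."""

    def __init__(self, elem):
        self._elem = elem
        self._data = json.loads(elem.get("data-mw"))
        # Assumes the first part of the transclusion is the template itself
        self._tpl = self._data["parts"][0]["template"]

    @property
    def name(self):
        return self._tpl["target"]["wt"].strip()

    def get(self, param):
        return self._tpl["params"][param]["wt"]

    def add(self, param, wikitext):
        # Update the JSON and write it back into the DOM attribute
        self._tpl["params"][param] = {"wt": wikitext}
        self._elem.set("data-mw", json.dumps(self._data))


def templates(root):
    """Wrap every element marked as a transclusion."""
    return [TemplateView(e) for e in root.iter()
            if "mw:Transclusion" in e.get("typeof", "").split()]


# Hand-written fragment imitating Parsoid output for "{{foo|bar}}"
root = ET.fromstring(
    '<body><span about="#mwt1" typeof="mw:Transclusion" '
    'data-mw=\'{"parts":[{"template":{"target":{"wt":"foo","href":"./Template:Foo"},'
    '"params":{"1":{"wt":"bar"}},"i":0}}]}\'>rendered</span></body>'
)

tpl = templates(root)[0]
tpl.add("extra", "{{baz}}")  # a transclusion as a new parameter value
print(tpl.name, tpl.get("extra"))
```

The caller never touches the JSON or the DOM directly, which is roughly the experience mwparserfromhell gives over raw wikitext.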
Does that help summarize this issue and clarify the differences and
approaches of these two tools? I am "on vacation" :-) so responses
will be delayed.
Subbu.
[1] http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec
Hi Subbu,
thank you for this thoughtful insight.
HTML is not a barrier by itself. The problem seems to be Parsoid being
built primarily with VisualEditor in mind. It is not clear to me how
a single DOM serving both view and edit modes can avoid redundancy.
I see huge demand for alternative wikignome-style editors. The more
predictable, concise and documented Parsoid's DOM is, the more users you
get. I hope we can meet in the middle :-)