Re: [Pywikipedia-l] Using a true MediaWiki parser (mwparserfromhell) instead of textlib methods

14 Jul 2013


One important question for me here is: how does it handle malformed
wiki syntax, e.g. a text body like this:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
=== bad header 1 ==
A text containing mathematical equations like a = b + c or even
something that could look like a header, like a == b == c but is
anything BUT a header.
== bad header 2 ===
Some lorem ipsum bla bla ...
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
I am desperately seeking a parser that has the same error behaviour
and gives the same results as the original MediaWiki parser, even for
malformed wiki text.
Is this the case for mwparserfromhell?
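
One way to check this is to write down what the reference behaviour is expected to be and compare a candidate parser against it. The sketch below is only an approximation: `legacy_headings` is a hypothetical helper, and its regex is modelled on the level-by-level heading pass of the legacy MediaWiki parser as commonly described, which is an assumption and not something taken from this thread. The result could then be compared with the `level` and `title` of the nodes returned by `mwparserfromhell.parse(text).filter_headings()`.

```python
import re


def legacy_headings(text):
    """Return (level, title) pairs using a level-by-level regex pass.

    Assumption: a line is a heading of level N if it starts and ends with
    N equals signs, trying N = 6 first and N = 1 last. Surplus equals
    signs on either side end up inside the title.
    """
    found = []
    for line in text.splitlines():
        for level in range(6, 0, -1):
            marker = "=" * level
            pattern = r"^%s[ \t]*(.+?)[ \t]*%s\s*$" % (marker, marker)
            match = re.match(pattern, line)
            if match:
                found.append((level, match.group(1)))
                break  # the highest matching level wins for this line
    return found


sample = (
    "=== bad header 1 ==\n"
    "A text with a == b == c inside, which is not a header.\n"
    "== bad header 2 ===\n"
    "Some lorem ipsum bla bla ...\n"
)

for level, title in legacy_headings(sample):
    print(level, repr(title))
# 2 '= bad header 1'
# 2 'bad header 2 ='
```

Under these assumptions both malformed lines come out as level-2 headings with the surplus equals sign kept in the title, and the `a == b == c` line is left alone because it does not start with an equals sign.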
Thanks and Greetings
DrTrigon
On 24.04.2013 13:11, legoktm wrote:
...
Hi all, I had mentioned this in the rewrite roadmap, and noticed it
came up on IRC as well, so I'd like to run this by the mailing
list:
User:The Earwig has written a pure-Python (with optional
C speedups) MediaWiki text parser named mwparserfromhell[1].
Currently we have the textlib library and various regexes that
implement this imperfectly. From my experience using
mwparserfromhell (over 400k successful edits with no issues) I
believe it is ready to be bundled with the framework. I think it
would still be a good idea to keep textlib as a fallback, and for
users who are currently using it and don't need to migrate.
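
The fallback idea can be sketched as an optional import. This is only an illustration under stated assumptions: `template_names` is a hypothetical helper, and the regex branch is a crude stand-in for the existing regex-based approach, not textlib's actual code.

```python
import re

try:
    import mwparserfromhell  # preferred when it is installed
except ImportError:
    mwparserfromhell = None  # fall back to the regex path below

# Crude pattern: grabs the name of each non-nested {{template}}.
_TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def template_names(text):
    """Return the names of the templates used in ``text``."""
    if mwparserfromhell is not None:
        wikicode = mwparserfromhell.parse(text)
        return [str(t.name).strip() for t in wikicode.filter_templates()]
    return [m.group(1).strip() for m in _TEMPLATE_NAME.finditer(text)]


print(template_names("{{foo|bar}} text {{baz}}"))
```

Callers use the same function either way, so scripts that do not need the real parser keep working without the extra dependency.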
As for actually adding it, in the rewrite branch we can just add it
as a dependency in setup.py, and then convert various methods
over. In trunk, I'm guessing we would need to add it as an
external. (I'm not sure how that's actually done.)
[1] https://github.com/earwig/mwparserfromhell
-- Legoktm
