Perl implementation of preprocessToXml - Wikitech-l

4 Apr 2014


Hi!
I am a Perl programmer, and I am trying to write a Perl implementation of the
MediaWiki markup parser so that I can use it in my projects.
I want this parser to behave exactly like the original PHP code, and
since PHP and Perl have a lot in common syntactically, I decided to take the
original PHP code and change only what has to be changed.
So I've taken the body of preprocessToXml, implemented the full set of stack
classes (the way it should be done in Perl), and reimplemented some PHP
functions that the code uses (string operations and others).
Things that can't be fixed by defining a function I've fixed directly in the
body of the code.
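
To illustrate what reimplementing a PHP string function in Perl can look like, here is a minimal sketch. The names php_strpos, php_strspn and php_strcspn are hypothetical and are not taken from the repository; the sketch assumes byte-oriented strings and non-negative offsets, and it maps PHP's false to undef.

```perl
use strict;
use warnings;

# Hypothetical PHP-like strpos: offset of $needle in $haystack at or after
# $offset, or undef where PHP would return false.
sub php_strpos {
    my ($haystack, $needle, $offset) = @_;
    $offset //= 0;
    my $pos = index($haystack, $needle, $offset);
    return $pos < 0 ? undef : $pos;
}

# Hypothetical PHP-like strspn: length of the run, starting at $start, that
# consists only of characters listed in $mask.
sub php_strspn {
    my ($subject, $mask, $start) = @_;
    $start //= 0;
    return 0 if $start >= length($subject) or $mask eq '';
    my $class = quotemeta $mask;    # make every mask character literal
    my ($run) = substr($subject, $start) =~ /^([${class}]*)/s;
    return length $run;
}

# Hypothetical PHP-like strcspn: length of the run, starting at $start, that
# contains none of the characters listed in $mask.
sub php_strcspn {
    my ($subject, $mask, $start) = @_;
    $start //= 0;
    return 0 if $start >= length($subject);
    my $rest = substr($subject, $start);
    return length $rest if $mask eq '';
    my $class = quotemeta $mask;
    my ($run) = $rest =~ /^([^${class}]*)/s;
    return length $run;
}
```

With these definitions, php_strspn("== Heading ==", "=") counts the two leading equals signs, and php_strcspn("text {{tpl}}", "{[<") gives the length of the plain text before the first opening brace.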
This seems to work (although I've skipped the processing of HTML tags for now).
It is in quite an experimental state and is not even packaged as a module yet,
just ./test.pl with submodules. You can check it out here if you want:
 https://github.com/dhyannataraj/perl-mediawiki-parser
Why am I writing here?
1. Just to let you know what I am doing. Maybe somebody is interested in the
same thing.
2. To ask whether you have test cases for the preparser alone that check all
the special cases like "====", "{{{text}}" or "{{unfinished", so that I can
automatically test that my preparser and the original one give the same result.
(I've looked in test/ in the source code; those tests are much more complex
than preparser-only tests.) Maybe there is something that I've missed.
3. I will have to keep the code up to date, which means that I will have to
reimport it with each major release. It is possible to write some regexps that
will do most of the work, but in some cases the PHP code could be modified so
that it is more Perl-friendly, e.g. changing comments from '//' to '#' or
changing ===false to empty() (if the two are equivalent). If you are willing
to cooperate in this area, let me know. Then with the next reimport I would
offer you a patch.
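
As a rough sketch of the regexp-based part of such a reimport: the helper name perlify_line is hypothetical, and the sketch assumes that the Perl replacements for PHP functions return undef where PHP returns false. Note that in PHP empty() is also true for 0 and "0", so for values such as a strpos() result at offset 0 it does not behave the same as ===false.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a few mechanical PHP-to-Perl rewrites to one line.
sub perlify_line {
    my ($line) = @_;

    # Whole-line comments only: '// text' becomes '# text'. Trailing comments
    # are left alone, because '//' can also occur inside string literals.
    $line =~ s{^(\s*)//}{${1}#};

    # '$x === false' becomes '!defined($x)', under the assumption stated
    # above that false is represented as undef on the Perl side.
    $line =~ s{(\$\w+)\s*===\s*false\b}{!defined($1)}g;

    return $line;
}

# Usage: filter PHP source from STDIN to STDOUT when run as a script.
unless (caller) {
    print perlify_line($_) while <STDIN>;
}
```

Anything these patterns do not match passes through unchanged and still has to be ported by hand.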
So, what do you think about it all?