Hi!
I am a Perl programmer, and I am trying to write a Perl implementation of the MediaWiki format parser so that I can use it in my projects.
I want this parser to work in exactly the same way as the original PHP code, and since PHP and Perl have a lot in common syntactically, I decided to take the original PHP code and change only the things that have to be changed.
So I've taken the body of preprocessToXml, implemented the full set of stack classes (the way it should be done in Perl), and reimplemented some PHP functions that are used in the code (string operations and others).
Things that can't be fixed by defining a function I've fixed directly in the body of the code.
This seems to work (although I've skipped processing of HTML tags for now). It's in a quite experimental state: it is not even packaged as a module yet, just ./test.pl with submodules. You can check it here if you want: https://github.com/dhyannataraj/perl-mediawiki-parser
Why am I writing here?
1. Just to let you know what I am doing. Maybe somebody is interested in the same thing.
2. To ask whether you have any test cases for the preparser only, ones that check all the special cases like "====", "{{{text}}" or "{{unfinished", so that I can automatically test that my preparser and the original one give the same result. (I've looked in test/ in the source code; those tests are much more complex than preparser-only tests.) Maybe there is something that I've missed.
3. I will have to keep the code up to date, which means I will have to reimport it at each major release. It is possible to write some regexps that will do most of the work, but in some cases the original code could be modified so that it is more Perl-friendly, e.g. change comments from '//' to '#', or change ===false into empty() (if they are equivalent). If you are willing to cooperate in this area, let me know. Then with the next reimport I would offer you a patch.
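To make point 3 a bit more concrete, here is a rough sketch of what I mean by regexps doing most of the work. It is written in Python only for illustration, and the rule list and the function names (RULES, convert_line, convert_source) are made up for this example; they are not taken from my repository or from MediaWiki. It handles only whole-line '//' comments, so that a '//' inside a string such as a URL is left alone, plus one extra rule I picked as an example (PHP 'elseif' to Perl 'elsif'). It is deliberately naive: it does not know about strings or comments, so 'elseif' would also be rewritten inside them.

```python
import re

# Hypothetical rewrite rules, applied in order to every line.
# Each entry is (compiled pattern, replacement).
RULES = [
    # Whole-line PHP comment: optional indentation followed by '//'.
    # Trailing comments are skipped on purpose, because '//' may also
    # appear inside a string literal (for example in a URL).
    (re.compile(r"^(\s*)//"), r"\1#"),
    # The PHP keyword 'elseif' is spelled 'elsif' in Perl.
    (re.compile(r"\belseif\b"), "elsif"),
]


def convert_line(line):
    """Apply every rewrite rule to a single line of PHP source."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


def convert_source(text):
    """Apply the rules line by line and keep the original line breaks."""
    return "\n".join(convert_line(line) for line in text.split("\n"))


if __name__ == "__main__":
    sample = "// find next opening brace\n} elseif ( $found ) {"
    print(convert_source(sample))
```

Anything the rules cannot handle safely would still have to be fixed by hand after each reimport, which is why small Perl-friendly changes in the original code would help.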
So, what do you think about all this?
wikitech-l@lists.wikimedia.org