Hi Merlijn
Great, I will do so tonight. I have to say that it is *not* an attempt to write a complete parser for wikitext but rather a solution to a very limited problem which I encountered. This means that I can find templates and parse them into key-value pairs, and there is also some code that can parse Image/File tags. However, it is not a complete parser; for example, it does not parse headings as DrTrigon asked. It mostly handles templates at the moment. Also, there is currently no support for unnamed parameters.
However it might be a starting point for further work. I also did not find formal specifications for wikitext so it was a lot of learning by doing. However I used it successfully on ~4k "Infobox Chemie" templates in the de-wiki.
Hannes
On 24 January 2012 09:55, Dr. Trigon dr.trigon@surfeu.ch wrote:
Hello Hannes
Just wondering; is your text parser able to correctly find all headings (e.g. '== bla ==' as well as '<h2>bla</h2>') and distinguish headings from other similar text but within a paragraph? And finally return the byte offset of those headings?
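To make the question concrete, here is a minimal sketch of what such a heading finder could look like. It is not DrTrigon's difflib-based code and not part of Hannes's parser; the function name `find_headings`, the returned tuple layout, and the UTF-8 default are assumptions made for illustration. It treats a line as a wikitext heading only if the whole line is wrapped in `=` signs, so `==` pairs inside a paragraph are ignored, and it does not skip `<nowiki>`, `<pre>` or comment sections.

```python
import re

# HTML-style headings such as <h2>bla</h2>; attributes on the opening tag are allowed.
HTML_HEADING = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
                          re.IGNORECASE | re.DOTALL)


def find_headings(text, encoding="utf-8"):
    """Return a list of (byte_offset, level, title) tuples, sorted by offset."""
    found = []
    pos = 0  # character offset of the current line start
    for line in text.splitlines(keepends=True):
        stripped = line.rstrip()
        # Count '=' signs at the start and at the end of the line.
        lead = len(stripped) - len(stripped.lstrip("="))
        trail = len(stripped) - len(stripped.rstrip("="))
        level = min(lead, trail, 6)
        title = stripped[level:len(stripped) - level].strip() if level else ""
        if level and title:
            found.append((pos, level, title))
        pos += len(line)

    for match in HTML_HEADING.finditer(text):
        found.append((match.start(), int(match.group(1)),
                      match.group(2).strip()))

    found.sort()
    # Convert character offsets to byte offsets in the given encoding.
    return [(len(text[:offset].encode(encoding)), level, title)
            for offset, level, title in found]


sample = ("Einführung\n"
          "== bla ==\n"
          "Text with == not a heading == inside.\n"
          "<h2>Änderung</h2>\n")
for byte_offset, level, title in find_headings(sample):
    print(byte_offset, level, title)
```

Character offsets are converted to byte offsets only at the end, because the two differ as soon as the page contains non-ASCII characters such as umlauts.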
I am using such a piece of code, written with the help of difflib; maybe it is useful here as well? (Even though I did not have much time to write a unit test with full coverage... but a simple one is there ;)
Greetings DrTrigon
On 23.01.2012 23:34, Hannes Röst wrote:
Hello all
From one of my assignments as a bot operator I have some code which does template parsing and general text parsing (e.g. Image/File tags). It does not use regex and is thus able to correctly parse nested templates and other such nasty things. I have written these as library classes and written tests for them which cover almost all of the code. I would now really like to contribute that code back to the community.
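As an illustration of why a non-regex approach copes with nesting, here is a minimal sketch that tracks brace depth while scanning the text. It is not the code offered in this thread; the function names `find_templates`, `split_top_level` and `parse_template` are hypothetical. Like the parser described, it only keeps named parameters, and it makes no attempt to handle triple-brace parameters or `<nowiki>` sections.

```python
def find_templates(text):
    """Return (start, end, inner) for every top-level {{...}} template."""
    results = []
    depth = 0
    start = None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                results.append((start, i + 2, text[start + 2:i]))
            i += 2
        else:
            i += 1
    return results


def split_top_level(inner, sep="|"):
    """Split on sep, ignoring separators inside nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(inner):
    """Return (name, params); unnamed parameters are skipped in this sketch."""
    parts = split_top_level(inner)
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return parts[0].strip(), params


page = "Intro {{Infobox Chemie | Name = Wasser | Formel = {{Chem|H|2|O}} }} end"
for start, end, inner in find_templates(page):
    print(parse_template(inner))
```

Because the pipes inside the nested `{{Chem|H|2|O}}` are seen at depth 1, they do not split the outer parameter list, which is the case a single regular expression typically gets wrong.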
Would you be interested in adding this code to the pywikibot framework? If yes, can I send the code to someone for code review or how do you usually operate?
Greetings
Hannes
PS: wiki userpage is http://en.wikipedia.org/wiki/User:Hannes_R%C3%B6st
_______________________________________________ Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l