Hi Merlijn
Great, I will do so tonight. I have to say that it is *not* an attempt
to write a complete parser for wikitext but rather a solution to
a very limited problem which I encountered. This means that I can
find templates and parse them into key-value pairs, and there is also
some code that can parse Image/File tags. However, it is not a complete
parser: for example, it does not parse headings as DrTrigon asked;
it mostly handles templates at the moment. Also, there is currently no
support for unnamed parameters.
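To make the key-value idea concrete, here is a minimal sketch of how one
template could be split into named parameters without regex. It is not the
code I am offering; the names split_top_level and parse_template are made up
for this mail, and the sketch assumes well-formed input with balanced
{{...}} and [[...]] pairs. Like my code, it skips unnamed parameters.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators inside {{...}} or [[...]]."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1          # entering a nested template or link
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1          # leaving a nested template or link
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
            i += 1
        else:
            i += 1
    parts.append(text[start:])
    return parts


def parse_template(wikitext):
    """Return (name, params) for one template such as '{{Name|key=value}}'."""
    inner = wikitext.strip()[2:-2]      # drop the outer '{{' and '}}'
    name, *fields = split_top_level(inner)
    params = {}
    for field in fields:
        pieces = split_top_level(field, "=")
        if len(pieces) > 1:             # named parameter; unnamed ones are skipped
            params[pieces[0].strip()] = "=".join(pieces[1:]).strip()
    return name.strip(), params


name, params = parse_template(
    "{{Infobox Chemie|Name=Wasser|Formel={{Chem|H2O}}|Bild=[[Datei:x.png|thumb]]}}"
)
```

Tracking the nesting depth is what keeps the "|" inside the nested template
and inside the file link from being treated as parameter separators.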
However, it might be a starting point for further work. I also did not
find a formal specification for wikitext, so it was a lot of learning by
doing. Still, I used it successfully on ~4k "Infobox Chemie"
templates in the de-wiki.
Hannes
On 24 January 2012 09:55, Dr. Trigon <dr.trigon(a)surfeu.ch> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello Hannes
Just wondering; is your text parser able to correctly find all headings
(e.g. '== bla ==' as well as '<h2>bla</h2>') and distinguish headings
from other similar text but within a paragraph? And finally return the
byte offset of those headings?
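To illustrate what is meant, here is a rough sketch of such a heading finder.
It is only an assumption of how this could look: it uses the standard re
module rather than difflib, the name find_headings is invented for this
example, and it treats a '== bla ==' line as a heading only when the markers
open and close the whole line. Offsets are converted to UTF-8 byte offsets.

```python
import re

# Wiki-style heading: 1-6 '=' at line start, the same run closing the line.
WIKI_HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# HTML-style heading: <h1>...</h1> up to <h6>...</h6>.
HTML_HEADING = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def find_headings(text):
    """Return a list of (byte_offset, level, title), ordered by position."""
    found = []
    for m in WIKI_HEADING.finditer(text):
        found.append((m.start(), len(m.group(1)), m.group(2)))
    for m in HTML_HEADING.finditer(text):
        found.append((m.start(), int(m.group(1)), m.group(2).strip()))
    found.sort()
    # Convert character offsets to byte offsets in the UTF-8 encoding.
    return [(len(text[:pos].encode("utf-8")), level, title)
            for pos, level, title in found]


sample = "Intro with == not a heading == inside.\n== Überblick ==\nText\n<h2>bla</h2>\n"
headings = find_headings(sample)
```

Anchoring the wiki pattern to the start and end of a line is what keeps
'==' pairs in the middle of a paragraph from being reported as headings.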
I am using such a piece of code, written with the help of difflib, and it
may be useful here as well (even though I did not have much time to write
a unit test with full coverage... but a simple one is there ;)
Greetings
DrTrigon
On 23.01.2012 23:34, Hannes Röst wrote:
Hello all
From one of my assignments as a bot operator I have some code which
does template parsing and general text parsing (e.g. Image/File
tags). It does not use regex and is thus able to correctly parse
nested templates and other such nasty things. I have written those
as library classes and written tests for them which cover almost
all of the code. I would now really like to contribute that code
back to the community.
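To show why a scanner copes with nesting, here is a small sketch (not the
library code itself; find_templates is a name made up for this mail). It
assumes plain {{...}} templates and ignores special cases such as
{{{...}}} template arguments or <nowiki> sections. A stack of opening
positions pairs each '}}' with its matching '{{', something Python's re
module cannot do for arbitrary depth because it has no recursive patterns.

```python
def find_templates(text):
    """Return (start, end) spans of every {{...}} template, nested ones included."""
    spans, stack, i = [], [], 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)                     # remember where this template opens
            i += 2
        elif pair == "}}" and stack:
            spans.append((stack.pop(), i + 2))  # close the innermost open template
            i += 2
        else:
            i += 1
    return sorted(spans)


sample = "a {{Outer|x={{Inner|1}}|y=2}} b {{Other}}"
templates = [sample[start:end] for start, end in find_templates(sample)]
```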
Would you be interested in adding this code to the pywikibot
framework? If yes, can I send the code to someone for code review
or how do you usually operate?
Greetings
Hannes
PS: wiki userpage is
http://en.wikipedia.org/wiki/User:Hannes_R%C3%B6st
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk8eceUACgkQAXWvBxzBrDBmJQCePmfUbs4Y8HNN18UT6vMFYo5r
N1AAoLuN1VLpZQOrwegmkKWl08Te0Rxp
=HXai
-----END PGP SIGNATURE-----