Dear all
So I uploaded the code here: https://sourceforge.net/tracker/?func=detail&aid=3479070&group_id=93...
The following test probably best describes what the code does: working on a nested template, it can retrieve both the inner and the outer template as a dictionary of key-value pairs:
def test_nested_template(self):
    nested_template = u"""
Cras suscipit lorem eget elit pulvinar et molestie magna tempus. Vestibulum. {{Toplevel template | key1 = value1 | key2 = [[File:Al3+.svg|40px|Aluminiumion]] {{nested template 
 | nested_key1 = nested_value1
 | nested_key2 = nested_value2
 }} | key3 = value3 }} and more text """
    # First fetch the outer template and assert that we get key1 through key3
    template = templateparser.parse_template(nested_template, 'Toplevel template')
    expected = u'[[File:Al3+.svg|40px|Aluminiumion]] {{nested template \n | nested_key1 = nested_value1\n | nested_key2 = nested_value2\n }} '
    self.assertEqual(len(template.parameters.keys()), 3)
    self.assertEqual(template.parameters['key1'], 'value1')
    self.assertEqual(template.parameters['key3'], 'value3')
    self.assertEqual(template.parameters['key2'], expected)
    self.assertEqual(template.start, 111)
    self.assertEqual(template.end, 401)
    self.assertFalse('nested_key1' in template.parameters)

    # Now fetch the inner (nested) template and assert that we get nested_key1 and nested_key2
    template = templateparser.parse_template(nested_template, 'nested template')
    self.assertEqual(len(template.parameters.keys()), 2)
    self.assertEqual(template.parameters['nested_key1'], 'nested_value1')
    self.assertEqual(template.parameters['nested_key2'], 'nested_value2')
    self.assertEqual(template.start, 239)
    self.assertEqual(template.end, 350)
Greetings
Hannes
On 24 January 2012 12:49, Dr. Trigon <dr.trigon@surfeu.ch> wrote:
However, it might be a starting point for further work. I also did not find a formal specification for wikitext, so it was a lot of learning by doing. Still, I used it successfully on ~4k "Infobox Chemie" templates in the German Wikipedia (de-wiki).
As far as I can see, there is no such specification. We all know how Wikipedia handles text markup and which format we have to use (e.g. to create a heading), IF we use correct syntax...
The problem is what happens IF you use NON-VALID wiki syntax on a page. The MediaWiki software will then do "something" to produce (at least) a valid HTML page, but which fall-backs are used? What is the priority when parsing, and so on? In my opinion this is the main issue here, since "our" wikitext parser should also behave similarly on wrong wiki syntax... (quite a messy thing, as I experienced... obviously I am not a parser expert either... ;)
This is why I did not write a parser, just my tiny (holy) 'getSections' method.
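The 'getSections' method itself is not shown in this thread. As a rough idea of how sections can be split without a full parser, here is a minimal, hypothetical sketch: `split_sections` and its `(level, title, body)` output are assumptions, it only recognizes headings written as a whole line with the same run of `=` on both sides, and it does not account for headings inside `<nowiki>`, HTML comments or templates.

```python
import re

# Hypothetical pattern: a whole line made of '=' x n, a title, '=' x n (n = 1..6).
# [ \t]* is used instead of \s* so a match never runs across line breaks;
# the trailing \n? consumes the heading's own newline.
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$\n?", re.MULTILINE)


def split_sections(text):
    """Split wikitext into (level, title, body) tuples.

    Text before the first heading is returned with level 0 and title ''.
    """
    sections = []
    pos, level, title = 0, 0, ""
    for m in HEADING.finditer(text):
        # Close the previous section at the start of this heading line.
        sections.append((level, title, text[pos:m.start()]))
        level, title = len(m.group(1)), m.group(2)
        pos = m.end()
    sections.append((level, title, text[pos:]))
    return sections
```

For example, `"Intro\n== History ==\nOld stuff\n"` splits into a level-0 preamble `"Intro\n"` and a level-2 section titled `"History"` with body `"Old stuff\n"`.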
Greetings
DrTrigon
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l