Dear all,

I uploaded the code here: https://sourceforge.net/tracker/?func=detail&aid=3479070&group_id=93107&atid=603140

The following test probably describes best what the code does. Working on a nested template, it can retrieve the inner as well as the outer template as a dictionary of key-value pairs:

def test_nested_template(self):
    nested_template = u"""
Cras suscipit lorem eget elit pulvinar et molestie magna tempus.
Vestibulum.
{{Toplevel template
| key1 = value1
| key2 = [[File:Al3+.svg|40px|Aluminiumion]] {{nested template
| nested_key1 = nested_value1
| nested_key2 = nested_value2
}}
| key3 = value3
}} and more text
"""
    # First fetch the outer template and assert that we get key1 through key3
    template = templateparser.parse_template(nested_template, 'Toplevel template')
    expected = u'[[File:Al3+.svg|40px|Aluminiumion]] {{nested template \n | nested_key1 = nested_value1\n | nested_key2 = nested_value2\n }} '
    self.assertEqual(len(template.parameters.keys()), 3)
    self.assertEqual(template.parameters['key1'], 'value1')
    self.assertEqual(template.parameters['key3'], 'value3')
    self.assertEqual(template.parameters['key2'], expected)
    self.assertEqual(template.start, 111)
    self.assertEqual(template.end, 401)
    # The nested template's keys must not leak into the outer template
    self.assertFalse('nested_key1' in template.parameters)

    # Now fetch the inner (nested) template and assert that we get
    # nested_key1 and nested_key2
    template = templateparser.parse_template(nested_template, 'nested template')
    self.assertEqual(len(template.parameters.keys()), 2)
    self.assertEqual(template.parameters['nested_key1'], 'nested_value1')
    self.assertEqual(template.parameters['nested_key2'], 'nested_value2')
    self.assertEqual(template.start, 239)
    self.assertEqual(template.end, 350)

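
For anyone who does not want to open the patch, here is a rough sketch of the general idea behind this kind of lookup: count brace depth to find where a template ends, then split its body on "|" only at the top level. This is NOT the uploaded templateparser code; the names find_template, _matching_close and _split_top_level are made up for illustration, and it assumes well-formed wikitext without triple-brace parameters or unnamed (positional) arguments.

```python
def _matching_close(text, start):
    """Return the index just past the '}}' closing the '{{' at start, or -1."""
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == '{{':
            depth += 1
            i += 2
        elif pair == '}}':
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


def _split_top_level(body):
    """Split body on '|' that is not inside a nested {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ('{{', '[['):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ('}}', ']]'):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == '|' and depth == 0:
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append(''.join(current))
    return parts


def find_template(text, name):
    """Return (start, end, parameters) of the first template called name, or None."""
    pos = 0
    while True:
        start = text.find('{{', pos)
        if start == -1:
            return None
        end = _matching_close(text, start)
        if end != -1:
            parts = _split_top_level(text[start + 2:end - 2])
            if parts[0].strip() == name:
                params = {}
                for part in parts[1:]:
                    key, sep, value = part.partition('=')
                    if sep:  # only named parameters are kept in this sketch
                        params[key.strip()] = value.strip()
                return start, end, params
        # Not the template we want: step inside it so nested ones are found too
        pos = start + 2
```

Calling find_template(text, 'Toplevel template') on the wikitext from the test would give the three outer keys, with the nested template left untouched inside the value of key2; calling it with 'nested template' gives only the two nested keys. How to behave on broken markup is deliberately left open here.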
Greetings
Hannes
On 24 January 2012 12:49, Dr. Trigon <dr.trigon@surfeu.ch> wrote:
>> However it might be a starting point for further work. I also did
>> not find formal specifications for wikitext so it was a lot of
>> learning by doing. However I used it successfully on ~4k "Infobox
>> Chemie" templates in the de-wiki.
>
> As far as I can see there is no such specification. We all know how
> Wikipedia handles text markup and what format we have to use (e.g. to
> create a heading and so on...) IF we use correct syntax...
>
> The problem is what happens IF you use NON-VALID wiki syntax on a page.
> The MediaWiki software will then do "something" to get (at least) a
> valid HTML page, but what fall-backs are used? What is the priority when
> parsing, and so on... In my opinion this is the main issue here, since
> "our" wikitext parser should behave similarly on wrong wiki syntax too...
> (quite a messy thing, in my experience... obviously I am not a parser
> expert either... ;)
>
> This is why I did not write a parser, just my tiny (holy) 'getSections'
> method.
>
> Greetings
> DrTrigon