Dear all,
I have uploaded the code here:
https://sourceforge.net/tracker/?func=detail&aid=3479070&group_id=9…
The following test probably describes best what the code does: given a
nested template, it can retrieve the inner as well as the outer template,
each as a dictionary of key-value pairs:
    def test_nested_template(self):
        nested_template = u"""
Cras suscipit lorem eget elit pulvinar et molestie magna tempus.
Vestibulum.
{{Toplevel template
| key1 = value1
| key2 = [[File:Al3+.svg|40px|Aluminiumion]]
{{nested template
| nested_key1 = nested_value1
| nested_key2 = nested_value2
}}
| key3 = value3
}} and more text
"""
        # First fetch the outer template and assert that we get key1 through 3
        template = templateparser.parse_template(nested_template,
                                                 'Toplevel template')
        expected = (u'[[File:Al3+.svg|40px|Aluminiumion]] {{nested template '
                    u'\n | nested_key1 = nested_value1\n | nested_key2 = '
                    u'nested_value2\n }} ')
        self.assertEqual(len(template.parameters.keys()), 3)
        self.assertEqual(template.parameters['key1'], 'value1')
        self.assertEqual(template.parameters['key3'], 'value3')
        self.assertEqual(template.parameters['key2'], expected)
        self.assertEqual(template.start, 111)
        self.assertEqual(template.end, 401)
        self.assertFalse(template.parameters.has_key('nested_key1'))

        # Now fetch the inner (nested) template and assert that we get
        # nested_key 1 and 2
        template = templateparser.parse_template(nested_template,
                                                 'nested template')
        self.assertEqual(len(template.parameters.keys()), 2)
        self.assertEqual(template.parameters['nested_key1'], 'nested_value1')
        self.assertEqual(template.parameters['nested_key2'], 'nested_value2')
        self.assertEqual(template.start, 239)
        self.assertEqual(template.end, 350)
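
For anyone who does not want to open the tracker item first, the general idea
of pulling a named template out of nested wikitext can be sketched roughly as
below. This is only an illustration under my own assumptions and not the
uploaded code: the names find_template and parse_parameters, their signatures
and their return values are hypothetical, triple braces ({{{...}}}) are not
handled, and unbalanced braces simply yield None instead of imitating
whatever MediaWiki does with invalid syntax.

```python
import re


def find_template(text, name):
    """Return (start, end) of the first {{name ...}} in text, or None.

    Hypothetical helper: 'end' is the index just past the closing braces.
    Nested {{...}} blocks are skipped by counting brace depth.
    """
    pattern = r'\{\{\s*' + re.escape(name) + r'\s*(?=[|}])'
    match = re.search(pattern, text)
    if match is None:
        return None
    depth = 0
    pos = match.start()
    while pos < len(text) - 1:
        pair = text[pos:pos + 2]
        if pair == '{{':
            depth += 1
            pos += 2
        elif pair == '}}':
            depth -= 1
            pos += 2
            if depth == 0:
                return match.start(), pos
        else:
            pos += 1
    return None  # unbalanced braces: give up rather than guess


def parse_parameters(body):
    """Split a template body on top-level '|' and return a key/value dict.

    Hypothetical helper: 'body' is the text between the outer {{ and }}.
    Pipes inside nested {{...}} or [[...]] do not split parameters.
    """
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ('{{', '[['):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ('}}', ']]'):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == '|' and depth == 0:
            parts.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append(''.join(current))

    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        key, sep, value = part.partition('=')
        if sep:  # unnamed (positional) parameters are ignored in this sketch
            params[key.strip()] = value.strip()
    return params
```

With these two helpers, the span returned by find_template(text, 'nested
template') can be sliced (dropping the two braces at each end) and passed to
parse_parameters to get the inner template's dictionary; doing the same for
the outer template keeps the whole nested block as part of one value.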
Greetings
Hannes
On 24 January 2012 12:49, Dr. Trigon <dr.trigon(a)surfeu.ch> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
However, it might be a starting point for further work. I also did
not find formal specifications for wikitext, so it was a lot of
learning by doing. However, I used it successfully on ~4k "Infobox
Chemie" templates in the de-wiki.
As far as I can see there is no such specification. We all know how
Wikipedia handles text markup and what format we have to use (e.g. to
create a heading and so on...) IF we use correct syntax...
The problem is what happens IF you use NON-VALID wiki syntax on a page?
The MediaWiki software will then do "something" to get (at least) a
valid HTML page, but what fall-backs are used? What is the priority when
parsing, and so on... In my opinion this is the main issue here, since
"our" wikitext parser should behave similarly on wrong wiki syntax too...
(quite a messy thing, in my experience... obviously I am not a parser
expert either... ;)
This is why I did not write a parser, just my tiny (holy) 'getSections'
method.
Greetings
DrTrigon
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla -
http://enigmail.mozdev.org/
iEYEARECAAYFAk8emucACgkQAXWvBxzBrDDLQgCfdDlxFuZv9lqJM3mQOYwlXXWP
/ksAoIk0hBOOtBV6grXIA0TdTB1KQg8A
=yJSp
-----END PGP SIGNATURE-----
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l