Hello,
While writing a bot, I had to discard redirected pages from the XML dump. In order to be able to do it early, I modified xmlreader.py to parse the <redirect /> tag and add it to XmlEntry. I'm attaching the patch, which is not extensively tested.
I haven't updated the regex_parse method since it looks outdated anyway (it tries to create an XmlEntry with different arguments than usual).
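For readers following along, here is a minimal sketch of the idea, not the attached patch: it streams a dump with the standard library's `xml.etree.ElementTree.iterparse` and flags pages that contain a `<redirect />` child so they can be discarded early. The helper names (`local_name`, `iter_pages`) and the sample dump are made up for illustration and are not part of xmlreader.py.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (illustrative data only).
SAMPLE_DUMP = b"""<mediawiki>
  <page>
    <title>Old name</title>
    <redirect />
    <revision><text>#REDIRECT [[New name]]</text></revision>
  </page>
  <page>
    <title>New name</title>
    <revision><text>Article text.</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, is_redirect) for every <page> element in the dump."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        is_redirect = False
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                # The mere presence of the tag marks the page as a redirect.
                is_redirect = True
        yield title, is_redirect
        elem.clear()  # free the page's children once it has been consumed


# Keep only the pages that are not redirects.
non_redirects = [
    title
    for title, is_redirect in iter_pages(io.BytesIO(SAMPLE_DUMP))
    if not is_redirect
]
print(non_redirects)  # ['New name']
```

Checking for the tag while parsing means the revision text never has to be inspected for a `#REDIRECT` prefix.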
Best regards,
Hello
2009/10/3 Santiago M. Mola cooldwind@gmail.com:
> Hello,
> While writing a bot, I had to discard redirected pages from the XML dump. In order to be able to do it early, I modified xmlreader.py to parse the <redirect /> tag and add it to XmlEntry. I'm attaching the patch, which is not extensively tested.
> I haven't updated the regex_parse method since it looks outdated anyway (it tries to create an XmlEntry with different arguments than usual).
Thanks for your contribution!
I tweaked the patch a bit:
- I renamed the "redirect" attribute to "isredirect"
- Logic cleanup
- Adding a relevant test in tests/test_xmlreader.py
The resulting patch is in trunk as r7366.
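As a rough illustration of what a test for the redirect flag could look like (this is not the test committed in r7366), the sketch below checks an `isredirect` value with `unittest`. The `parse_page` helper and its dict result are hypothetical stand-ins; the real code exposes the flag on XmlEntry.

```python
import unittest
import xml.etree.ElementTree as ET


def parse_page(page_xml):
    """Hypothetical helper: parse one <page> into a dict with an 'isredirect' flag."""
    page = ET.fromstring(page_xml)
    return {
        "title": page.findtext("title"),
        "isredirect": page.find("redirect") is not None,
    }


class RedirectFlagTest(unittest.TestCase):
    def test_redirect_page(self):
        entry = parse_page("<page><title>A</title><redirect /></page>")
        self.assertTrue(entry["isredirect"])

    def test_regular_page(self):
        entry = parse_page("<page><title>B</title></page>")
        self.assertFalse(entry["isredirect"])


# Run the two cases explicitly so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RedirectFlagTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```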
-- Nicolas Dumazet — NicDumZ
pywikipedia-l@lists.wikimedia.org