To be honest, I do not really know the archive bot; as far as I know, it uses various regular expressions to determine the headings.
Since my bot does the same thing, and I ran into several serious issues using regular expressions, I wrote my own, more sophisticated method for retrieving a page's headings. I don't know whether this is any help to you, but if you are interested, please have a look at:
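For illustration only, here is a hypothetical sketch of the kind of regex-based splitting meant here. It is not the actual archivebot.py code, and the names `THREAD_RE`, `naive_threads` and `SAMPLE` are made up. Because it looks only for level-2 headings, a level-1 `= title =` line that follows a thread ends up inside that thread's body.

```python
import re

# Hypothetical naive pattern: matches only level-2 headings like "== Topic ==".
THREAD_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def naive_threads(text):
    """Split wikitext into (title, body) pairs at level-2 headings only."""
    matches = list(THREAD_RE.finditer(text))
    threads = []
    for i, m in enumerate(matches):
        # The body runs up to the next level-2 heading (or the end of text),
        # so any level-1 heading in between is swallowed into this body.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        threads.append((m.group(1), text[m.end():end]))
    return threads


SAMPLE = "\n".join([
    "= Requests =",
    "== Old request ==",
    "Done. ~~~~",
    "= Other business =",
    "== New topic ==",
    "Pending.",
])

for title, body in naive_threads(SAMPLE):
    # The body of "Old request" includes the "= Other business =" line.
    print(repr(title), repr(body))
```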
https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_wik...
or
https://fisheye.toolserver.org/browse/~raw,r=44/drtrigon/pywikipedia/dtbext/...
and look for the 'getSections' method.
By the way, this is something that should be committed to the framework anyway... ;))
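A line-based alternative can be sketched as follows, under stated assumptions: this is not the actual 'getSections' code from the links above, and `find_headings`, `_IGNORED` and `_HEADING` are hypothetical names. It blanks out HTML comments, <nowiki> and <pre> regions while keeping line breaks, then treats a line as a heading only if it starts and ends with the same run of '=' signs, using the run length as the level.

```python
import re

# Hypothetical: regions in which "= x =" lines are not real headings.
_IGNORED = re.compile(
    r"<!--.*?-->|<nowiki>.*?</nowiki>|<pre>.*?</pre>",
    re.DOTALL | re.IGNORECASE,
)
# Same run of 1-6 "=" signs at both ends; the run length is the level.
_HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def _blank(match):
    # Replace every character except newlines with a space, so line
    # numbers and column offsets stay aligned with the original text.
    return re.sub(r"[^\n]", " ", match.group(0))


def find_headings(text):
    """Return (line_no, level, title) for each real heading in wikitext."""
    original_lines = text.split("\n")
    masked_lines = _IGNORED.sub(_blank, text).split("\n")
    headings = []
    for line_no, line in enumerate(masked_lines):
        m = _HEADING.match(line)
        if m:
            # Slice the title from the unmasked line at the same offsets.
            title = original_lines[line_no][m.start(2):m.end(2)].strip()
            headings.append((line_no, len(m.group(1)), title))
    return headings


SAMPLE = "\n".join([
    "= Requests =",
    "<!--",
    "== Commented out ==",
    "-->",
    "== Old request ==",
    "<pre>",
    "== Example ==",
    "</pre>",
    "=== Reply ===",
])

print(find_headings(SAMPLE))
# → [(0, 1, 'Requests'), (4, 2, 'Old request'), (8, 3, 'Reply')]
```

Masking rather than deleting the ignored regions is a deliberate choice: line numbers stay valid, so the caller can cut sections out of the original text by line index.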
Greetings
On 26.09.2010 19:32, Bináris wrote:
A user on huwiki regularly runs this script to archive a lot of talk pages and community pages: http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:Cherybot/archivebot_hu.py This is a modified version of archivebot.py.
We have a community page: http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:B%C3%BCrokrat%C3%A1k_%C3%BCzen%C... This page has 5 first-level headers (=title=), which is unusual. When the bot archives a section above a =title= line, the =title= line goes to the archive too. I was asked to help correct this behavior, but I am not familiar with the whole thing; I have never run archivebot.py.
The question is: has there been a problem like this on another wiki, is there a bugfix for it in the latest version, or is it only our problem?
-- Bináris
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
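One possible fix for the behavior described above, as a hedged sketch: `thread_spans`, `archive_thread` and `SAMPLE` are hypothetical names, not part of archivebot.py. A thread ends at the next heading whose level is less than or equal to its own, so a level-1 `=title=` line closes the preceding level-2 thread and stays on the page when that thread is archived. Deeper headings (=== ... ===) remain inside the thread.

```python
import re

# Same run of 1-6 "=" signs at both ends; the run length is the level.
_HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def thread_spans(lines, level=2):
    """Return (title, start, end) line ranges for headings of `level`.

    A thread ends at the next heading whose level is <= `level`, so a
    level-1 "= title =" line closes the thread instead of being swallowed.
    """
    spans = []
    current = None
    for i, line in enumerate(lines):
        m = _HEADING.match(line)
        if not m:
            continue
        depth = len(m.group(1))
        if depth <= level and current is not None:
            spans.append((current[0], current[1], i))
            current = None
        if depth == level:
            current = (m.group(2), i)
    if current is not None:
        spans.append((current[0], current[1], len(lines)))
    return spans


def archive_thread(text, title, level=2):
    """Return (remaining_text, archived_text) with one thread moved out."""
    lines = text.split("\n")
    for name, start, end in thread_spans(lines, level):
        if name == title:
            archived = "\n".join(lines[start:end])
            remaining = "\n".join(lines[:start] + lines[end:])
            return remaining, archived
    raise KeyError(title)


SAMPLE = "\n".join([
    "= Requests =",
    "== Old request ==",
    "Done. ~~~~",
    "= Other business =",
    "== New topic ==",
    "Pending.",
])

remaining, archived = archive_thread(SAMPLE, "Old request")
print(archived)   # only the "Old request" thread
print(remaining)  # both level-1 headings are still on the page
```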