To be honest, I do not really know the archive bot; as far as I know, it
uses various regexes to determine the headings.
My bot does the same, but since I had several serious issues with
regexes, I wrote my own, more sophisticated method to retrieve a page's
headings. I don't know if this is of any help to you, but if you are
interested, please have a look at:
and look for the 'getSections' method.
By the way, this is something that should be committed to the framework.
On 26.09.2010 19:32, Bináris wrote:
A user in huwiki regularly runs this script to archive a lot of talk
pages and community pages:
This is some modified version of archivebot.py.
We have a community page:
This has five first-level headers (=title=), which is unusual.
When the bot archives a section directly above a =title= line, that
=title= line goes to the archive, too.
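
To illustrate the behavior, here is a minimal sketch in Python. It is not
the actual archivebot.py code; the names HEADING, heading_level and
split_threads are made up for this example, and it assumes that threads
are the ==second-level== sections. The idea is that a =first-level=
header closes the thread above it and stays on the page, instead of
being treated as the tail of that thread.

```python
import re

# Matches a wiki heading line such as "=Title=" or "== Thread ==".
# (Illustrative pattern only, not the one archivebot.py uses.)
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def heading_level(line):
    """Return the heading level of a line, or 0 if it is not a heading."""
    m = HEADING.match(line)
    if not m or not m.group(2):
        return 0
    return min(len(m.group(1)), len(m.group(3)))


def split_threads(text, thread_level=2):
    """Split page text into ('keep', lines) and ('thread', lines) chunks.

    A heading of thread_level starts a new archivable thread. A heading
    of a higher rank (for example =title=) ends the current thread and
    starts a chunk that must stay on the page.
    """
    chunks = []
    kind, current = "keep", []
    for line in text.splitlines():
        level = heading_level(line)
        if level == thread_level:
            chunks.append((kind, current))
            kind, current = "thread", [line]
        elif 0 < level < thread_level:
            chunks.append((kind, current))
            kind, current = "keep", [line]
        else:
            # Body text and deeper sub-headings belong to the current chunk.
            current.append(line)
    chunks.append((kind, current))
    # Drop empty chunks (e.g. a page that starts with a heading).
    return [(k, lines) for k, lines in chunks if lines]
```

With this split, only the 'thread' chunks would be candidates for
archiving, so a =title= line below an archived thread stays where it is.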
Now I have been asked to help correct this behavior. I am not familiar
with the whole thing; I have never run archivebot.py.
The question is: was there any problem like this in another wiki, is
there a bugfix for this in the fresh version, or is it only our problem?
Pywikipedia-l mailing list