Re: [Pywikipedia-l] Archivebot and header1

10 Oct 2010

To be honest, I do not really know the archive bot. As far as I know, it
uses various regexes to determine the headings.

Since my bot does the same, and I had several serious issues using
regexes, I wrote my own, more sophisticated method to retrieve a page's
headings. I don't know if this is of any help to you, but if you are
interested, please have a look at:

https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_wi…

or

https://fisheye.toolserver.org/browse/~raw,r=44/drtrigon/pywikipedia/dtbext…

and look for the 'getSections' method.

By the way, this is something that should be committed to the framework
anyway... ;))
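
To illustrate the problem from the quoted message below, here is a
minimal sketch in Python. It is not the 'getSections' code from the links
above and not what archivebot.py actually does; the names HEADING_RE and
split_threads and the thread_level parameter are made up for this
example. The assumption is that threads start at second-level headings
and that a first-level heading (=title=) should end the thread above it
and stay on the page.

```python
import re

# Matches a heading line such as "== Title ==": a run of 1-6 '=' signs,
# the title, and the same run of '=' signs closing the line.
HEADING_RE = re.compile(r'^(={1,6})\s*(.+?)\s*\1\s*$')


def split_threads(text, thread_level=2):
    """Split wikitext into (kept_lines, threads).

    A thread starts at a heading of thread_level and runs until the next
    heading of the same or a higher rank (fewer '=' signs). Higher-rank
    headings such as "= title =" go to kept_lines, never into a thread.
    """
    kept, threads, current = [], [], None
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        level = len(m.group(1)) if m else None
        if level is not None and level < thread_level:
            # Higher-rank heading: close the open thread, keep the line.
            if current is not None:
                threads.append(current)
                current = None
            kept.append(line)
        elif level == thread_level:
            # New thread: close the previous one first.
            if current is not None:
                threads.append(current)
            current = {'title': m.group(2), 'lines': [line]}
        elif current is not None:
            # Body text or a deeper subheading inside the open thread.
            current['lines'].append(line)
        else:
            # Text before the first thread stays on the page.
            kept.append(line)
    if current is not None:
        threads.append(current)
    return kept, threads


page = "\n".join([
    "= Requests =",
    "== Thread A ==",
    "old comment",
    "= Reports =",
    "== Thread B ==",
    "new comment",
])
kept, threads = split_threads(page)
```

With this split, archiving "Thread A" would move only its own two lines;
the "= Reports =" line below it ends up in kept and stays on the page.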

Greetings

On 26.09.2010 19:32, Bináris wrote:
...
 A user in huwiki regularly runs this script to archive a lot of talk
 pages and community pages:
 http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:Cherybot/archivebot_hu.py
 This is some modified version of archivebot.py.
 We have a community page:
 http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:B%C3%BCrokrat%C3%A1k_%C3%BCzen%…
 This has 5 first-level headers (=title=). This is unusual.
 When the bot archives a section above the =title=, the =title= line goes
 to the archive, too.
 Now, I was asked to help to correct this behavior. I am not familiar
 with the whole thing, I have never run archivebot.py.

 The question is: was there any problem like this in another wiki, is
 there a bugfix for this in the fresh version, or is it only our problem?

 --
 Bináris

 _______________________________________________
 Pywikipedia-l mailing list
 Pywikipedia-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l 
