Feel free to take any code you like out of my bot at any time (I would
be glad!).
From this module you need to copy:
Page.getSections()
Page._getSectionByteOffset()
Page._findSection()
into wikipedia.py (just ignore addAttributes()). The small modification
made to 'get' in:
Page.get()
also has to be added.
Since the MediaWiki software occasionally (and depending on the page) has
trouble returning all of that data correctly, the function may return an
empty list [] (but only if MediaWiki has problems). This should happen
very rarely.
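A caller could guard against that empty result roughly as sketched here.
This is only an illustration and not code from the bot: the helper names
regex_headings() and sections_or_fallback() are made up, the section getter
is passed in as a plain callable, and the (level, title) tuples returned by
the fallback are an assumption that may not match what getSections()
actually returns.

```python
import re

# Matches a wikitext heading line such as "== Title ==" (levels 1 to 6).
# A plain regex like this is known to be fragile; it is only a fallback.
HEADING_RE = re.compile(r'^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$', re.MULTILINE)


def regex_headings(text):
    """Return (level, title) pairs found by a simple regex scan (hypothetical helper)."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(text)]


def sections_or_fallback(get_sections, text):
    """Call get_sections(); if it yields an empty list, use the regex scan instead.

    get_sections is any zero-argument callable, for example a bound
    getSections method (its real return format is assumed, not known here).
    """
    sections = get_sections()
    if not sections:
        return regex_headings(text)
    return sections
```

With a page text of "=Top=\nintro\n== Sub ==\nbody\n" the fallback reports a
level-1 heading "Top" and a level-2 heading "Sub", so first-level headers
are seen as headings of their own.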
Greetings (and enjoy)
On 11.10.2010 14:57, info(a)gno.de wrote:
give me that stuff ;)
xqt
----- Original Message ----
From: "Dr. Trigon"<dr.trigon(a)surfeu.ch>
To: Pywikipedia discussion list<pywikipedia-l(a)lists.wikimedia.org>
Date: 10.10.2010 22:25
Subject: Re: [Pywikipedia-l] Archivebot and header1
> To be honest, I do not really know the archive bot; as far as I know, it
> uses various regexes to determine the headings.
>
> Since my bot does the same, but I had several serious issues with
> regexes, I wrote my own, more sophisticated method to retrieve a page's
> headings. I don't know if this is of any help to you, but if you are
> interested in this, please have a look at:
>
>
> https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/dtbext/dtbext_wikipedia.py?r=HEAD
>
> or
>
>
> https://fisheye.toolserver.org/browse/~raw,r=44/drtrigon/pywikipedia/dtbext/dtbext_wikipedia.py
>
> and look for the 'getSections' method.
>
> By the way, this is something that should be committed to the framework
> anyway... ;))
>
> Greetings
>
>
> On 26.09.2010 19:32, Bináris wrote:
>> A user in huwiki regularly runs this script to archive a lot of talk
>> pages and community pages:
>>
>> http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:Cherybot/archivebot_hu.py
>> This is some modified version of archivebot.py.
>> We have a community page:
>>
>> http://hu.wikipedia.org/wiki/Wikip%C3%A9dia:B%C3%BCrokrat%C3%A1k_%C3%BCzen%C5%91fala
>> This has 5 first level headers (=title=). This is unusual.
>> When the bot archives a section above the =title=, the =title= line goes
>> to the archive, too.
>> Now, I was asked to help to correct this behavior. I am not familiar
>> with the whole thing, I have never run archivebot.py.
>>
>> The question is: was there any problem like this in another wiki, is
>> there a bugfix for this in the fresh version, or is it only our problem?
>>
>>
>> --
>> Bináris
>>
>>
>>
>> _______________________________________________
>> Pywikipedia-l mailing list
>> Pywikipedia-l(a)lists.wikimedia.org
>>
>> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
>