Merlijn,
Not bothered by any actual knowledge of pywikibot (which makes it far easier to comment!), is it possible that the bot assumes it's fetching a page, but the fetch actually fails with an error, and this failure is not handled properly, so the lack of a response is interpreted as an empty string?
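To illustrate what I mean (a made-up sketch, not actual pywikibot code -- the function names here are invented):

    # hypothetical sketch of the failure mode I have in mind;
    # none of these names are real pywikibot identifiers
    def fetch_from_server(site, title):
        # stand-in for whatever actually talks to the wiki; pretend it fails
        raise IOError("server error for %s:%s" % (site, title))

    def get_page_text(site, title):
        text = ''
        try:
            text = fetch_from_server(site, title)
        except Exception:
            pass        # failure silently swallowed...
        return text     # ...so '' comes back, as if the page were empty

    print repr(get_page_text('nl', 'Blankenbach'))   # prints ''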
Regards,
Martijn
On Fri, Sep 30, 2011 at 10:37 PM, Merlijn van Deen valhallasw@arctus.nl wrote:
Hi Ariel and Andre,
On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
Out of curiosity... If the new revisions of one of these badly edited pages are deleted, leaving the top revision as the one just before the bad iw bot edit, does a rerun of the bot on the page fail?
On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels andreengels@gmail.com wrote:
I deleted the page [[nl:Blankenbach]], then restored the 2 versions before the problematic bot edit. When I now look at the page, instead of the page content I get:
(...)
Using this undeleted version, and running interwiki.py, gives the expected result:

valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py -page:Blankenbach
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 1 pages from wikipedia:nl...
WARNING: Family file wikipedia contains version number 1.17wmf1, but it should be 1.18wmf1
NOTE: [[nl:Blankenbach]] does not exist. Skipping.
This also happens when running it from dewiki (python interwiki.py -lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29) or when running it as a 'full-auto' bot (python interwiki.py -all -async -cleanup -log -auto -ns:0 -start:Blankenbach).
Special:Export acts as if the page just does not exist (http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren&useskin=mo... shows the page Blanzac but not Blankenbach).
api.php also more or less does the expected thing: http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles...
- that is, unless you supply rvlimit=1:
http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles...
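(If anyone wants to poke at this without pywikibot in between, something along these lines reproduces the two checks -- plain urllib2 against the standard export and API endpoints; the &pages= parameter and format=json are just my additions to the queries above:)

    # quick repro sketch, plain Python 2 + urllib2, no pywikibot involved
    import json
    import urllib
    import urllib2

    TITLE = 'Blankenbach'

    # Special:Export: for this page the export contains no <page> element,
    # i.e. it behaves as if the page does not exist
    export_url = ('http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren'
                  '&pages=' + urllib.quote(TITLE))
    print '<page>' in urllib2.urlopen(export_url).read()

    # api.php revisions query, without and with rvlimit=1 -- compare the two
    for rvlimit in (None, 1):
        params = {'action': 'query', 'prop': 'revisions',
                  'titles': TITLE, 'format': 'json'}
        if rvlimit is not None:
            params['rvlimit'] = rvlimit
        api_url = 'http://nl.wikipedia.org/w/api.php?' + urllib.urlencode(params)
        print rvlimit, json.load(urllib2.urlopen(api_url))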
However, none of them seem to return an empty page - and playing around with pywikipediabot does not allow me to get an empty page either (depending on the settings, the result is either the edit-page contents (page.get(), use_api=False / screen scraping), a pywikibot.exceptions.NoPage exception (PreloadingGenerator / wikipedia.getall, which uses Special:Export), or the correct page text (page.get(), use_api=True)).
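(For reference, the three code paths I tried look roughly like this against the old compat framework -- treat the config.use_api toggle and the exact names as from memory rather than gospel:)

    # rough sketch of the three code paths above (old compat pywikipedia);
    # config.use_api and the exact signatures are from memory, double-check
    import wikipedia as pywikibot
    import pagegenerators
    import config

    site = pywikibot.getSite('nl', 'wikipedia')
    page = pywikibot.Page(site, 'Blankenbach')

    # 1. screen scraping (edit page): gave me the edit-page contents
    config.use_api = False
    try:
        print repr(page.get(force=True))
    except pywikibot.NoPage:
        print 'NoPage (screen scraping)'

    # 2. Special:Export, as used by PreloadingGenerator / wikipedia.getall:
    #    raised NoPage for me
    try:
        for p in pagegenerators.PreloadingGenerator([page]):
            print repr(p.get())
    except pywikibot.NoPage:
        print 'NoPage (Special:Export)'

    # 3. API: gave me the correct page text
    config.use_api = True
    try:
        print repr(page.get(force=True))
    except pywikibot.NoPage:
        print 'NoPage (API)'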
Anyway, thanks a huge heap for trying this (and to everyone, for thinking about it). Unfortunately, I won't have much time this weekend to debug -- hopefully some other pwb developer does.
Best regards, and thanks again, Merlijn
P.S. On 30 September 2011 11:12, Max Semenik maxsem.wiki@gmail.com wrote:
So you screen-scrape? No surprise it breaks. Why? For example, due to protocol-relative URLs. Or some other changes to HTML output. Why not just use API?
No, most of pywikipedia has been adapted to the API and/or Special:Export (which, imo, is just an 'old' MediaWiki API). Keep in mind that interwiki.py is old (2003!), and that pywikipedia initially was an extension of the interwiki bot. Thus, there could very well be some seldom-used code that still uses screen scraping. And actually, in practice, screen scraping worked pretty well.