Hi Ariel and Andre,
On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
Out of curiosity... If the new revisions of one of these badly edited pages are deleted, leaving the top revision as the one just before the bad iw bot edit, does a rerun of the bot on the page fail?
On Fri, Sep 30, 2011 at 11:13 AM, Andre Engels andreengels@gmail.com wrote:
I deleted the page [[nl:Blankenbach]], then restored the two versions before the problematic bot edit. When I now look at the page, instead of the page content I get:
(...)
Using this undeleted version and running interwiki.py gives the expected result:

valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py -page:Blankenbach
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 1 pages from wikipedia:nl...
WARNING: Family file wikipedia contains version number 1.17wmf1, but it should be 1.18wmf1
NOTE: [[nl:Blankenbach]] does not exist. Skipping.
This also happens when running it from dewiki (python interwiki.py -lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29) or when running it as a 'full-auto' bot (python interwiki.py -all -async -cleanup -log -auto -ns:0 -start:Blankenbach).
Special:Export acts as if the page simply does not exist (http://nl.wikipedia.org/w/index.php?title=Speciaal:Exporteren&useskin=mo... shows the page Blanzac but not Blankenbach).
api.php also behaves more or less as expected: http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles... - that is, unless you supply rvlimit=1: http://nl.wikipedia.org/w/api.php?action=query&prop=revisions&titles...
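For anyone who wants to poke at this themselves, here is a rough sketch (mine, not part of the bot) of the two server-side checks in plain Python 3. The truncated URLs above are left as they are; the parameters below (pages= for Special:Export, the standard query parameters for api.php) are just the usual MediaWiki ones, so treat the exact requests as an approximation:

import json
import urllib.parse
import urllib.request

NL = "http://nl.wikipedia.org/w"

def export_contains(title):
    # Special:Export returns an XML dump; no <page> element means the
    # exporter considers the page nonexistent.
    url = NL + "/index.php?title=Special:Export&pages=" + urllib.parse.quote(title)
    with urllib.request.urlopen(url) as f:
        return b"<page>" in f.read()

def revisions(title, rvlimit=None):
    # The plain query looked normal; adding rvlimit=1 gave a different answer.
    params = {"action": "query", "prop": "revisions",
              "titles": title, "format": "json"}
    if rvlimit is not None:
        params["rvlimit"] = rvlimit
    url = NL + "/api.php?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as f:
        return json.load(f)

print(export_contains("Blankenbach"))
print(revisions("Blankenbach"))
print(revisions("Blankenbach", rvlimit=1))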
However, neither Special:Export nor api.php seems to return an empty page - and playing around with pywikipediabot does not allow me to get an empty page either. Depending on the settings, I get either the contents of the edit page (page.get() with use_api=False, i.e. screen scraping), a pywikibot.exceptions.NoPage exception (PreloadingGenerator / wikipedia.getall, which uses Special:Export), or the correct page text (page.get() with use_api=True).
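For the curious, this is roughly how the three code paths can be triggered. It's a sketch from memory against trunk (the compat framework); the config.use_api toggle and the force= parameter are how I remember the compat interface, so double-check against your checkout:

import wikipedia
import pagegenerators
import config

site = wikipedia.getSite('nl', 'wikipedia')
page = wikipedia.Page(site, 'Blankenbach')

# 1. Screen scraping via the edit page.
config.use_api = False
try:
    print(repr(page.get(force=True)))
except wikipedia.NoPage:
    print('NoPage (screen scraping)')

# 2. The API path; this one returned the correct page text.
config.use_api = True
try:
    print(repr(page.get(force=True)))
except wikipedia.NoPage:
    print('NoPage (API)')

# 3. Special:Export via PreloadingGenerator / wikipedia.getall;
#    this one raised NoPage.
for p in pagegenerators.PreloadingGenerator(iter([page])):
    try:
        print(repr(p.get()))
    except wikipedia.NoPage:
        print('NoPage (Special:Export)')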
Anyway, thanks a huge heap for trying this (and to everyone, for thinking about it). Unfortunately, I won't have much time this weekend to debug -- hopefully some other pwb developer will.
Best regards, and thanks again, Merlijn
P.S. On 30 September 2011 11:12, Max Semenik maxsem.wiki@gmail.com wrote:
So you screen-scrape? No surprise it breaks. Why? For example, due to protocol-relative URLs, or some other change to the HTML output. Why not just use the API?
No, most of pywikipedia has been adapted to the API and/or Special:Export, which, imo, is just an 'old' MediaWiki API. Keep in mind that interwiki.py is old (2003!), and that pywikipedia initially started as an extension of the interwiki bot, so there could very well be some seldom-used code that still uses screen scraping. And in practice, screen scraping actually worked pretty well.
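To illustrate Max's point with a toy example (mine, not actual pywikipedia code): a scraper that assumes absolute http:// links silently comes up empty once the HTML switches to protocol-relative URLs:

import re

old_html = '<a href="http://de.wikipedia.org/wiki/Blankenbach">de</a>'
new_html = '<a href="//de.wikipedia.org/wiki/Blankenbach">de</a>'

link_re = re.compile(r'href="http://([^"/]+)(/[^"]*)"')
print(link_re.findall(old_html))  # [('de.wikipedia.org', '/wiki/Blankenbach')]
print(link_re.findall(new_html))  # [] -- the scrape quietly fails

# A tolerant pattern allows the scheme to be absent:
tolerant_re = re.compile(r'href="(?:https?:)?//([^"/]+)(/[^"]*)"')
print(tolerant_re.findall(new_html))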