[Pywikipedia-bugs] [ pywikipediabot-Bugs-3414669 ] interwiki.py removing page text

SourceForge.net noreply at sourceforge.net
Mon Oct 3 19:23:12 UTC 2011


Bugs item #3414669, was opened at 2011-09-27 21:50
Message generated for change (Comment added) made by xqt
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3414669&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: interwiki
Group: None
Status: Open
Resolution: None
>Priority: 5
Private: No
Submitted By: hiw (hiw)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py removing page text

Initial Comment:
After the following edit to a disambiguation page on nl.wikipedia, the page was emptied; only the interwiki link remained. interwiki.py should not have touched the page in the first place, since the interwiki link had already been added earlier.

Diff-link: http://nl.wikipedia.org/w/index.php?title=Blankenbach&diff=next&oldid=10676248

ActivePython on Microsoft Windows XP [Version 5.1.2600]

Pywikipedia [http] trunk/pywikipedia (r9558, 2011/09/25, 20:30:54)
Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok


----------------------------------------------------------------------

>Comment By: xqt (xqt)
Date: 2011-10-03 21:23

Message:
r9580 prohibits editing empty pages. This should prevent further damage,
but it does not fix the bug itself.
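A minimal Python 3 sketch of the kind of safeguard discussed here. It is not the r9580 change itself (that code is not shown in this thread). The function name `safe_to_save` and the simplified interwiki regex are hypothetical. The check refuses a save that would blank the page or leave only interwiki links, which is the symptom reported in this bug:

```python
import re

# Hypothetical, simplified pattern for interwiki links such as [[de:Blankenbach]].
# Real language codes are more varied; this is an assumption for illustration.
INTERWIKI_RE = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\[\]]*\]\]")


def safe_to_save(old_text, new_text):
    """Return False if a save would blank the page or leave only interwiki links."""
    if not new_text.strip():
        return False  # the new text is empty or whitespace only
    # Compare the page bodies with all interwiki links stripped out.
    new_body = INTERWIKI_RE.sub("", new_text).strip()
    old_body = INTERWIKI_RE.sub("", old_text).strip()
    # Refuse if the old page had real content but the new one has none.
    if old_body and not new_body:
        return False
    return True
```

Such a guard only limits the damage; like r9580, it does not explain why the bot ended up with truncated text.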

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-29 22:09

Message:
relevant wikitech-l thread:
http://lists.wikimedia.org/pipermail/wikitech-l/2011-September/055420.html

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-29 21:34

Message:
Confirmed on eowiki, 25 suspected pages
http://eo.wikipedia.org/w/index.php?title=Anton%C3%ADn_Kl%C3%A1%C5%A1tersk%C3%BD&action=historysubmit&diff=3855198&oldid=1369139

Confirmed on simplewiki, 3 suspected pages


itwiki: no results
ptwiki: no results
dewiki: no results
frwiki: results, but all from the same antivandalism bot


----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-29 21:14

Message:
Using the following query to find suspected edits...
select rc_cur_time, rc_user, rc_namespace, rc_title, rc_old_len, rc_new_len
from recentchanges
left join user_groups on ug_user = rc_user
where rc_new_len < rc_old_len * 0.1
  and ug_group = 'bot'
  and rc_namespace = 0;
(note: this will not find *all* bad edits, but at least some)...
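The query flags bot edits in the main namespace (0) whose new length is less than 10% of the old one. The same filter can be sketched in Python 3 over rows already fetched from recentchanges; the `Change` record and the `suspected_blankings` helper are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    # A few recentchanges columns used by the query (hypothetical record type).
    title: str               # rc_title
    namespace: int           # rc_namespace
    old_len: Optional[int]   # rc_old_len, may be NULL (None)
    new_len: Optional[int]   # rc_new_len, may be NULL (None)
    is_bot: bool             # user is in the 'bot' group


def suspected_blankings(changes: List[Change], ratio: float = 0.1) -> List[Change]:
    """Bot edits in namespace 0 whose new length is below `ratio` times the old length."""
    return [
        c for c in changes
        if c.is_bot
        and c.namespace == 0
        # In SQL a comparison involving NULL is never true, so skip missing lengths.
        and c.old_len is not None
        and c.new_len is not None
        and c.new_len < c.old_len * ratio
    ]
```

The strict `<` matches the query, so an edit that shrinks a page to exactly 10% of its size is not flagged.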

http://nl.wikipedia.org/w/index.php?title=Alexander_Gottfried&diff=prev&oldid=27327313
http://nl.wikipedia.org/w/index.php?title=Angerapp&diff=27329689&oldid=11579760
http://nl.wikipedia.org/w/index.php?title=Partjessnijder&diff=27331463&oldid=13954281
http://nl.wikipedia.org/w/index.php?title=Atax&diff=27330470&oldid=11968796
http://nl.wikipedia.org/w/index.php?title=Medinilla&diff=27328890&oldid=11015008
http://nl.wikipedia.org/w/index.php?title=Merklin&diff=27330198&oldid=11821859
http://nl.wikipedia.org/w/index.php?title=Pion&diff=27327730&oldid=14796262
http://nl.wikipedia.org/w/index.php?title=Vossenplein&diff=27327943&oldid=12882509
http://nl.wikipedia.org/w/index.php?title=Walser&diff=27329293&oldid=11842017

So at least the specificity is good, even if the sensitivity is not.
I'll try the query on other wikis. Hopefully this will give some hint as
to whether the problem is related to MediaWiki 1.18 or not.


----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-29 20:39

Message:
At the moment, no. In theory, the Special:Export call could probably be
replaced by one or more API calls, but I have no reason to assume this
would actually solve the problem...
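For reference, fetching the current page text through the MediaWiki API instead of Special:Export could be done with an `action=query&prop=revisions&rvprop=content` request. The Python 3 sketch below only builds that URL; the helper name `revision_content_url` and the `https://<site>/w/api.php` endpoint layout are assumptions, and sending the request and parsing the JSON reply are left out:

```python
from urllib.parse import urlencode


def revision_content_url(site, title):
    """Build a MediaWiki API URL asking for the latest revision text of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",   # return the wikitext of the revision
        "titles": title,
        "format": "json",
    }
    # urlencode percent-encodes the title (UTF-8) and joins the parameters with '&'.
    return "https://%s/w/api.php?%s" % (site, urlencode(params))
```

For example, `revision_content_url("de.wikipedia.org", "Blankenbach (Begriffsklärung)")` yields a URL containing `titles=Blankenbach+%28Begriffskl%C3%A4rung%29`.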

----------------------------------------------------------------------

Comment By: hiw (hiw)
Date: 2011-09-29 04:58

Message:
Can you force the script to use the API to get the page text?

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-28 21:47

Message:
I did some more testing, using

    python interwiki.py -lang:de -page:Blankenbach%20%28Begriffskl%C3%A4rung%29 -async -cleanup -auto -async

Note that these findings do not necessarily hold for a fully automatic
run...

In this setup, the bot ALWAYS uses Special:Export to get the page text,
although it does use the API to write the pages. It retrieves the pages
only ONCE, at the start of the run.

sigh.

----------------------------------------------------------------------

Comment By: hiw (hiw)
Date: 2011-09-28 00:31

Message:
Pffff, I believe it was:

interwiki.py -all -async -cleanup -log -auto -start: 

I think I also used -ns:0, but I'm not sure.

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-27 22:53

Message:
Question to both the submitter and Myst: what was the exact command line
you were using?

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-27 22:49

Message:
Last note for tonight: a quick review of the diff against r9500 (2011-09-03)
did not turn up any significant changes. Note: I reviewed it all in one go.
Reviewing the commits from the mailing list one at a time might still be a
good plan...

----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-27 22:42

Message:
The last three commits to interwiki.py are all quite old:


------------------------------------------------------------------------
r9407 | xqt | 2011-07-16 23:35:06 +0200 (Sat, 16 Jul 2011) | 1 line

trailing space for list elements (readability)
------------------------------------------------------------------------
r9387 | amir | 2011-07-16 12:05:50 +0200 (Sat, 16 Jul 2011) | 1 line

adding fa for exception templates
------------------------------------------------------------------------
r9308 | xqt | 2011-06-24 19:14:40 +0200 (Fri, 24 Jun 2011) | 1 line

do not follow static redirects which means do not change the target links
like -noredirect does (with -cleanup option. -force removes that link -
maybe this should be fixed)
------------------------------------------------------------------------


----------------------------------------------------------------------

Comment By: Merlijn S. van Deen (valhallasw)
Date: 2011-09-27 22:37

Message:
This has also happened with Myst's bot on simplewiki:
http://simple.wikipedia.org/w/index.php?title=Mettau%2C_Switzerland&action=historysubmit&diff=3060418&oldid=1249270

Increasing priority, rephrased title.

----------------------------------------------------------------------
