I noticed that SuggestBot struggled with saving a user page earlier
this week, see http://en.wikipedia.org/w/index.php?title=User_talk:The_Master_of_Mayhem&ac…
Notice the large number of saves that have a diff size of 0 bytes.
I suspect it's due to the size of the page (300+ kB). Is this a typical
problem? If not, I can start digging to see if I can figure out what's
going on. If it is a common problem, what are some typical ways of
solving it? Just checking the page size and skipping/aborting if it's too large?
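A minimal sketch of that page-size guard, assuming the bot already has the current page text and its pending addition as strings; the names and the 300 kB cap are illustrative, not SuggestBot's actual code:

# Hypothetical guard: skip the save when the target page is already huge.
MAX_PAGE_BYTES = 300 * 1024  # illustrative threshold

def safe_to_post(page_text, addition):
    """Return True if posting keeps the page under the size cap."""
    projected = len(page_text.encode('utf-8')) + len(addition.encode('utf-8'))
    return projected <= MAX_PAGE_BYTES

# Before saving:
# if not safe_to_post(current_text, suggestions):
#     skip this page and log it instead of attempting the save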
Nemo is referring to dumpgenerator.py being broken on MediaWiki
versions above 1.20; it should not actually affect older MediaWiki versions.
You can safely continue with your grab. :)
On Sat, Nov 10, 2012 at 12:45 PM, Scott Boyd <scottdb56(a)gmail.com> wrote:
> At this link: https://code.google.com/p/wikiteam/issues/detail?id=56 , at
> the bottom, there is an entry by project member nemowiki that states:
> Comment 7 <https://code.google.com/p/wikiteam/issues/detail?id=56#c7> by project member
> nemowiki <https://code.google.com/u/101255742639286016490/>, Today (9 hours ago):
> Fixed by emijrp in r806 <https://code.google.com/p/wikiteam/source/detail?r=806>. :-)
> Status: Fixed
> So does that mean this problem that "It's completely broken" is now fixed?
> I'm running a huge download of 64K+ page titles, and am now using the
> "r806" version of dumpgenerator.py. The first 35K+ page titles were
> downloaded with an older version). Both versions sure seem to be
> downloading MORE than 500 pages per namespace, but I'm not sure, since I
> don't know how you can tell if you are getting them all...
> So is it fixed or not?
> On Fri, Nov 9, 2012 at 4:27 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
>> It's completely broken: https://code.google.com/p/wikiteam/issues/detail?id=56
>> It will download only a fraction of the wiki, 500 pages at most per namespace.
Brad Jorsch, 09/11/2012 17:30:
> On Fri, Nov 9, 2012 at 7:59 AM, Hydriz Wikipedia <admin(a)alphacorp.tk> wrote:
>> You mentioned "a while back" for "apcontinue"; how recent was it? This dump
>> generator is attempting to archive all sorts of versions of MediaWiki, so older
>> versions will break unless we write a backward compatibility handler in the script itself.
> July 2012: http://lists.wikimedia.org/pipermail/mediawiki-api-announce/2012-July/00003…
> Any wiki running version 1.19, or a 1.20 snapshot from before
> mid-July, would be returning the old parameter. If you do it right,
> though, there's little you have to do. Just use whichever keys are
> given you inside the <query-continue> node. Even with your regular
> expression mess, just capture which key is given as well as the value
> and use it as the key for your params dict.
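To make the quoted advice concrete, here is a hedged regex-based sketch that keeps whichever continuation key the server sends back; the function and variable names are hypothetical, not dumpgenerator.py's own:

import re

def get_continue_param(xml):
    # Match either the old (apfrom) or the new (apcontinue) key and keep its name.
    m = re.search(r'<allpages (apfrom|apcontinue)="([^"]+)"', xml)
    if m:
        return m.group(1), m.group(2)
    return None, None

# In the request loop, feed the pair straight back into the params dict:
# key, value = get_continue_param(xml)
# if key:
#     params[key] = value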
Thank you again for your useful suggestions!
However, as already noted,
https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't give
any info about supported releases.
P.s.: Small unreliable "temporary" things in MediaWiki, like the
"powered by MediaWiki" sentence we grep for, are usually the most
permanent ones, although I don't like it.
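For what it's worth, that grep can stay as simple as the sketch below (Python 2 to match the script's era; the URL handling is illustrative):

import urllib2

def looks_like_mediawiki(url):
    # Crude detection: fetch the page and look for the familiar footer tagline.
    html = urllib2.urlopen(url).read()
    return 'powered by mediawiki' in html.lower()

# print looks_like_mediawiki('http://example.org/wiki/Main_Page')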
Hydriz Wikipedia, 09/11/2012 16:59:
> You mentioned "a while back" for "apcontinue"; how recent was it? This
> dump generator is attempting to archive all sorts of versions of
> MediaWiki, so older versions will break unless we write a backward
> compatibility handler in the script itself.
https://www.mediawiki.org/wiki/API:Query#Continuing_queries doesn't really
shed any light.
> ...and I agree, the code is in a total mess. We need to get someone to
> rewrite the whole thing, soon.
Well, that would be in an ideal world. In this one, the best would probably be
suggestions for simple libraries we could use to solve such small
problems. (Which can become very big ones if one doesn't follow API evolution
very closely or know its history from the beginning of time.)
> On Fri, Nov 9, 2012 at 11:50 PM, Brad Jorsch wrote:
> You're searching for the continue parameter as "apfrom", but this was
> changed to "apcontinue" a while back. Changing line 162 to something
> like this should probably do it:
> m = re.findall(r'<allpages (?:apfrom|apcontinue)="([^>]+)" />',
> Note that for full correctness, you probably should omit both apfrom
> and apcontinue entirely from params the first time around, and send
> back whichever of the two is found by the above line in subsequent requests.
> Also, why in the world aren't you using an XML parser (or a JSON
> parser with format=json) to process the API response instead of trying
> to parse the XML using regular expressions?!
> On Fri, Nov 9, 2012 at 2:27 AM, Federico Leva (Nemo)
> <nemowiki(a)gmail.com> wrote:
> > It's completely broken:
> > https://code.google.com/p/wikiteam/issues/detail?id=56
> > It will download only a fraction of the wiki, 500 pages at most per
> > namespace.
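To illustrate Brad's parser suggestion, here is a hedged sketch of the same allpages loop using format=json and the standard json module instead of regular expressions; it is illustrative, not a drop-in patch for dumpgenerator.py (Python 2, as the script targets):

import json
import urllib
import urllib2

def all_page_titles(api_url, namespace=0):
    # Yield every title in a namespace, following whichever continuation
    # key (apfrom or apcontinue) the server hands back.
    params = {'action': 'query', 'list': 'allpages',
              'apnamespace': namespace, 'aplimit': 500, 'format': 'json'}
    while True:
        data = json.loads(urllib2.urlopen(api_url, urllib.urlencode(params)).read())
        for page in data['query']['allpages']:
            yield page['title']
        cont = data.get('query-continue', {}).get('allpages')
        if not cont:
            break
        params.update(cont)  # use whatever continuation key the API gave us

# for title in all_page_titles('http://en.wikipedia.org/w/api.php'):
#     print title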
It's completely broken: https://code.google.com/p/wikiteam/issues/detail?id=56
It will download only a fraction of the wiki, 500 pages at most per namespace.
Let me reiterate that
https://code.google.com/p/wikiteam/issues/detail?id=44 is a very urgent
bug and we've seen no work on it in many months. We need an actual
programmer with some knowledge of Python to fix it and make the script
work properly; I know there are several on this list (and elsewhere), so
please, please help. The last time I, as a non-coder, tried to fix a bug,
I made things worse.
Only after the API support is implemented/fixed will I be able to re-archive the 4-5
thousand wikis we've recently archived on archive.org
(https://archive.org/details/wikiteam), and possibly many more. Many of
those dumps contain errors and/or are just partial because of the
script's unreliability, and wikis die on a daily basis. (So, quoting
emijrp, there IS a deadline.)
P.s.: Cc'ing some lists out of desperation; sorry for cross-posting.