Scott,
Nemo is referring to dumpgenerator.py being broken on MediaWiki versions above 1.20; it should not affect older MediaWiki versions.
You can safely continue with your grab. :)
On Sat, Nov 10, 2012 at 12:45 PM, Scott Boyd <scottdb56@gmail.com> wrote:
At this link: https://code.google.com/p/wikiteam/issues/detail?id=56, at the bottom, there is an entry by project member nemowiki that states:
Comment 7 <https://code.google.com/p/wikiteam/issues/detail?id=56#c7> by project member nemowiki <https://code.google.com/u/101255742639286016490/>, Today (9 hours ago):
Fixed by emijrp in r806 <https://code.google.com/p/wikiteam/source/detail?r=806>. :-) Status: Fixed
So does that mean the "It's completely broken" problem is now fixed? I'm running a huge download of 64K+ page titles, and am now using the r806 version of dumpgenerator.py. (The first 35K+ page titles were downloaded with an older version.) Both versions sure seem to be downloading MORE than 500 pages per namespace, but I'm not sure, since I don't know how you can tell if you are getting them all...
So is it fixed or not?
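(One rough way to check whether a grab is complete: ask the wiki's own API how many pages it reports and compare that with the titles file dumpgenerator.py writes. Below is a minimal sketch, assuming Python 3; the API URL and the titles file name are placeholders for whatever wiki is being grabbed, not values taken from this thread.)

    import json
    import urllib.request

    # Placeholder values: point these at the wiki being grabbed and at the
    # titles file that dumpgenerator.py produced for it.
    API = "http://example.com/w/api.php"
    TITLES_FILE = "examplewiki-20121110-titles.txt"

    # Ask the wiki for its own statistics via the siteinfo API module.
    query = "?action=query&meta=siteinfo&siprop=statistics&format=json"
    with urllib.request.urlopen(API + query) as resp:
        stats = json.loads(resp.read().decode("utf-8"))["query"]["statistics"]

    # "pages" counts every page in every namespace, redirects included,
    # so treat it as an upper bound rather than an exact target.
    print("Pages reported by the wiki:", stats["pages"])

    # Count the titles the dump actually collected.
    with open(TITLES_FILE) as f:
        downloaded = sum(1 for line in f if line.strip())
    print("Titles in the local dump:", downloaded)

If the local count is far below what the wiki reports, the grab is probably still hitting the 500-per-namespace ceiling.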
On Fri, Nov 9, 2012 at 4:27 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
It's completely broken: https://code.google.com/p/wikiteam/issues/detail?id=56 It will download only a fraction of the wiki, 500 pages at most per namespace.
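(For context on the 500-page figure: list=allpages returns at most 500 titles per request for normal clients, so a dump only gets everything if it keeps following the continuation value in each response. MediaWiki 1.20 changed that value from "apfrom" to "apcontinue", which appears to be what broke older dumpgenerator.py builds; that reading of the bug is an assumption based on the issue description. Here is a minimal sketch of following the continuation correctly, assuming Python 3 and a placeholder wiki URL; this is not the actual wikiteam code.)

    import json
    import urllib.parse
    import urllib.request

    API = "http://example.com/w/api.php"  # placeholder wiki

    def all_titles(namespace=0):
        """Yield every title in a namespace by following API continuation."""
        params = {
            "action": "query",
            "list": "allpages",
            "apnamespace": namespace,
            "aplimit": 500,  # per-request maximum for non-bot clients
            "format": "json",
        }
        while True:
            url = API + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                data = json.loads(resp.read().decode("utf-8"))
            for page in data["query"]["allpages"]:
                yield page["title"]
            cont = data.get("query-continue", {}).get("allpages")
            if not cont:
                break  # no continuation left: every title has been returned
            # MW >= 1.20 sends {"apcontinue": ...}; older releases send
            # {"apfrom": ...}. Feeding either back in continues the listing.
            params.update(cont)

A client that only looks for "apfrom" stops after the first batch on a 1.20 wiki, which matches the 500-per-namespace symptom described above.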