Hi folks,
Any special:export experts out there?
I'm trying to download the complete revision history for just a few pages. The options, as I see it, are the API or Special:Export. The API returns XML that is formatted differently from Special:Export's, and I already have a set of parsers that work with Special:Export data, so I'm inclined to go with that.
The problem I'm running into: when I try to use POST so that I can iteratively grab revisions in increments of 1000, the request is denied (I get a "WMF servers down" error). If I use GET, it works, but then I can't use the parameters that would let me iterate through all the revisions.
Code pasted below. Any suggestions as to why the server won't accept POST?
Better yet, does anyone already have a working script/tool handy that grabs all the revisions of a page? :)
Thanks, all! (Excuse the cross-posting, I usually hang out on research, but thought perhaps folks on the developers list would have insight.) Andrea
class Wikipedia {
    public function __construct() { }

    public function searchResults( $pageTitle = null, $initialRevision = null ) {
        $url = "http://en.wikipedia.org/w/index.php?title=Special:Export&pages="
            . $pageTitle . "&offset=1&limit=1000&action=submit";
        $curl = curl_init();
        curl_setopt( $curl, CURLOPT_URL, $url );
        curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $curl, CURLOPT_POST, true );
        curl_setopt( $curl, CURLOPT_USERAGENT, "Page Revisions Retrieval Script - Andrea Forte - aforte@drexel.edu" );
        $result = curl_exec( $curl );
        curl_close( $curl );
        return $result;
    }
}
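For reference, here's the variant I'd expect to work if the problem is that the export parameters belong in the POST body rather than the query string. This is an untested sketch; the http_build_query fields just mirror the query string above, and whether this is actually what's tripping up the WMF servers is a guess on my part.

class Wikipedia {
    public function searchResults( $pageTitle = null, $offset = 1 ) {
        // Keep only the page name and submit action in the URL...
        $url = "http://en.wikipedia.org/w/index.php?title=Special:Export&action=submit";

        // ...and send the export parameters as form-encoded POST fields.
        // $offset starts at 1 and would then be set to the timestamp of
        // the last revision in each batch to page through the history.
        $post = http_build_query( array(
            'pages'  => $pageTitle,
            'offset' => $offset,
            'limit'  => 1000,
        ) );

        $curl = curl_init();
        curl_setopt( $curl, CURLOPT_URL, $url );
        curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $curl, CURLOPT_POST, true );
        curl_setopt( $curl, CURLOPT_POSTFIELDS, $post );
        curl_setopt( $curl, CURLOPT_USERAGENT, "Page Revisions Retrieval Script - Andrea Forte - aforte@drexel.edu" );
        $result = curl_exec( $curl );
        curl_close( $curl );
        return $result;
    }
}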
On Thu, Jul 14, 2011 at 6:58 PM, Andrea Forte andrea.forte@gmail.com wrote:
> I'm trying to download the complete revision history for just a few pages. The options, as I see it, are the API or Special:Export. The API returns XML that is formatted differently from Special:Export's, and I already have a set of parsers that work with Special:Export data, so I'm inclined to go with that.
You can use api.php?action=query&export&exportnowrap&titles=Foo|Bar|Baz ; that should give you the same format.
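Something like this (untested sketch, reusing your curl setup; Foo|Bar|Baz are placeholder titles, and the pipes need URL-encoding):

// Fetch Special:Export-formatted XML through the API instead of index.php.
$titles = rawurlencode( "Foo|Bar|Baz" );  // placeholder page titles
$url = "http://en.wikipedia.org/w/api.php?action=query&export&exportnowrap&titles=" . $titles;

$curl = curl_init();
curl_setopt( $curl, CURLOPT_URL, $url );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_USERAGENT, "Page Revisions Retrieval Script" );
$xml = curl_exec( $curl );  // same <mediawiki> XML that Special:Export emits
curl_close( $curl );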
Roan Kattouw (Catrope)
Andrea Forte, 14/07/2011 18:58:
> Better yet, does anyone already have a working script/tool handy that grabs all the revisions of a page? :)
There's https://code.google.com/p/wikiteam/ . Its purpose is to download whole wikis, but you can always edit the list of titles.
Nemo