Hi folks,
Any special:export experts out there?
I'm trying to download the complete revision history for just a few
pages. The options, as I see it, are using the API or Special:Export.
The API returns XML that is formatted differently from Special:Export's,
and I already have a set of parsers that work with Special:Export data,
so I'm inclined to go with that.
I'm running into a problem: when I try to use POST so that I can
iteratively grab revisions in increments of 1000, I'm denied (I get a
"WMF servers down" error). If I use GET, it works, but then I can't use
the parameters that let me iterate through all the revisions.
Code pasted below. Any suggestions as to why the server won't accept POST?
Better yet, does anyone already have a working script/tool handy that
grabs all the revisions of a page? :)
Thanks, all! (Excuse the cross-posting; I usually hang out on
research, but thought perhaps folks on the developers list would have
insight.)
Andrea
class Wikipedia {

    public function __construct() { }

    public function searchResults( $pageTitle = null, $initialRevision = null ) {
        $url = "http://en.wikipedia.org/w/index.php?title=Special:Export&pages="
            . $pageTitle . "&offset=1&limit=1000&action=submit";

        $curl = curl_init();
        curl_setopt( $curl, CURLOPT_URL, $url );
        curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $curl, CURLOPT_POST, true );
        curl_setopt( $curl, CURLOPT_USERAGENT,
            "Page Revisions Retrieval Script - Andrea Forte - aforte(a)drexel.edu" );
        $result = curl_exec( $curl );
        curl_close( $curl );
        return $result;
    }
}
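
For what it's worth, one possible cause of this symptom is that the code
above sets CURLOPT_POST but never sets CURLOPT_POSTFIELDS, so the
parameters stay in the query string and the POST body is empty. A sketch
of the variant I would try is below -- untested, with the parameter names
("pages", "offset", "limit", "action") taken from the Special:Export
form; the helper that builds the body is mine, not part of any library:

```php
<?php
// Sketch: send the Special:Export parameters in the POST body instead of
// the query string. buildExportPostBody() is a hypothetical helper; the
// offset is a revision timestamp, so the last timestamp seen in one batch
// can seed the request for the next batch.

function buildExportPostBody( $pageTitle, $offset, $limit = 1000 ) {
    // http_build_query handles URL-encoding (spaces in titles, etc.)
    return http_build_query( array(
        'pages'  => $pageTitle,
        'offset' => $offset,   // timestamp of the first revision to return
        'limit'  => $limit,
        'action' => 'submit',
    ) );
}

function fetchRevisions( $pageTitle, $offset = 1, $limit = 1000 ) {
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL,
        "http://en.wikipedia.org/w/index.php?title=Special:Export" );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_POSTFIELDS,
        buildExportPostBody( $pageTitle, $offset, $limit ) );
    curl_setopt( $curl, CURLOPT_USERAGENT,
        "Page Revisions Retrieval Script - Andrea Forte - aforte(a)drexel.edu" );
    $result = curl_exec( $curl );
    curl_close( $curl );
    return $result;
}
```

No idea if that's your actual problem, but it would explain why GET works
(the server reads the query string) while POST fails (it looks for the
form fields in an empty body).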