Hi folks,
Any Special:Export experts out there?
I'm trying to download the complete revision history for just a few pages. As I see it, the options are the API or Special:Export. The API returns XML that is formatted differently from Special:Export, and I already have a set of parsers that work with Special:Export data, so I'm inclined to go with that.
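(For reference, in case I end up going the API route after all, here's roughly what I think that request would look like -- untested sketch, parameter names written from memory; as far as I know rvlimit caps at 500 for regular accounts, and the continuation value comes back in a query-continue element:)

    // Fetch one batch of revisions for a page via the MediaWiki API.
    // $startId is the continuation value from the previous response, if any.
    function apiRevisions( $pageTitle, $startId = null ) {
        $url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions"
             . "&titles=" . urlencode( $pageTitle )
             . "&rvprop=ids|timestamp|user|comment|content"
             . "&rvlimit=500&format=xml";
        if ( $startId !== null ) {
            $url .= "&rvstartid=" . $startId;  // continue where the last batch stopped
        }
        $curl = curl_init();
        curl_setopt( $curl, CURLOPT_URL, $url );
        curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $curl, CURLOPT_USERAGENT,
            "Page Revisions Retrieval Script - Andrea Forte - aforte@drexel.edu" );
        $result = curl_exec( $curl );
        curl_close( $curl );
        return $result;  // XML; repeat with the rvstartid from <query-continue> until it's absent
    }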
The problem I'm running into with Special:Export: when I use POST so that I can grab revisions iteratively in increments of 1000, the request is denied (I get a "WMF servers down" error). If I use GET, the request works, but then I can't use the parameters that let me iterate through all the revisions.
Code pasted below. Any suggestions as to why the server won't accept POST?
Better yet, does anyone already have a working script/tool handy that grabs all the revisions of a page? :)
Thanks, all! (Excuse the cross-posting; I usually hang out on the research list, but thought perhaps folks on the developers list would have insight.) Andrea
class Wikipedia {

    public function __construct() {
    }

    // Grab up to 1000 revisions of a page from Special:Export.
    public function searchResults( $pageTitle = null, $initialRevision = null ) {
        $url = "http://en.wikipedia.org/w/index.php?title=Special:Export&pages="
             . $pageTitle . "&offset=1&limit=1000&action=submit";

        $curl = curl_init();
        curl_setopt( $curl, CURLOPT_URL, $url );
        curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
        curl_setopt( $curl, CURLOPT_POST, true );  // this is the request that gets denied
        curl_setopt( $curl, CURLOPT_USERAGENT,
            "Page Revisions Retrieval Script - Andrea Forte - aforte@drexel.edu" );

        $result = curl_exec( $curl );
        curl_close( $curl );
        return $result;
    }
}
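One thing I've wondered, in case it's relevant: should the export parameters go in the POST body rather than in the query string? I.e., replacing the body of searchResults with something like this (untested sketch; I don't know whether this is what the servers actually expect):

    // Same request, but with the export parameters form-encoded
    // in the POST body instead of the URL.
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL,
        "http://en.wikipedia.org/w/index.php?title=Special:Export&action=submit" );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_POSTFIELDS, http_build_query( array(
        'pages'  => $pageTitle,
        'offset' => 1,
        'limit'  => 1000,
    ) ) );
    curl_setopt( $curl, CURLOPT_USERAGENT,
        "Page Revisions Retrieval Script - Andrea Forte - aforte@drexel.edu" );
    $result = curl_exec( $curl );
    curl_close( $curl );
    return $result;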