I've been experimenting with the parameters to Special:Export to retrieve the whole history of an article. I haven't been able to get more than 1000 revisions (from the English Wikipedia).
Does anyone know of a way to obtain the full history of an article? Those huge 7z exports seem too unwieldy to work with just to extract the data for a single page.
http://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
Rob
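(For reference, a request against Special:Export in the spirit of the manual page above looks roughly like the sketch below. This is a hedged example rather than an official recipe: the page title is a placeholder, and on Wikimedia wikis the limit parameter is capped at 1000 revisions per request, which is exactly the wall described above.)

import requests

# Sketch of a single Special:Export request, with parameters as documented
# on Manual:Parameters_to_Special:Export. The limit is capped at 1000
# revisions per request on Wikimedia wikis.
resp = requests.post(
    "https://en.wikipedia.org/w/index.php",
    data={
        "title": "Special:Export",
        "pages": "Example_article",  # placeholder page title
        "offset": 1,                 # 1 = start from the oldest revision
        "limit": 1000,               # values above 1000 are clamped
        "action": "submit",
    },
)
resp.raise_for_status()
print(resp.text[:500])  # XML containing up to 1000 <revision> elements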
Robert Carter wrote:
> I've been experimenting with the parameters to Special:Export to retrieve the whole history of an article. I haven't been able to get more than 1000 revisions (from the English Wikipedia).
> Does anyone know of a way to obtain the full history of an article? Those huge 7z exports seem too unwieldy to work with just to extract the data for a single page.
You can use api.php with rvprop=content and rvcontinue to fetch the text of all revisions of a page. Please do this in a single thread with a substantial delay between requests, since this is a very expensive operation for our servers. Do not attempt to do it for a large number of pages; for that, use the XML download instead. Do not do it regularly or set up a web gateway which allows users to initiate these requests.
-- Tim Starling
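(A rough sketch of the approach Tim describes, in Python with the requests library. The continuation handling follows the API's documented rvcontinue mechanism; the page title, batch size, and delay here are placeholders, not sanctioned values.)

import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_all_revisions(title, batch=100, delay=10):
    # Fetch every revision of one page via prop=revisions, following the
    # rvcontinue continuation in a single thread with a pause between requests.
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvlimit": batch,     # the API may clamp this when content is requested
        "rvdir": "newer",     # oldest revision first
    }
    revisions = []
    session = requests.Session()
    while True:
        data = session.get(API, params=params).json()
        page = data["query"]["pages"][0]
        revisions.extend(page.get("revisions", []))
        if "continue" not in data:
            break
        params.update(data["continue"])  # carries rvcontinue for the next batch
        time.sleep(delay)                # single thread, substantial delay
    return revisions

# e.g. revs = fetch_all_revisions("Example_article")  # placeholder title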
Thanks for this. I added a note about the rvcontinue parameter to the API wiki docs, since it's currently only mentioned in the auto-generated MediaWiki API documentation page. http://www.mediawiki.org/wiki/API:Query_-_Properties#Parameters
To give me an idea of what is reasonable: if I request 100 revisions every 10 seconds, would that be OK?
Rob
On 30/11/2009, at 3:15 PM, Tim Starling wrote:
> Robert Carter wrote:
>> I've been experimenting with the parameters to Special:Export to retrieve the whole history of an article. I haven't been able to get more than 1000 revisions (from the English Wikipedia).
>> Does anyone know of a way to obtain the full history of an article? Those huge 7z exports seem too unwieldy to work with just to extract the data for a single page.
> You can use api.php with rvprop=content and rvcontinue to fetch the text of all revisions of a page. Please do this in a single thread with a substantial delay between requests, since this is a very expensive operation for our servers. Do not attempt to do it for a large number of pages; for that, use the XML download instead. Do not do it regularly or set up a web gateway which allows users to initiate these requests.
> -- Tim Starling