On 10/02/2017 12:30 PM, Roy Smith wrote:
I’m not seeing how to access the wikitext for a specific revision via the API. I can get the HTML with /page/html/{title}/{revision}, but I don’t see how to get the wikitext. Do I really need to get the HTML and then feed that through /transform/html/to/wikitext? That seems suboptimal. Not to mention rate limited :-(
What I want to do is get the wikitext for every revision of a page.
If you just want to download some revisions of a single page (for development purposes), https://en.wikipedia.org/w/api.php?action=query&prop=revisions&title... should be enough.
You'll have to use rvcontinue to fetch more than 50 revisions per request, and you should probably use a client library like pywikibot.
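If you want to see what that query loop looks like without pulling in pywikibot, here is a minimal sketch using only the standard library (the function names and the User-Agent string are mine, not part of the API; the real parameters are action=query, prop=revisions, rvprop, rvslots, rvlimit, and the continuation tokens returned under "continue"):

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def revisions_params(title, cont=None):
    """Build the query parameters for fetching revision wikitext.

    Continuation works by merging the "continue" object from the
    previous response into the next request's parameters.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",        # the wikitext lives in the main slot
        "rvlimit": "50",
        "format": "json",
        "formatversion": "2",
    }
    if cont:
        params.update(cont)
    return params

def page_url(title, cont=None):
    return API + "?" + urllib.parse.urlencode(revisions_params(title, cont))

def iter_revisions(title):
    """Yield (revid, wikitext) for every revision of a page,
    following rvcontinue until the history is exhausted."""
    cont = None
    while True:
        # Wikimedia asks clients to send a descriptive User-Agent.
        req = urllib.request.Request(
            page_url(title, cont),
            headers={"User-Agent": "revision-fetch-example/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        page = data["query"]["pages"][0]
        for rev in page.get("revisions", []):
            yield rev["revid"], rev["slots"]["main"]["content"]
        cont = data.get("continue")
        if not cont:
            break
```

pywikibot wraps all of this (including continuation and throttling) for you, so for anything beyond a quick experiment the library is the better choice.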
Later, if you want to do it for more articles, go to https://dumps.wikimedia.org/backup-index-bydb.html and choose a wiki (e.g. enwiki).
You may need to click through the "Last dumped on" entries a couple of times until you find an "All pages with complete edit history" section that has download links.
You can then download either a single archive (with all revisions of a subset of pages), or all of them.
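Those archives are MediaWiki XML exports (page elements containing one revision element per edit), which you can stream with the standard library instead of loading the whole file; this is a sketch under that assumption, with the function name being mine:

```python
import bz2
import io
from xml.etree import ElementTree

def iter_dump_revisions(stream):
    """Yield (page_title, revision_id, wikitext) from a MediaWiki
    XML export stream, e.g. a pages-meta-history dump archive."""
    title = None
    for event, elem in ElementTree.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]   # drop the xmlns prefix
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            rev_id = text = None
            for child in elem:
                ctag = child.tag.rsplit("}", 1)[-1]
                if ctag == "id" and rev_id is None:
                    rev_id = child.text    # first <id> is the revision id
                elif ctag == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()                    # keep memory flat on big dumps
        elif tag == "page":
            elem.clear()

# For a real dump archive you would open it without decompressing to disk:
#   with bz2.open("enwiki-pages-meta-history1.xml.bz2", "rt") as f:
#       for title, rev_id, wikitext in iter_dump_revisions(f):
#           ...
```

The elem.clear() calls matter: a full-history dump is far too large to hold as one tree, and iterparse only stays cheap if you discard each page after processing it.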
Matt Flaschen