Dear Sean,
2017-08-29 17:39 GMT+02:00 SEAN CHRISTOPHER BUCHANAN <Sean.Buchanan2@umanitoba.ca>:
> We are looking to collect data on the difference between revisions over the lifetime of three Wikipedia pages (see attached screenshot).
> We haven’t found a way to do that through the channels on the web page and were wondering if you have any ideas on how such data could be collected?
If you are interested in past revisions, the simplest way I can think of is through Special:Export:
1. go to https://en.wikipedia.org/wiki/Special:Export;
2. enter Capitalism, Socialism and Communism in the text area, each on its own line (or repeat the whole process three times with a single title each time);
3. uncheck the “Include only the current revision, not the full history” box to get the full history;
4. click “Export” and save the file.
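For anyone who prefers to script the steps above, here is a minimal Python sketch that submits the same form over HTTP. It assumes that the form posts to `index.php?title=Special:Export&action=submit`, that the text area and checkbox are named `pages` and `curonly`, and that a `history` field requests the full history; these names are taken from MediaWiki's description of the Special:Export parameters and should be double-checked there. The User-Agent string is a placeholder.

```python
import shutil
import urllib.parse
import urllib.request

# Assumed endpoint: the Special:Export form submits to index.php with action=submit.
EXPORT_URL = "https://en.wikipedia.org/w/index.php?title=Special:Export&action=submit"
# Placeholder contact string; Wikimedia asks clients to send a descriptive User-Agent.
USER_AGENT = "revision-history-research/0.1 (contact: you@example.org)"


def build_export_request(titles):
    """Build a POST request equivalent to filling in the Special:Export form."""
    form = {
        # Step 2: one title per line, as in the form's text area.
        "pages": "\n".join(titles),
        # Step 3: omitting "curonly" mirrors unchecking "Include only the
        # current revision"; "history" asks for the full history (assumed name).
        "history": "1",
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(EXPORT_URL, data=body, headers={"User-Agent": USER_AGENT})


def download_export(titles, path):
    """Step 4: send the request and stream the XML response to a local file."""
    with urllib.request.urlopen(build_export_request(titles)) as response, open(path, "wb") as out:
        # Copy in chunks, since full histories can be very large.
        shutil.copyfileobj(response, out)
```

Calling `download_export(["Capitalism", "Socialism", "Communism"], "export.xml")` should write the same file the form produces. The server may cap how many revisions it returns for very long histories, so it is worth comparing the revision count with each article's history page.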
You will get a large XML file containing every single revision of each article.
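Since what you are after is the difference between revisions, one way to get it out of that file is to walk through it revision by revision and diff each one against the previous revision of the same article. The sketch below uses only Python's standard library: `xml.etree.ElementTree.iterparse`, so the whole file never has to sit in memory, and `difflib` for line-based diffs. It assumes the revisions of each page appear oldest first, which I believe is the export's default order; the function names are my own.

```python
import difflib
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix, since the export schema version varies."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (page_title, timestamp, wikitext) for each <revision>, in file order."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            # <title> closes before the page's <revision> elements.
            title = elem.text
        elif name == "revision":
            fields = {local(child.tag): child.text or "" for child in elem}
            yield title, fields.get("timestamp", ""), fields.get("text", "")
            elem.clear()  # free the memory held by this revision's text


def revision_diffs(source):
    """Yield (title, old_timestamp, new_timestamp, unified_diff) for consecutive revisions."""
    previous = {}  # title -> (timestamp, text) of the last revision seen
    for title, timestamp, text in iter_revisions(source):
        if title in previous:
            old_timestamp, old_text = previous[title]
            diff = difflib.unified_diff(
                old_text.splitlines(),
                text.splitlines(),
                fromfile=old_timestamp,
                tofile=timestamp,
                lineterm="",
            )
            yield title, old_timestamp, timestamp, "\n".join(diff)
        previous[title] = (timestamp, text)
```

For example, `for title, old, new, diff in revision_diffs("export.xml"): print(title, old, new, len(diff))` prints one line per edit. Wikitext paragraphs are usually a single long line, so a word-level diff may read better for changes in running prose.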
In addition, if you are interested in receiving new revisions as they are made (i.e. in real time), you might want to have a look at EventStreams¹, but it is somewhat less user-friendly (unless you are well versed in programming, that is).
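To give an idea of what working with EventStreams involves, here is a rough Python sketch that follows the `recentchange` stream and keeps only edits to the three articles on English Wikipedia. The stream URL and the event fields it reads (`wiki`, `type`, `title`, `revision`) are assumptions based on the EventStreams documentation (footnote 1); the Server-Sent Events parser is deliberately minimal and does not reconnect if the connection drops.

```python
import json
import urllib.request

# Assumed public endpoint for the recentchange stream (see the EventStreams docs).
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
WATCHED = {"Capitalism", "Socialism", "Communism"}


def parse_sse(lines):
    """Minimal Server-Sent Events parser: yield the data payload of each event."""
    data = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield "\n".join(data)
            data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:) and comment lines (":") are ignored here.


def watched_edits(lines, wiki="enwiki", titles=WATCHED):
    """Yield decoded recentchange events that are edits to the watched titles."""
    for payload in parse_sse(lines):
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if event.get("wiki") == wiki and event.get("type") == "edit" and event.get("title") in titles:
            yield event


def follow_live():
    """Connect to the live stream and print revision IDs for matching edits."""
    request = urllib.request.Request(
        STREAM_URL,
        headers={"Accept": "text/event-stream", "User-Agent": "revision-history-research/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields raw byte lines as they arrive.
        for event in watched_edits(response):
            revision = event.get("revision", {})
            print(event["title"], revision.get("old"), "->", revision.get("new"))
```

Running `follow_live()` should print the old and new revision IDs of each matching edit as it happens; those IDs can then be used to fetch the corresponding diff.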
Best regards,
PS: something that could be incredibly useful for diving into article histories would be to import them into git², as it would let you see diffs between revisions the way you see them online, find when a given sentence was added or removed, etc. Once the import has been made, there are some very user-friendly tools for presenting the histories to non-technical users.
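As a rough illustration of the git idea, the sketch below turns a list of revisions into a stream for `git fast-import`: one commit per revision, with the edit summary as the commit message and the wikitext as a file named after the article. The input shape (dicts with `title`, `timestamp`, `user`, `comment` and `text` keys, e.g. filled from the export's `<title>`, `<timestamp>`, `<contributor>`, `<comment>` and `<text>` elements) and the placeholder `@wikipedia.invalid` e-mail addresses are hypothetical choices of mine, not part of any existing tool.

```python
from datetime import datetime, timezone


def _data(payload):
    """Encode a fast-import 'data' command: exact UTF-8 byte count, then the payload."""
    return f"data {len(payload.encode('utf-8'))}\n{payload}\n"


def fast_import_stream(revisions, branch="refs/heads/master"):
    """Build a git fast-import stream with one commit per revision (oldest first)."""
    parts = []
    for rev in revisions:
        # Wikipedia timestamps look like 2017-08-29T15:39:00Z (UTC).
        when = datetime.strptime(rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        seconds = int(when.replace(tzinfo=timezone.utc).timestamp())
        # Strip characters git would reject in an identity; the e-mail is a placeholder.
        user = (rev.get("user") or "unknown").replace("<", "").replace(">", "").replace("\n", " ")
        email = user.replace(" ", "_") + "@wikipedia.invalid"
        path = rev["title"].replace(" ", "_") + ".wiki"
        parts.append(f"commit {branch}\n")
        parts.append(f"committer {user} <{email}> {seconds} +0000\n")
        parts.append(_data(rev.get("comment") or "(no edit summary)"))
        # Replace the article's file with this revision's full wikitext.
        parts.append(f"M 644 inline {path}\n")
        parts.append(_data(rev["text"]))
    return "".join(parts).encode("utf-8")
```

Inside a freshly initialised repository, `subprocess.run(["git", "fast-import"], input=fast_import_stream(revisions), check=True)` should create the history (sort the revisions by timestamp first if they come from several articles). After that, `git log -p master -- Capitalism.wiki` shows the diffs, `git log -S "some sentence" master` finds the revisions where that sentence was added or removed, and `git blame` shows which revision last touched each line. `git fast-import` does not update the working tree, so a `git checkout master` is needed to see the files themselves.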
¹ https://wikitech.wikimedia.org/wiki/EventStreams
² https://git-scm.com/