Dear Sean,
2017-08-29 17:39 GMT+02:00 SEAN CHRISTOPHER BUCHANAN
<Sean.Buchanan2(a)umanitoba.ca>:
> We are looking to collect data on the difference between revisions over the
> lifetime of three Wikipedia pages (see attached screenshot).
> We haven’t found a way to do that through the channels on the web page and
> were wondering if you have any ideas on how such data could be collected?
If you are interested in past revisions, the simplest way I can think
of is through Special:Export:
1. go to https://en.wikipedia.org/wiki/Special:Export;
2. write Capitalism, Socialism and Communism in the textarea, each on
its own line (or repeat the whole process thrice with only one line
each time);
3. uncheck the “Include only the current revision, not the full
history” box to get the full history;
4. click “Export” and download the file.
You will get a large XML file containing every single revision of each article.
In addition, if you are interested in getting new revisions as they
are made (i.e. in real time), you might want to have a look at
EventStreams¹, but it is somewhat less user-friendly (unless the user
is well versed in programming, that is).
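To give an idea of what that programming involves, here is a rough sketch in Python 3 (standard library only) that follows the public "recentchange" stream and keeps only edits to the pages of interest. Treat it as an illustration under assumptions rather than a finished client: the function names and the User-Agent string are made up, it does not reconnect when the connection drops, and the event fields it reads ("type", "wiki", "title", "revision") are the ones I understand the recentchange events to carry.

```python
import json
import urllib.request

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def iter_sse_json(lines):
    """Decode a server-sent-events stream: collect the 'data:' lines of each
    event and yield the JSON object once the blank line ending it arrives."""
    data = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))
        elif line == "" and data:
            yield json.loads("\n".join(data))
            data = []


def matching_edits(changes, titles, wiki="enwiki"):
    """Keep only edits made on `wiki` to one of the given page titles."""
    for change in changes:
        if (change.get("type") == "edit"
                and change.get("wiki") == wiki
                and change.get("title") in titles):
            yield change


def watch(titles, wiki="enwiki"):
    """Print one line per new revision of the watched pages, indefinitely."""
    request = urllib.request.Request(STREAM_URL, headers={
        "Accept": "text/event-stream",
        # Hypothetical identifier; replace it with your own contact details.
        "User-Agent": "revision-watcher-example/0.1",
    })
    with urllib.request.urlopen(request) as response:
        for change in matching_edits(iter_sse_json(response), titles, wiki):
            revision = change.get("revision", {})
            print(change["title"], revision.get("old"), "->", revision.get("new"))
```

Calling `watch({"Capitalism", "Socialism", "Communism"})` would then run until interrupted, printing the old and new revision ids each time one of the three articles is edited.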
Best regards,
PS: what could be incredibly useful for diving into article histories
would be to import them into git², as it would allow the user to see
diffs between revisions the way you see them online, to look for when
a given sentence was added or removed, etc. There are some very
user-friendly tools to present the histories to non-technical users
once the import has been made.
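One possible way to do such an import, sketched in Python 3 and very much an assumption on my part rather than an established tool: keep one file per article and record each revision as one commit dated with the revision's timestamp. The helper names, the ".wiki" extension and the placeholder e-mail address are all invented for the example; it needs the git command-line program to be installed and expects timestamps in the form 2017-08-29T15:39:00Z.

```python
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def file_name_for(title):
    """Map an article title to a file name inside the repository."""
    return title.replace(" ", "_").replace("/", "_") + ".wiki"


def git_date(timestamp):
    """Convert '2017-08-29T15:39:00Z' to git's '<unix time> <offset>' form."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    moment = moment.replace(tzinfo=timezone.utc)
    return f"{int(moment.timestamp())} +0000"


def init_repo(repo):
    """Create an empty git repository in the directory `repo`."""
    Path(repo).mkdir(parents=True, exist_ok=True)
    subprocess.run(["git", "init", "--quiet", str(repo)], check=True)


def commit_revision(repo, title, timestamp, author, comment, text):
    """Write the revision text to the article's file and commit it."""
    repo = Path(repo)
    (repo / file_name_for(title)).write_text(text, encoding="utf-8")
    date = git_date(timestamp)
    env = dict(
        os.environ,
        GIT_AUTHOR_NAME=author or "unknown",
        GIT_AUTHOR_EMAIL="unknown@example.invalid",  # placeholder address
        GIT_COMMITTER_NAME=author or "unknown",
        GIT_COMMITTER_EMAIL="unknown@example.invalid",
        GIT_AUTHOR_DATE=date,
        GIT_COMMITTER_DATE=date,
    )
    subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
    # --allow-empty keeps revisions whose text is identical to the previous one.
    subprocess.run(
        ["git", "commit", "--quiet", "--allow-empty",
         "-m", comment or "(no edit summary)"],
        cwd=repo, check=True, env=env)
```

After `init_repo("history")`, calling `commit_revision` once per revision, oldest first, with the title, timestamp, contributor, edit summary and text taken from the exported file, should make `git log -p` show the same kind of diffs as the wiki does, and `git log -S"some sentence"` list the revisions where that sentence was added or removed.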
¹ https://wikitech.wikimedia.org/wiki/EventStreams
² https://git-scm.com/
--
Jérémie