Hello,
My name is Sean Buchanan. I am a professor of Business Administration at the Asper School of Business at the University of Manitoba. My colleagues and I are trying to collect a certain type of data from Wikipedia and would like some advice on the most efficient and user-friendly way of collecting it.
We are looking to collect data on the differences between revisions over the lifetime of three Wikipedia pages (see attached screenshot). We haven't found a way to do that through the channels on the web page and were wondering whether you have any ideas on how such data could be collected.
We are interested in the revision history for the following pages:
1) Capitalism
2) Socialism
3) Communism
Thank you for your help! I look forward to hearing from you. Sincerely, Sean Buchanan
Looks good
On 31 Aug 2017 at 3:45 PM, "SEAN CHRISTOPHER BUCHANAN" <Sean.Buchanan2@umanitoba.ca> wrote:
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Dear Sean,
2017-08-29 17:39 GMT+02:00 SEAN CHRISTOPHER BUCHANAN Sean.Buchanan2@umanitoba.ca:
We are looking to collect data on the differences between revisions over the lifetime of three Wikipedia pages (see attached screenshot).
We haven't found a way to do that through the channels on the web page and were wondering whether you have any ideas on how such data could be collected.
If you are interested in past revisions, the simplest way I can think of is through Special:Export:
1. Go to https://en.wikipedia.org/wiki/Special:Export;
2. write Capitalism, Socialism and Communism in the textarea, each on its own line (or repeat the whole process three times with only one line each time);
3. uncheck "Include only the current revision, not the full history" to get the full history;
4. click "Export" and download the file.
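If you end up repeating this for more pages, the same export can be requested programmatically. A minimal sketch in Python, assuming the Special:Export form accepts `pages`, `history` and `action=submit` fields the way the web form does (the field names here mirror the form, but verify them against the live page):

```python
from urllib.parse import urlencode
from urllib.request import Request

EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"

def build_export_request(pages, full_history=True):
    """Build a POST request mirroring the Special:Export web form.

    `pages` is a list of page titles; the form expects them
    newline-separated in a single field.
    """
    fields = {
        "pages": "\n".join(pages),
        "action": "submit",
    }
    if full_history:
        # Requests every revision, not only the latest one.
        fields["history"] = "1"
    data = urlencode(fields).encode("utf-8")
    return Request(EXPORT_URL, data=data, method="POST")

req = build_export_request(["Capitalism", "Socialism", "Communism"])
# To actually download (full histories can be hundreds of MB):
# from urllib.request import urlopen
# with urlopen(req) as resp, open("export.xml", "wb") as out:
#     out.write(resp.read())
```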
You will get a large XML file containing every single revision of each article.
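From that XML you can then compute the revision-to-revision differences you describe, e.g. with the Python standard library alone. A sketch (element names follow the MediaWiki export schema; the `{*}` namespace wildcard needs Python 3.8+):

```python
import difflib
import xml.etree.ElementTree as ET

def revision_diffs(xml_text):
    """Yield (page title, unified diff) for each pair of
    consecutive revisions in a Special:Export XML dump."""
    root = ET.fromstring(xml_text)
    # '{*}' matches any namespace (the export schema URI is versioned).
    for page in root.findall("{*}page"):
        title = page.findtext("{*}title")
        revs = page.findall("{*}revision")
        for old, new in zip(revs, revs[1:]):
            diff = difflib.unified_diff(
                (old.findtext("{*}text") or "").splitlines(),
                (new.findtext("{*}text") or "").splitlines(),
                fromfile=old.findtext("{*}timestamp") or "",
                tofile=new.findtext("{*}timestamp") or "",
                lineterm="",
            )
            yield title, "\n".join(diff)
```

For articles with long histories you would stream the file with `ET.iterparse` instead of loading it all at once, but the extraction logic stays the same.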
In addition, if you are interested in getting new revisions as they are made (i.e. in real time), you might want to have a look at EventStreams¹, but it is somewhat less user-friendly (unless the user is well versed in programming, that is).
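For completeness, the EventStreams feed is server-sent events whose `data:` payloads are JSON, so the filtering step for these three articles is only a few lines. A sketch of just that step, assuming the `recentchange` stream's events carry `wiki` and `title` fields as documented:

```python
import json

WATCHED = {"Capitalism", "Socialism", "Communism"}

def wanted(event_json):
    """Return the parsed event if it is a change to one of the
    watched English-Wikipedia articles, else None."""
    event = json.loads(event_json)
    if event.get("wiki") == "enwiki" and event.get("title") in WATCHED:
        return event
    return None

# Consuming the stream itself needs an SSE client pointed at
# https://stream.wikimedia.org/v2/stream/recentchange (not run here).
```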
Best regards,
PS: what could be incredibly useful for diving into article histories would be to import them into git², as it would allow the user to see diffs between revisions the way you see them online, to look for when a given sentence was added or removed, etc. There are some very user-friendly tools to present the histories to non-technical users once the import has been made.
¹ https://wikitech.wikimedia.org/wiki/EventStreams
² https://git-scm.com/