Dear all,
I have a few questions about database dumps (I checked http://meta.wikimedia.org/wiki/Data_dumps and it doesn't answer them). Perhaps you know the answers :)
First:
* Is it possible to download a dump of only one page, with its full history? (A rough sketch of the kind of retrieval I have in mind follows this list.)
* Is it possible to download a dump of only one (or a few selected) users' contributions?
* If not, is it possible to run some scripts/statistical analysis without downloading the full dump (100+ GB after decompression, judging by the estimates, 99.9% of which I don't need for my study...)?
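To make the first question concrete: this is roughly the kind of per-article retrieval I imagine, pieced together from the MediaWiki API documentation. It assumes the English Wikipedia api.php endpoint and the Python 'requests' library, the article title is only a placeholder, and I have no idea whether this is feasible or sensible for pages with very long histories, hence the question. (Or maybe Special:Export is already meant for exactly this?)

# Rough sketch: fetch one article's revision metadata through the
# MediaWiki API instead of downloading the full dump.
# Assumes the English Wikipedia endpoint and the 'requests' library;
# the article title below is only a placeholder.
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title):
    """Yield metadata (revid, user, timestamp) for every revision of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|user|timestamp",
        "rvlimit": "max",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        for rev in page.get("revisions", []):
            yield rev
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow the continuation token

for rev in fetch_revisions("Example article"):
    print(rev["revid"], rev["user"], rev["timestamp"])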
Second:
* I am rather bad at writing scripts (basically, at programming), and I would like to do something similar to what Anthony et al. did in 'Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the Case of Wikipedia', just limited to one article and its contributors. An excerpt of what they did:
"For each contributor, we use the Wikipedia differencing algorithm3 to compare the differences between three documents: (1) edit, the edit submitted by the contributor, (2) previous, the version of the article prior to the edit, and (3) current, the current version of the article as it exists on the day the sample was drawn (...) We measure the quality of an edit by calculating the number of characters from a contributor’s edit that are retained in the current version, measured as the percentage retained of the total number of characters in the entry (retained in current/total in current)."
What I would like to do: run a script over a single article's history and the contributions of its editors, to get 'retention values' for each user's edits both on that article alone AND across all of that user's contributions in general (the sketch after this paragraph shows roughly the per-edit calculation I mean).
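To be concrete, and with no guarantee from me that it is correct, the per-edit measure I have in mind looks roughly like this in Python. It uses difflib purely as a stand-in for the differencing algorithm the paper refers to, and takes the plain text of the three revisions ('previous', 'edit', 'current') as input:

# Rough sketch of the retention measure quoted above, using Python's
# difflib as a stand-in for the paper's differencing algorithm.
from difflib import SequenceMatcher

def added_text(previous, edit):
    """Concatenate the character runs the contributor added in this edit."""
    sm = SequenceMatcher(None, previous, edit, autojunk=False)
    added = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("insert", "replace"):
            added.append(edit[j1:j2])
    return "".join(added)

def retained_chars(added, current):
    """Count how many of the added characters survive in the current text."""
    sm = SequenceMatcher(None, added, current, autojunk=False)
    return sum(size for _, _, size in sm.get_matching_blocks())

def retention(previous, edit, current):
    """Share of the current article coming from this contributor's edit
    (retained in current / total in current), as in the excerpt above."""
    added = added_text(previous, edit)
    if not current:
        return 0.0
    return retained_chars(added, current) / len(current)

I realise a real implementation would need to be more careful about reverts, moved text and whitespace; this is only meant to show what I am after.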
If anybody knows of a script I could adapt for this purpose (or of a place to ask), I would be most grateful for the information - writing one myself is unfortunately beyond my capabilities.
Thank you for your time,