You can download selected pages from the wiki in question via Special:Export simply by entering all the page names to dump (each on its own line), including their talk pages, and unchecking "Include only the current revision, not the full history". The problem is that there will be a point where the dump hits a fatal error (and is thus basically useless), so the stability of your internet connection and the length of the page(s) being dumped will determine how much you can grab in one go before that happens. There is a way to extract edits in chunks, which I think is explained on the page you linked to, but that's similarly troublesome. An additional problem is that from time to time the "current revision only" checkbox is disabled for performance reasons, meaning the only way to get the pages is from the monthly dumps themselves.
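For what it's worth, here's a minimal sketch in Python of scripting that export; the 'pages' and 'history' form fields match the export form as I remember it, so double-check them against your wiki before relying on this:

    # Minimal sketch: grab one page's full edit history via Special:Export.
    # Assumes the wiki accepts the 'pages' and 'history' form fields; some
    # wikis disable full-history export entirely for performance reasons.
    import requests

    EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"

    def export_history(title, out_path):
        """POST the export form and stream the resulting XML dump to disk."""
        resp = requests.post(
            EXPORT_URL,
            data={"pages": title,   # one page name per line
                  "history": "1"},  # full history, not just the current revision
            stream=True,
            timeout=600,
        )
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 16):
                f.write(chunk)

    export_history("Wikipedia:Sandbox", "sandbox_history.xml")

If the connection dies partway through, you simply end up with a truncated XML file, which is the fatal-error problem I mentioned above.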
It's not possible to dump only one user's contributions. Typically, most of their edits build upon the contributions of others, and stripping those others from the edit history would breach the attribution requirements of the GNU Free Documentation License, which is the likely reason such a feature doesn't exist (although a script could probably work around it).
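That said, a script can at least list one user's edits through the MediaWiki API and then fetch whatever revisions it needs on its own; a minimal sketch, assuming api.php with list=usercontribs is enabled on the wiki you care about:

    # Minimal sketch: list one user's edits via the MediaWiki API
    # (list=usercontribs), following the continuation tokens for paging.
    import requests

    API_URL = "https://en.wikipedia.org/w/api.php"

    def user_contribs(username):
        """Yield (title, revid, timestamp) for every edit by 'username'."""
        params = {
            "action": "query",
            "list": "usercontribs",
            "ucuser": username,
            "uclimit": "max",
            "format": "json",
        }
        while True:
            data = requests.get(API_URL, params=params, timeout=60).json()
            for c in data["query"]["usercontribs"]:
                yield c["title"], c["revid"], c["timestamp"]
            if "continue" not in data:
                break
            params.update(data["continue"])  # resume where the last batch ended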
As for writing the full analysis scripts, that's not something I can take on, but the sketch below (see the P.S.) might get you started on the retention measure from the paper you quote. Anyway, I hope I've answered at least some of your questions. :)

Garrett
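P.S. Here is the sketch I mentioned: a minimal one, using Python's difflib as a stand-in for the Wikipedia differencing algorithm, so treat its numbers as approximations of theirs.

    # Minimal sketch: fraction of a contributor's edit that survives in the
    # current revision, per the Anthony et al. measure quoted below
    # (retained in current / total in current). difflib stands in for the
    # Wikipedia differencing algorithm, so results are approximate.
    import difflib

    def retention(previous, edit, current):
        """Return retained_in_current / total_in_current for one edit."""
        # Characters the contributor added relative to the previous revision.
        sm = difflib.SequenceMatcher(None, previous, edit)
        added_text = "".join(edit[j1:j2]
                             for op, i1, i2, j1, j2 in sm.get_opcodes()
                             if op in ("insert", "replace"))
        if not current:
            return 0.0
        # How much of that added text still matches the current revision.
        sm = difflib.SequenceMatcher(None, added_text, current)
        retained = sum(size for _, _, size in sm.get_matching_blocks())
        return retained / len(current)

You would call this once per edit, feeding it the three revision texts pulled from the export XML or the API above.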
On 16/03/07, Piotr Konieczny piokon@post.pl wrote:
Dear all,
I have a few questions about database dumps (I checked http://meta.wikimedia.org/wiki/Data_dumps and it doesn't answer them). Perhaps you know the answers :)
First:
- is it possible to download a dump of only one page with history?
- is it possible to download a dump of only one (or selected) users' contributions?
- if not, is it possible to run some scripts/statistical analysis without downloading the dump (100+ GB after decompressing, judging by the estimates, 99.9% of which I don't need for my study...)?
Second:
- I am rather bad at writing scripts (at programming, basically), and I would like to do something similar to what Anthony et al. have done ('Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the Case of Wikipedia'), just limited to one article and its contributors. An excerpt of what they have done:
"For each contributor, we use the Wikipedia differencing algorithm3 to compare the differences between three documents: (1) edit, the edit submitted by the contributor, (2) previous, the version of the article prior to the edit, and (3) current, the current version of the article as it exists on the day the sample was drawn (...) We measure the quality of an edit by calculating the number of characters from a contributor's edit that are retained in the current version, measured as the percentage retained of the total number of characters in the entry (retained in current/total in current)."
What I would like to do: run a script on a single article's history and the contributions of its users to get 'retention values' for those users' edits on that article only AND across all of each user's contributions in general.
If anybody knows of a script I could adapt for this purpose (or a place to ask), I would be most grateful for the information - writing one is unfortunately beyond my capabilities.
Thank you for your time,
-- Piotr Konieczny
"The problem about Wikipedia is, that it just works in reality, not in theory."