You can download selected pages from the wiki's Special:Export page simply by entering all the page names to dump (each on its own line), including their talk pages, and unchecking the "current revision only" box. The problem is that there will be a point where the dump hits a fatal error (and is thus basically useless), so the stability of your internet connection and the length of the page(s) being dumped will determine how much you can grab in one go before you hit an error. There is a way to extract edits in chunks, which I think is explained on the page you linked to, but that's similarly troublesome. An additional problem is that from time to time the "current revision only" checkbox is disabled for performance reasons, meaning the only way to get the pages is from the monthly dumps themselves.
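(For what it's worth, Special:Export can also be driven from a script rather than the web form. The sketch below, in Python, just builds the export URL; the parameter names `pages`, `history`, and `curonly` are what the export form submits, but check your wiki's version of MediaWiki before relying on them.)

```python
from urllib.parse import urlencode

def build_export_url(wiki_base, pages, full_history=True):
    """Build a Special:Export URL for the given page titles.

    wiki_base: the wiki's script path, e.g. "https://en.wikipedia.org/w"
    pages: list of page titles (talk pages included explicitly if wanted)
    full_history: if True, request every revision, not just the current one
    """
    params = {
        "title": "Special:Export",
        # Special:Export takes one page title per line
        "pages": "\n".join(pages),
    }
    if full_history:
        params["history"] = "1"   # full revision history
    else:
        params["curonly"] = "1"   # current revision only
    return wiki_base + "/index.php?" + urlencode(params)

# The resulting URL can then be fetched with urllib.request.urlopen()
# and the XML saved to disk; for long histories a POST request is safer,
# since the history may be served in chunks.
url = build_export_url("https://en.wikipedia.org/w", ["Foo", "Talk:Foo"])
```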
It's not possible to dump only one user's contributions; typically, most of their edits will be built upon the contributions of others, and removing those others from the edit history would be a breach of the GNU Free Documentation License stipulations, so this is the likely reason for the lack of such a feature (although a script could likely work around this).
As for the scripts, that's not something I can help you with. Anyway, I hope I've answered at least some of your questions. :)
Garrett
Dear all,
I have a few questions about database dumps (I checked
http://meta.wikimedia.org/wiki/Data_dumps and it has no answers to
them). Perhaps you know the answer :)
First:
* is it possible to download a dump of only one page with history?
* is it possible to download a dump of only one (or selected) user's
contributions?
* if not, is it possible to run some scripts/statistical analysis
without downloading the dump (100+ GB after decompressing, judging by
the estimates, 99.9% of which I don't need for my study...)
Second:
* I am rather bad at writing scripts (basically, programming). And I
would like to do something similar to what Anthony et al. have done
('Explaining Quality in Internet Collective Goods: Zealots and Good
Samaritans in the Case of Wikipedia'), just limited to one article and
its contributors. What they have done - excerpt:
"For each contributor, we use the Wikipedia differencing algorithm3 to
compare the differences between three documents: (1) edit, the edit
submitted by the contributor, (2) previous, the version of the article
prior to the edit, and (3) current, the current version of the article
as it exists on the day the sample was drawn (...) We measure the
quality of an edit by calculating the number of characters from a
contributor's edit that are retained in the current version, measured as
the percentage retained of the total number of characters in the entry
(retained in current/total in current)."
What I would like to do: run a script on a single article's history and
the contributions of its users to get 'retention values' for those
users' edits on that article only AND on all of each user's contribs in
general.
If anybody knows of a script I could adapt for this purpose (or a place
to ask), I would be most grateful for information - writing one is
unfortunately beyond my capabilities.
Thank you for your time,
--
Piotr Konieczny
"The problem about Wikipedia is, that it just works in reality, not in
theory."
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l