On Fri, Nov 9, 2012 at 1:05 AM, Finn Aarup Nielsen <fn@imm.dtu.dk> wrote:


On 09-11-2012 04:38, Rami Al-Rfou' wrote:


I am interested in counting the number of revisions every page has gone
through. I was wondering whether it is possible to count that without using
the whole history dump. I mean, is it available in the schema directly?
Is it computable without downloading the revision text?

If you have toolserver access you can readily do it. Embarrassingly, I cannot find a tool on the toolserver that already does that.
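
As a rough illustration of what such a query could look like: the sketch below counts rows per page in a toy copy of the MediaWiki `page` and `revision` tables, using the standard column names `page_id`, `page_title`, `rev_id` and `rev_page`. Everything else is an assumption made for the example: the in-memory SQLite database stands in for the toolserver's MySQL replica, the rows are made up, and the helper names `make_toy_db` and `count_revisions` are hypothetical. The point is only that the count needs revision metadata, not revision text.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory stand-in for the `page` and `revision` tables.

    Only the columns needed for counting are created; the rows are
    invented sample data, not real Wikipedia figures.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);

        INSERT INTO page VALUES (1, 'Denmark'), (2, 'Barack_Obama');
        INSERT INTO revision VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2);
        """
    )
    return conn


def count_revisions(conn):
    """Return a dict mapping page title to its number of revisions."""
    rows = conn.execute(
        """
        SELECT p.page_title, COUNT(r.rev_id) AS n_revisions
        FROM page AS p
        JOIN revision AS r ON r.rev_page = p.page_id
        GROUP BY p.page_id, p.page_title
        ORDER BY n_revisions DESC
        """
    ).fetchall()
    return dict(rows)
```

Calling `count_revisions(make_toy_db())` returns the per-page counts for the sample rows. Against a real replica the same `GROUP BY rev_page` idea should apply, though the connection details and any filtering (for example by namespace) would need to be adapted.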

There is the Wikichecker that shows a count:

http://en.wikichecker.com/article/?a=Denmark
Just be aware that the site is still in beta, and that e.g. http://en.wikichecker.com/article/?a=Barack+Obama claims that the English Wikipedia's article on Barack Obama was started in July 2012 and has received 485 non-bot edits (the real number is likely over 20,000).


Moreover, many of my future projects would benefit a lot if Wikipedia had
incremental dumps of its database. Is anyone aware of something relevant
or close?

It is possible that this paper can help you:

"Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History"

https://code.google.com/p/jwpl/


/Finn


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB