Cormac Lawler wrote:
Just something that occurs to me as I write up my dissertation - I keep on thinking it would be nice to be able to cite some basic figures to back up a point I am making, eg. how many times Wikipedia is edited on a given day or how many pages link to this policy page - as I asked in an email to the wikipedia-l list, which has mysteriously vanished from the archives (August 11, entitled "What links here?"). I realise these could be done by going to the recent changes or special pages and counting them all, but I'm basically too lazy to do that.
I have been running various statistics on Wikipedia data for months. Not all data is available, but there is *a lot*, much more than I have time to analyse. You can answer a lot of questions with the database dumps (recently changed to XML) and the Python MediaWiki framework, but that means you have to dig into the data models and do some programming.
We're talking about thousands of pages here, right? I'm also thinking this is something that many people would be interested in finding out and writing about. So what I'm asking is: to help researchers generally, wouldn't it be an idea to identify some quick database hacks that we could provide, almost like a Kate's tools function? Or are these available on the MediaWiki pages?
The only solution is to share your code and data and to publish results frequently. That's how research works, isn't it? I would very much like to have a dedicated server for Wikimetrics, but someone has to administer it (getting the hardware is not such a problem). For instance, I could parse the version history dump and extract only article, user and timestamp, so other people can analyse which articles are edited on which days or vice versa, but I just don't have a server that can handle gigabytes of data. Up to now I have only managed to set up a data warehouse for Personendaten (http://wdw.sieheauch.de/), but, like most of what's already done, it is mostly undocumented :-(
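To make the "article, user and timestamp only" idea a bit more concrete, here is a minimal sketch in Python. It assumes the XML dump wraps each page in a <page> element with a <title> followed by <revision> elements, each carrying a <timestamp> and a <contributor> with either <username> or <ip>. The function name iter_edits, the sample data and the namespace URI in the sample are only illustrations, not part of any existing tool, and I have not run this against a full dump.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}title'."""
    return tag.rsplit("}", 1)[-1]


def iter_edits(source):
    """Yield (article, user, timestamp) for every revision in a dump.

    `source` is a filename or a binary file object. The dump is streamed,
    so revision text is never held in memory all at once.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            # Assumption: <title> precedes the <revision> elements of its page.
            title = elem.text
        elif name == "revision":
            user = None
            timestamp = None
            for child in elem.iter():
                child_name = local(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name in ("username", "ip"):
                    user = child.text
            yield title, user, timestamp
            elem.clear()  # drop the revision text once it has been read
        elif name == "page":
            elem.clear()


# Tiny hand-written sample, only to show the assumed structure.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2005-08-11T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
    </revision>
    <revision>
      <timestamp>2005-08-12T09:30:00Z</timestamp>
      <contributor><ip>127.0.0.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for edit in iter_edits(io.BytesIO(SAMPLE)):
        print(edit)
```

The output of something like this is small compared to the full history, so it could be shared as a plain tab-separated file. For a real dump you would pass the file name instead of the in-memory sample.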
If they are, and I've looked at some database related pages, they're certainly not so understandable from the perspective of someone who just wants to use basic functions. You might be thinking of sending me to a page like http://meta.wikimedia.org/wiki/Links_table - but *what does it mean?* Can someone either help me out, or suggest what we could do about this in the future?
1.) collect the questions and define exactly what you want (for instance "number of articles edited on each day")
2.) collect ways to answer them ("extract data X from Y and calculate Z")
3.) find someone who does it
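As a rough illustration of steps 1 and 2 for the example question, here is a small Python sketch. It assumes the edits have already been extracted as (article, user, timestamp) tuples with ISO-style timestamps like "2005-08-11T10:00:00Z"; the function name articles_edited_per_day and the sample data are made up for this example.

```python
from collections import defaultdict


def articles_edited_per_day(edits):
    """Count the distinct articles edited on each day.

    `edits` is an iterable of (article, user, timestamp) tuples, where the
    timestamp starts with the date as YYYY-MM-DD (an assumption of this sketch).
    """
    by_day = defaultdict(set)
    for title, _user, timestamp in edits:
        day = timestamp[:10]  # keep only the YYYY-MM-DD part
        by_day[day].add(title)
    return {day: len(titles) for day, titles in sorted(by_day.items())}


# Hypothetical sample data, not real Wikipedia figures.
EDITS = [
    ("Example", "Alice", "2005-08-11T10:00:00Z"),
    ("Example", "Bob", "2005-08-11T15:45:00Z"),
    ("Other page", "Alice", "2005-08-11T16:00:00Z"),
    ("Example", "127.0.0.1", "2005-08-12T09:30:00Z"),
]

if __name__ == "__main__":
    for day, count in articles_edited_per_day(EDITS).items():
        print(day, count)
```

Here "X" is the timestamp and title of each edit, "Y" is the version history, and "Z" is the number of distinct titles per day. Counting edits instead of articles would only need a counter in place of the set.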
Well, it sounds like work ;-)
Greetings, Jakob