Cormac Lawler wrote:
Just something that occurs to me as I write up my dissertation - I
keep on thinking it would be nice to be able to cite some basic
figures to back up a point I am making, eg. how many times Wikipedia
is edited on a given day or how many pages link to this policy page -
as I asked in an email to the wikipedia-l list, which has mysteriously
vanished from the archives (August 11, entitled "What links here?"). I
realise these could be done by going to the recent changes or special
pages and counting them all, but I'm basically too lazy to do that.
I've been computing various statistics on Wikipedia data for months. Not
all data is available, but there is *a lot*, much more than I can analyse
in the time I have. You can answer many questions with the database
dumps (recently changed to XML) and the Python MediaWiki framework, but
that means you have to dig into the data models and do some programming.
we're talking about thousands of pages here, right? I'm also thinking
this is something that many people would be interested in finding out
and writing about. So what I'm asking is that to help researchers
generally, wouldn't it be an idea to identify some quick database
hacks that we could provide - almost like a kate's tools function?
Or are these available on the MediaWiki pages?
The only solution is to share your code and data and to publish results
frequently. That's how research works, isn't it? I'd very much like to
have a dedicated server for Wikimetrics, but someone has to administer
it (getting the hardware is not much of a problem). For instance, I
could parse the version history dump and keep only article, user and
timestamp, so other people can analyse which articles are edited on
which days or vice versa, but I just don't have a server that can handle
gigabytes of data. Up to now I have only managed to set up a data
warehouse for Personendaten (http://wdw.sieheauch.de/), but - like most
of what's already done - it is mostly undocumented :-(
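
To make the version history idea concrete, here is a minimal sketch of
such an extraction in Python. It assumes the XML dump nests
<page>/<title> and <revision>/<timestamp>/<contributor> elements with
either a <username> or an <ip>; the function names and the sample data
are made up for illustration, and a real dump may need adjustments.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_edits(source):
    """Yield (title, user, timestamp) for every revision in an XML dump.

    `source` is a filename or a binary file object. The dump is streamed,
    so the whole file never has to fit in memory.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            # Assumption: <title> appears before the revisions of its page.
            title = elem.text
        elif name == "revision":
            user = timestamp = None
            for child in elem.iter():
                child_name = local(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name in ("username", "ip"):
                    user = child.text
            yield title, user, timestamp
            elem.clear()  # drop the revision text to keep memory low
        elif name == "page":
            elem.clear()


# Made-up sample in the assumed dump layout, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2005-08-11T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>first version</text>
    </revision>
    <revision>
      <timestamp>2005-08-12T09:30:00Z</timestamp>
      <contributor><ip>127.0.0.1</ip></contributor>
      <text>second version</text>
    </revision>
  </page>
</mediawiki>"""

edits = list(iter_edits(io.BytesIO(SAMPLE)))
for row in edits:
    print(row)
```

The resulting triples are small compared with the full dump, which is
the point: a reduced file like this could be shared without needing a
server for the complete history.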
If they are, and I've looked at some database related pages, they're
certainly not so understandable from the perspective of someone who just
wants to use
basic functions. You might be thinking of sending me to a page like
http://meta.wikimedia.org/wiki/Links_table - but *what does it mean?*
Can someone either help me out, or suggest what we could do about this
in the future?
1.) collect the questions, define what exactly you want (for instance
"number of articles edited on each day")
2.) collect ways to answer them ("extract data X from Y and calculate Z")
3.) find someone who does it
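
As a rough illustration of steps 1 and 2 for the example question, the
following Python sketch counts the distinct articles edited per day. It
assumes the edits are already available as (title, user, timestamp)
tuples with ISO-style timestamps; the function name and the sample rows
are hypothetical.

```python
from collections import defaultdict


def articles_edited_per_day(edits):
    """Map each day (YYYY-MM-DD) to the number of distinct articles edited."""
    titles_by_day = defaultdict(set)
    for title, _user, timestamp in edits:
        day = timestamp[:10]  # assumes timestamps like 2005-08-11T10:00:00Z
        titles_by_day[day].add(title)
    return {day: len(titles) for day, titles in sorted(titles_by_day.items())}


# Hypothetical sample rows, not real Wikipedia data.
sample_edits = [
    ("Example", "Alice", "2005-08-11T10:00:00Z"),
    ("Example", "Bob", "2005-08-11T11:15:00Z"),
    ("Other page", "Alice", "2005-08-11T12:00:00Z"),
    ("Example", "127.0.0.1", "2005-08-12T09:30:00Z"),
]

per_day = articles_edited_per_day(sample_edits)
print(per_day)
```

Using a set per day means an article edited several times on the same
day is counted once; counting total edits instead would only need a
plain counter.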
Well, it sounds like work ;-)
Greetings,
Jakob