I don't know if this is still an open issue, but as someone who might ask
for such data, here's my two cents.
I attend a design program at a research institution, which gives me an
interesting perspective: I have to undergo the rigors of Human Subjects
Committee approval for research that ultimately informs design and may
never result in a formal paper. What would be interesting to me is to be
able to analyze both the statistical frequency of edits and the type of edit
being performed. Since my interest is ultimately in what participants do
when involved in online communities, it is important to go beyond the
aggregate and analyze the content with an eye toward producing personas.
I would like to be able to say that User Type B
typically responds with Action Z when encountering User Type C, etc. That
isn't possible unless there is some way to view individual threads of
activity and qualitatively analyze the specific content.
I have a MediaWiki-based experiment going on right now, which was approved
by my school's review board in July. My membership will be nowhere near
Wikipedia's size, so content analysis will be difficult, but significantly
easier than on a massive wiki project. I have a SQL script that I
run weekly to get some aggregate statistics and both user- and page-centric
data sets. I created the entire site with HSC approval and full disclosure
of how I plan to use the data (referring to users, if at all, by their
chosen usernames only). And if someone asks me for the data, I would be
required to follow HIPAA-style rules and give every participant the chance
to keep their information from being included.
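A rough sketch of what user- and page-centric aggregates might look like,
written in Python rather than SQL. The rows and field names here are made
up for illustration; this is not the MediaWiki schema or the actual script.

```python
from collections import Counter

# Hypothetical edit log rows: (username, page_title, ISO date).
# These fields are assumptions, not real MediaWiki column names.
edits = [
    ("alice", "Main_Page", "2006-09-04"),
    ("bob", "Main_Page", "2006-09-05"),
    ("alice", "Sandbox", "2006-09-05"),
    ("alice", "Main_Page", "2006-09-06"),
]


def aggregate(rows):
    """Return (edits per user, edits per page) as two Counters."""
    by_user = Counter(user for user, _page, _day in rows)
    by_page = Counter(page for _user, page, _day in rows)
    return by_user, by_page


by_user, by_page = aggregate(edits)
for user, count in by_user.most_common():
    print(user, count)
for page, count in by_page.most_common():
    print(page, count)
```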
What would be very useful would be to have some pre-programmed export of
data that would provide this same information. The aggregate data shouldn't
be a big obstacle, but I would think there would have to be some way of
stripping out identification. Data-wise, that would be easy; assign a unique
ID in place of MW userID or username. Content-wise, it's not so easy; can
posted identity references and translated signature tags be masked? There
may also need to be some means of allowing members to opt out whenever
identity is involved.
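To make the idea concrete, here is a minimal sketch in Python of the three
pieces: replacing usernames with unrelated IDs, masking names that appear
inside the posted text, and honoring opt-outs. Everything in it is an
assumption for illustration: the function names are hypothetical, the rows
are invented, and the signature pattern assumes saved signatures look
roughly like [[User:Name|Name]], which custom signatures may not follow.

```python
import random
import re


def build_pseudonyms(usernames):
    """Map each username to an opaque ID assigned in random order."""
    names = sorted(set(usernames))
    random.SystemRandom().shuffle(names)  # so ID order does not leak name order
    return {name: f"P{i:04d}" for i, name in enumerate(names, 1)}


def mask_text(text, pseudonyms):
    """Replace user links and bare mentions of known usernames with their IDs."""
    # Longest names first, so "Al Smith" is handled before "Al".
    for name in sorted(pseudonyms, key=len, reverse=True):
        alias = pseudonyms[name]
        # Assumed link form: [[User:Name]], [[User:Name|label]], or User talk.
        link = re.compile(
            r"\[\[User(?: talk)?:" + re.escape(name) + r"(?:\|[^\]]*)?\]\]",
            re.IGNORECASE,
        )
        text = link.sub(alias, text)
        # Bare mentions, matched only as whole words.
        text = re.sub(r"(?<!\w)" + re.escape(name) + r"(?!\w)", alias, text)
    return text


def export(rows, opted_out=()):
    """rows are (username, text) pairs; opted-out users' rows are dropped."""
    # Opted-out names still get IDs so mentions of them by others are masked.
    pseudonyms = build_pseudonyms(user for user, _text in rows)
    return [
        (pseudonyms[user], mask_text(text, pseudonyms))
        for user, text in rows
        if user not in opted_out
    ]


rows = [
    ("Alice", "Agreed. [[User:Alice|Alice]] 12:00, 5 September 2006 (UTC)"),
    ("Bob", "Thanks Alice! [[User:Bob|Bob]]"),
    ("Carol", "Please leave me out of this."),
]
for alias, text in export(rows, opted_out={"Carol"}):
    print(alias, text)
```

This only catches names the export already knows about. Nicknames, real
names, or anything else identifying in the prose would still need a human
pass.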
If the Powers That Be at MediaWiki could resolve this by November, that
would be wonderful. I'll get my HSC forms in order and get in line for that
data.
:)
Kevin Makice