Re: [Wikipedia-l] Research access to logs - Wikipedia-l

9 Sep 2005


      I don't know if this is still an open issue, but as someone who might ask 
for such data, here's my two cents.
 I attend a design program at a research institution, which gives me an 
interesting perspective of having to undergo the rigors of Human Subjects 
Committee approval for research that ultimately informs the design and may 
never result in a formal paper. What would be interesting to me is to be 
able to analyze both the statistical frequency of edits and the type of edit 
being performed. Since my interest is ultimately about what participants do 
when involved in online communities, it would be important to be able to go 
beyond the aggregate and be able to analyze the content with an eye on 
producing personas. I would like to be able to say that User Type B 
typically responds with Action Z when encountering User Type C, etc. That 
isn't possible unless there is some way to view individual threads of 
activity and qualitatively analyze the specific content.
 I have a MediaWiki-based experiment going on right now, which was approved 
by my school's review board in July. I'm not going to get anywhere close to 
Wikipedia size for members, so content analysis is going to be difficult but 
significantly easier than a massive wiki project. I have a SQL script that I 
run weekly to get some aggregate statistics and both user- and page-centric 
data sets. I created the entire site with HSC approval and full disclosure 
of how I plan to use the data (referring to users, if at all, by their 
chosen usernames only). And if someone asks me for the data, I would be 
required to follow HIPAA-style rules and allow any participant to have 
their information excluded.
 What would be very useful would be to have some pre-programmed export of 
data that would provide this same information. The aggregate data shouldn't 
be a big obstacle, but I would think there would have to be some way of 
stripping out identification. Data-wise, that would be easy: assign a unique 
ID in place of MW userID or username. Content-wise, it's not so easy: can 
posted identity references and translated signature tags be masked? There 
may also need to be some means of allowing members to opt out whenever 
identity is involved.
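
To make that concrete, here is a rough Python sketch of what such an export might do. Everything in it is an assumption made for illustration, not an existing MediaWiki feature: the function names (pseudonym, mask_signatures, export_edits), the record fields (user, page, text), and the secret key are all hypothetical. It swaps usernames for keyed-hash IDs, masks [[User:...]] links of the kind a signature leaves behind, and drops edits from anyone who opted out. It does not catch names typed as plain text, and it ignores MediaWiki's title normalization (spaces vs. underscores, first-letter case), which is the "not so easy" part.

```python
import hashlib
import hmac
import re

# Hypothetical secret held by whoever runs the export; without it the
# pseudonyms cannot be recomputed from a list of known usernames.
SECRET_KEY = b"replace-with-a-private-key"

# Matches wikitext links such as [[User:Alice|Alice]] or [[User talk:Alice]].
USER_LINK = re.compile(r"\[\[User(?:[ _]talk)?:([^\]|/#]+)[^\]]*\]\]", re.IGNORECASE)


def pseudonym(username, key=SECRET_KEY):
    """Return a stable stand-in ID for a username (same input, same output)."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def mask_signatures(wikitext, key=SECRET_KEY):
    """Replace each [[User:...]] link with the pseudonym of the linked user."""
    return USER_LINK.sub(lambda m: pseudonym(m.group(1).strip(), key), wikitext)


def export_edits(edits, opted_out, key=SECRET_KEY):
    """Drop edits by opted-out users and de-identify the rest."""
    rows = []
    for edit in edits:
        if edit["user"] in opted_out:
            continue  # this member asked not to be included
        rows.append({
            "user": pseudonym(edit["user"], key),
            "page": edit["page"],
            "text": mask_signatures(edit["text"], key),
        })
    return rows


if __name__ == "__main__":
    sample = [
        {"user": "Alice", "page": "Sandbox", "text": "Agreed. [[User:Alice|Alice]] 12:00"},
        {"user": "Bob", "page": "Sandbox", "text": "Please leave me out."},
    ]
    for row in export_edits(sample, opted_out={"Bob"}):
        print(row)
```

Because the same username always maps to the same ID, threads of activity can still be followed per participant, which is what persona work needs.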
 If the Powers That Be at MediaWiki could resolve this by November, that 
would be wonderful. I'll get my HSC forms in order and get in line for that 
data. :)
 Kevin Makice