I don't know if this is still an open issue, but as someone who might ask for such data, here's my two cents.

I attend a design program at a research institution, which gives me the interesting perspective of having to undergo the rigors of Human Subjects Committee (HSC) approval for research that ultimately informs a design and may never result in a formal paper. What would interest me is being able to analyze both the statistical frequency of edits and the type of edit being performed. Since my interest is ultimately in what participants do when involved in online communities, it would be important to go beyond the aggregate and analyze the content with an eye toward producing personas. I would like to be able to say that User Type B typically responds with Action Z when encountering User Type C, etc. That isn't possible unless there is some way to view individual threads of activity and qualitatively analyze the specific content.

I have a MediaWiki-based experiment going on right now, which was approved by my school's review board in July. I'm not going to get anywhere close to Wikipedia's size in members, so content analysis is going to be difficult, but significantly easier than it would be for a massive wiki project. I have a SQL script that I run weekly to get some aggregate statistics and both user- and page-centric data sets. I created the entire site with HSC approval and full disclosure of how I plan to use the data (referring to users, if at all, by their chosen usernames only). And if someone asks me for the data, I would be required to follow HIPAA-style rules and allow every participant to opt out of having their information included.

What would be very useful is some pre-programmed export of data that would provide this same information. The aggregate data shouldn't be a big obstacle, but I would think there would have to be some way of stripping out identification.
Data-wise, that would be easy: assign a unique ID in place of the MW user ID or username. Content-wise, it's not so easy: can posted identity references and translated signature tags be masked? There may also need to be some means of allowing members to opt out whenever identity is involved. If the Powers That Be at MediaWiki could resolve this by November, that would be wonderful. I'll get my HSC forms in order and get in line for that data. :)

Kevin Makice
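
For what it's worth, here is a minimal sketch of the kind of de-identification I have in mind. It is only an illustration under stated assumptions, not anything MediaWiki provides: the record layout (dicts with 'user', 'page', and 'text' keys) and the function names build_pseudonyms, mask_text, and export_edits are all hypothetical. It swaps each username for a random ID, drops members who opted out, and masks wiki links that point at User or User talk pages. It would not catch names typed as plain text, custom signatures, or localized namespace names, which is exactly the "not so easy" part.

```python
import re
import secrets

# Matches wiki links into the User or User talk namespace, for example
# [[User:Alice|Alice]] or [[User talk:Alice|talk]]. Assumption: English
# namespace names only; plain-text mentions of a name are NOT matched.
USER_LINK = re.compile(
    r"\[\[\s*User(?:[ _]talk)?\s*:\s*([^\]|#/]+)[^\]]*\]\]",
    re.IGNORECASE,
)


def build_pseudonyms(usernames):
    """Map each distinct username to a random ID unrelated to the name."""
    return {name: "user_" + secrets.token_hex(8) for name in set(usernames)}


def mask_text(text, pseudonyms):
    """Replace User / User talk links with the pseudonym of the linked user.

    Names missing from the mapping (e.g. members who opted out) are
    replaced with a generic placeholder instead of being left visible.
    """
    def replace(match):
        name = match.group(1).strip().replace("_", " ")
        return "[[" + pseudonyms.get(name, "user_unknown") + "]]"

    return USER_LINK.sub(replace, text)


def export_edits(edits, opted_out=frozenset()):
    """Return de-identified copies of the edit records (hypothetical layout).

    Records written by opted-out members are dropped entirely; links to
    them inside other members' text are masked as 'user_unknown'.
    """
    kept = [e for e in edits if e["user"] not in opted_out]
    pseudonyms = build_pseudonyms(e["user"] for e in kept)
    return [
        {
            "user": pseudonyms[e["user"]],
            "page": e["page"],
            "text": mask_text(e["text"], pseudonyms),
        }
        for e in kept
    ]
```

Calling export_edits(rows, opted_out={"Carol"}) on rows pulled from a weekly query would give records that still let you follow who responded to whom by ID, without exposing the usernames in the user column or in signature links.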
wikipedia-l@lists.wikimedia.org