Dear all,
We've talked several times about resource and value, and a clear divide has emerged between those who see the valuable resources at stake as programmer time and IT resource, and my view that the valuable resource we have is our editors' time and patience. Mostly I've expressed that in terms of throttling spam - we don't know how many surveys, and how much overlap between surveys, our editors will accept before they disable email, add research survey sites to their spam filters or start blocking researchers even if we've authorised them. In my view nobody wins if we wait until "the tragedy of the commons" has struck and all researchers have permanently lost access to a large proportion of our editors.
But there is another aspect where I think we may have been talking at cross purposes, and that's in our perception of the commercial value of research access to our community, and the motives of the researchers who have approached us. Wikimedia is a long-established top ten website and one of the most famous examples of crowdsourcing and online communities. Most of the other successful websites wouldn't dream of allowing a competitor or potential competitor to conduct such research on their community - major websites are worth billions, so an insight from research on another community could be incredibly valuable. Our position is different: we are open to the re-use of our data for commercial purposes per CC-by-SA, and a permissive approach to research is compatible with that. I haven't asked what commercial sponsors, if any, have funded the work of the various researchers who approach us, and I'd be happy for that to continue, provided we keep three safeguards:
I Open licensing. Anyone who wants to broadcast research surveys to our editing community needs to agree that the anonymised results of those surveys will be available under CC-by-SA, and not just a statistical digest but the actual dataset, so that variables can be cross-tabulated. But I can live with the researcher(s) also having a copy of the data under a different licence if they are narrowcasting to a small group of editors rather than broadcasting to a large group.
II Timeliness. The CC-by-SA anonymised dataset needs to be published pretty much as soon as it can be, and not kept back until after the researcher has published their analysis of it.
III Transparency. The nightmare scenario to me would be if a top thousand website or aspirant: