Sorry for the late responses; with classes, meetings, office hours, baby, and so on, I can't respond as fast as I'd like, but I'm really grateful for all the great responses.

Thanks, James, for the ideas you've suggested; I summarize them thus:
* Publication date cut-off: We'll play with this and see how many articles we're left with.
* Randomize: ha ha ha
* Topic/empirical vs. conceptual/quantitative vs. qualitative: Actually, one of the features of our review is that we explicitly want to include non-computer-science works, many of which are conceptual, qualitative, and cover unusual topics (e.g. music). Any of these criteria would systematically exclude these articles. Unfortunately, we see that our journal vs. conference cut-off systematically excludes many computer science articles :-(
* Cited articles: We hadn't thought about this; I'll talk more about it in responding to Travis' thread.
* Adding more reviewers: I'll follow up on this in responding to Reid's thread.

Thanks a lot.

~ Chitu

-------- Original Message --------
Subject: Re: [Wiki-research-l] Wikipedia literature review - include or exclude conference articles (was Request to verify articles for Wikipedia literature review)
From: James Howison <james@howison.name>
To: Research into Wikimedia content and communities
    <wiki-research-l@lists.wikimedia.org>
Date: 15/03/2011 4:57 PM

I am a little sheepish; clearly you've really struggled with this, and it's certainly a huge number of papers.

I'm tempted to ask what happens if you cut by publication date, but I suspect that doesn't help much because of the accelerating rate of publication. In any case, I'm not entirely sure of the justification for excluding older things; it's not as though one stops knowing them :)

Ah, I know: randomize ;) Ok, that's not really in the spirit of a review article.

Have you considered cutting by some quick first-pass characteristics, such as topic (using some framework relevant to your interests; we used Input-Process-Output for organizing studies of FLOSS), empirical vs. conceptual, or perhaps even quantitative vs. qualitative? That is, of course, a lot of work in itself, but it seems to deal with the selection bias the best. It would also help give a conceptual focus to the review article.

To avoid the full selection bias of excluding conferences, perhaps you could include only those conference articles that are cited in your journal articles? (Hmmm, there are issues there, but it's perhaps worth thinking about: could one seek out some variant of "the connected set" of articles, with some cut-off on the strength of linkage to bring the number down to something manageable?)

Adding people to your review team is another option; I'm sure you've thought about that. The difficulties there are obvious: a good review goes beyond 'tagging' articles and develops a cross-cutting, conceptually organized perspective, which is hard to coordinate or build through disconnected work.

Best wishes for the work,
James

On Mar 15, 2011, at 14:56, Chitu Okoli wrote:

James and Travis, you bring up a point that we have struggled back and forth with for several months. We really, really would like to include conference articles, but we just can't see how we could handle many more articles than we've got now. We've been working on and off on this project for over two years now. (You can find works in progress at the link to my website at the bottom.) We'd like to get it done eventually, and we can only handle so many articles.