I like this idea.
I see this as a topic map problem, keeping track of provenance and topics covered. Wikipedia is run very much like a topic map; there are ways to disambiguate name collisions, and each article is restricted to one and only one topic, with occasional merge suggestions.
As a side note, consider adding available open source software for "reading" every document that is machine-readable (not all PDFs, for instance, can be read by machine). With the extracted text, you can create a searchable index into each document and run a variety of studies, e.g. word n-grams, topic models (Latent Dirichlet Allocation), and even clustering, as a background task that provides a larger context for all reviews.
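To make the indexing idea concrete, here is a minimal sketch in Python, assuming the text of each document has already been extracted. The function names and the two sample documents are made up for illustration; a real setup would more likely use an existing search engine and topic-modeling library.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def ngrams(text, n=2):
    """Count word n-grams (bigrams by default) in a single document."""
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample data, standing in for extracted document text.
docs = {
    "paper-a": "Wiki research on vandalism detection in Wikipedia",
    "paper-b": "A survey of wiki research methods",
}
index = build_index(docs)
```

Calling `search(index, "wiki research")` returns the ids of both sample documents, and `ngrams(docs["paper-b"])` counts each adjacent word pair. Topic models and clustering would sit on top of the same tokenized text.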
Decision making on which documents rate highest is complex, possibly wicked, and perhaps should relate more to particular goals; which papers contribute how much information to which topics, etc. What animates that thought is the "evidence profiles" used by Watson in the Jeopardy competition. Watson was nowhere near competitive until they created and refined evidence profiles.
Jack
On Wed, Mar 16, 2011 at 9:26 AM, Reid Priedhorsky <reid@reidster.net> wrote:
Chitu and others,
I too see great need for a comprehensive survey paper in this field. My own personal interest is in one that covers wiki research in general, not just research of Wikipedia; this of course makes the intractable number of papers even more intractable.
In fact, I am involved with a team of researchers with the same goal as you, though we are just getting started.
It seems to me that you are in a very difficult position. As others have noted, the scoping filter you propose is not a good one, but the number of papers is simply intractable without a very aggressive filter that excludes 2/3 or more of the known papers. (To further complicate the issue, I am skeptical of machine filtering, period, fearing that any useful filter would necessarily be complex and difficult to justify in a writeup.)
However, I believe that there is a solution, and that is to dramatically increase the team size by doing the analysis wiki style. Rather than a small team creating the review, do it in public with an open set of contributors. Specifically, I propose:
1. Create a public MediaWiki instance.
2. Decide on a relatively standardized format for reviewing each paper (metadata formats, an infobox, how to write reviews of each, etc.).
3. Upload your existing Zotero database into this new wiki (I would be happy to write a script to do this).
4. Proceed with paper readings, with the goal that every single paper is looked at by human eyes.
5. Use this content to produce one or more review articles.
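As a rough illustration of the upload step (step 3), a script could turn each bibliography record into infobox wikitext before posting it to the wiki. The sketch below is only that: the template name "Infobox paper" and the field names are hypothetical, the record is assumed to be already exported from Zotero as a plain dictionary, and the upload itself is left out.

```python
def record_to_wikitext(record, template="Infobox paper"):
    """Render one bibliography record as a MediaWiki template call.

    The template name and the field names are assumptions for illustration;
    the wiki would have to define a matching template.
    """
    lines = ["{{" + template]
    for key in sorted(record):
        # A literal pipe would end the parameter, so replace it with {{!}}.
        value = str(record[key]).replace("|", "{{!}}")
        lines.append(f"| {key} = {value}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical record, standing in for one exported Zotero entry.
example = {"title": "A hypothetical wiki study", "year": 2011}
page_text = record_to_wikitext(example)
```

Keeping the rendering separate from the upload means the page format agreed on in step 2 can change without touching the import logic.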
The goals of the effort would be threefold.
- Create an annotated bibliography of wiki research that is easy to keep up to date.
- Identify the N most important papers for more focused study and synthesis (perhaps leading towards more than one survey article).
- Provide metadata on the complete set of papers so that it can be described statistically.
Simply put, I believe that we as modern researchers need to be able to build survey articles which analyze 2,000-5,000 or more papers, and maybe this is a way to do that.
I and the other members of my team have already planned significant time towards this effort and would be very excited to join forces to lead such a mass collaboration.
Why use MediaWiki rather than Zotero or some other bibliography manager? First, it would be easy for anyone to participate because there is no software to install, no database to import, etc. Second, I personally have found Zotero, CiteULike, and every other bibliography manager I've tried to be clunky and tedious to use and not flexible enough for my needs (for example, three-state tags that let us say that a paper has a certain property, that it does not have it, or that we do not know, could be useful). We can always export the data into whatever bibliography software is preferred by particular authors.
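To show what I mean by three-state tags, here is a small Python sketch. The names (`Tag`, `get_tag`, `papers_needing_review`) and the sample data are invented for illustration; the one design choice that matters is that a missing tag is read as "unknown" rather than as "no".

```python
from enum import Enum


class Tag(Enum):
    """Three-state tag value for a paper property."""
    YES = "yes"          # the paper has the property
    NO = "no"            # the paper does not have the property
    UNKNOWN = "unknown"  # nobody has checked yet


def get_tag(paper_tags, prop):
    """Look up a property; a missing entry means UNKNOWN, not NO."""
    return paper_tags.get(prop, Tag.UNKNOWN)


def papers_needing_review(papers, prop):
    """Return the sorted ids of papers whose status for prop is still UNKNOWN."""
    return sorted(
        paper_id
        for paper_id, tags in papers.items()
        if get_tag(tags, prop) is Tag.UNKNOWN
    )


# Hypothetical sample data: paper id -> {property name: Tag}.
papers = {
    "paper-a": {"uses-survey-data": Tag.YES},
    "paper-b": {"uses-survey-data": Tag.NO},
    "paper-c": {},
}
```

With this data, `papers_needing_review(papers, "uses-survey-data")` lists only `paper-c`, which is the kind of query a two-state tag cannot answer.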
Authorship is of course an issue, and one that should be worked out before people start contributing IMO, but not an intractable one, and there is precedent for scientific papers to have hundreds of authors (and it would certainly be in the wiki spirit). I myself would love to have a prominent place in the author list, but having the survey article written at all is a much higher priority.
Finally, one of my dreams has been to create a more or less complete database of *all* scientific publications, with reviews, a citation graph, private notes, and a robust data model (e.g., one that can tell two John Smiths apart and know when J. Smith is the same as John Smith). Maybe this is the first step along that path. (I did a bit of work on data models for citation databases about five years ago and still use the software I created - Yabman, http://yabman.sf.net/.)
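As a very rough sketch of the name problem, the Python below checks whether two name strings could refer to the same person. It is not how Yabman works; the function names are invented, and it assumes non-empty names written as "Given Family". Note that a compatible pair is only a candidate match: telling two different John Smiths apart still needs separate author records with their own identifiers, plus other evidence such as affiliation or co-authors.

```python
def parse_name(name):
    """Split 'Given [Middle] Family' into (given parts, family name), lowercased.

    Assumes a non-empty name with the family name last; periods after
    initials are ignored.
    """
    parts = name.replace(".", " ").split()
    return [p.lower() for p in parts[:-1]], parts[-1].lower()


def names_compatible(a, b):
    """Return True if the two name strings could denote the same person.

    Family names must match exactly; each pair of given-name parts must
    agree up to abbreviation (one is a prefix of the other).
    """
    given_a, family_a = parse_name(a)
    given_b, family_b = parse_name(b)
    if family_a != family_b:
        return False
    for x, y in zip(given_a, given_b):
        if not (x.startswith(y) or y.startswith(x)):
            return False
    return True
```

Here `names_compatible("J. Smith", "John Smith")` is true and `names_compatible("Jane Smith", "John Smith")` is false, but "J. Smith" is also compatible with "Jane Smith", which is exactly why the string alone cannot settle identity.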
Thoughts?
Reid
p.s. Chitu, do you subscribe to this list? If so, we'll stop CC'ing you; if not, I encourage you to do so - it's pretty low traffic and certainly relevant to your work.
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l