I like this idea.
I see this as a topic map problem, keeping track of provenance and
topics covered. Wikipedia is run very much like a topic map; there are
ways to disambiguate name collisions, and each article is restricted
to one and only one topic, with occasional merge suggestions.
As a side note, consider the addition of available open source
software for "reading" each and every document that is readable (not
all PDFs, for instance, can be read by machine). Doing so, you can
create a searchable index into each document, perform varieties of
studies, e.g. word n-grams, topic models (Latent Dirichlet Allocation), and even
clustering, as a background task that provides a larger context for
all reviews.
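
To make the indexing idea concrete, here is a minimal sketch in Python using only the standard library. It assumes the text has already been extracted from each document; the sample documents, ids, and function names are hypothetical, and a real setup would more likely use an existing search engine and topic modeling library than hand-rolled code.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def ngram_counts(text, n=2):
    """Count word n-grams (bigrams by default) in a piece of text."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample documents, standing in for extracted paper text.
docs = {
    "paper1": "Wiki vandalism detection in Wikipedia",
    "paper2": "Topic models for wiki research",
}
index = build_index(docs)
hits = search(index, "wiki vandalism")
```

The same token stream could feed a topic model or a clustering step later; the index and the n-gram counts are just the cheapest things to compute first.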
Decision making on which documents rate highest is complex, possibly
wicked, and perhaps should relate more to particular goals: which
papers contribute how much information to which topics, etc. What
animates that thought is the "evidence profiles" used by Watson in the
Jeopardy competition. Watson was nowhere near competitive until they
created and refined evidence profiles.
Jack
On Wed, Mar 16, 2011 at 9:26 AM, Reid Priedhorsky <reid(a)reidster.net> wrote:
Chitu and others,
I too see great need for a comprehensive survey paper in this field. My
own personal interest is in one that covers wiki research in general,
not just research of Wikipedia; this of course makes the intractable
number of papers even more intractable.
In fact, I am involved with a team of researchers with the same goal as
you, though we are just getting started.
It seems to me that you are in a very difficult position. As others have
noted, the scoping filter you propose is not a good one, but the number
of papers is simply intractable without a very aggressive filter that
excludes 2/3 or more of the known papers. (To further complicate the
issue, I am skeptical of machine filtering, period, fearing that any
useful filter would necessarily be complex and difficult to justify in a
writeup.)
However, I believe that there is a solution, and that is to dramatically
increase the team size by doing the analysis wiki style. Rather than a
small team creating the review, do it in public with an open set of
contributors. Specifically, I propose:
1. Create a public MediaWiki instance.
2. Decide on a relatively standardized format of reviewing each paper
(metadata formats, an infobox, how to write reviews of each, etc.)
3. Upload your existing Zotero database into this new wiki (I would be
happy to write a script to do this).
4. Proceed with paper readings, with the goal that every single paper is
looked at by human eyes.
5. Use this content to produce one or more review articles.
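
As a rough illustration of steps 2 and 3, the sketch below turns one bibliographic record into wikitext for a paper page. Everything specific here is an assumption: the template name "Infobox paper", the field names, and the shape of the record (a plain dict, as one might get after exporting the Zotero library to some structured format). It only builds the page text and does not talk to Zotero or to the wiki.

```python
def record_to_wikitext(record):
    """Render one bibliographic record as a MediaWiki template call.

    The template name and field names are hypothetical; they would be
    whatever the standardized review format ends up defining.
    """
    lines = ["{{Infobox paper"]
    for field in ("title", "authors", "year", "venue"):
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        # A raw pipe would end the template parameter early, so replace it
        # with the {{!}} magic word, which renders as a literal pipe.
        value = str(value).replace("|", "{{!}}")
        lines.append(f"| {field} = {value}")
    lines.append("}}")
    lines.append("")
    lines.append("== Review ==")
    lines.append("''No review yet.''")
    return "\n".join(lines)


# Hypothetical record, for illustration only.
record = {
    "title": "An example paper",
    "authors": ["A. Author", "B. Author"],
    "year": 2010,
}
page_text = record_to_wikitext(record)
```

Pages generated this way start out with an empty review section, so the set of papers not yet looked at by human eyes is easy to list.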
The goals of the effort would be threefold.
* Create an annotated bibliography of wiki research that is easy to keep
up to date.
* Identify the N most important papers for more focused study and
synthesis (perhaps leading towards more than one survey article).
* Provide metadata on the complete set of papers so that it can be
described statistically.
Simply put, I believe that we as modern researchers need to be able to
build survey articles which analyze 2,000-5,000 or more papers, and
maybe this is a way to do that.
I and the other members of my team have already planned significant time
towards this effort and would be very excited to join forces to lead
such a mass collaboration.
Why use MediaWiki rather than Zotero or some other bibliography manager?
First, it would be easy for anyone to participate because there is no
software to install, no database to import, etc. Second, I personally
have found Zotero, CiteULike, and every other bibliography manager I've
tried to be clunky and tedious to use and not flexible enough for my
needs (for example, three-state tags that let us say a paper has, does
not have, or we do not know if it has, a certain property could be
useful). We can always export the data into whatever bibliography
software is preferred by particular authors.
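
The three-state tag idea could look something like the following sketch. The class and tag names are hypothetical; the point is only that "unknown" is a distinct state and the default, rather than being folded into "no".

```python
from enum import Enum


class TagState(Enum):
    """The three states a tag can have for a given paper."""
    YES = "yes"          # the paper has the property
    NO = "no"            # the paper does not have the property
    UNKNOWN = "unknown"  # nobody has checked yet


class PaperTags:
    """Tags for one paper; any tag never set is reported as UNKNOWN."""

    def __init__(self):
        self._states = {}

    def set(self, tag, state):
        # Accepts either a TagState member or its string value.
        self._states[tag] = TagState(state)

    def get(self, tag):
        return self._states.get(tag, TagState.UNKNOWN)


def summarize(papers, tag):
    """Count how many papers are in each state for the given tag."""
    counts = {state: 0 for state in TagState}
    for tags in papers:
        counts[tags.get(tag)] += 1
    return counts


# Hypothetical usage: three papers and one property of interest.
p1, p2, p3 = PaperTags(), PaperTags(), PaperTags()
p1.set("has-user-study", "yes")
p2.set("has-user-study", TagState.NO)
summary = summarize([p1, p2, p3], "has-user-study")
```

Keeping the unknown state explicit also means the statistical description of the paper set can report how much of it has actually been checked.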
Authorship is of course an issue, and one that should be worked out
before people start contributing IMO, but not an intractable one, and
there is precedent for scientific papers to have hundreds of authors
(and it would certainly be in the wiki spirit). I myself would love to
have a prominent place in the author list, but having the survey article
written at all is a much higher priority.
Finally, one of my dreams has been to create a more or less complete
database of *all* scientific publications, with reviews, a citation
graph, private notes, and a robust data model (e.g., one that can tell
two John Smiths apart and know when J. Smith is the same as John Smith).
Maybe this is the first step along that path. (I did work a bit on data
models for citation databases about five years ago and still use the
software I created, Yabman: http://yabman.sf.net/.)
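
A minimal sketch of the author part of such a data model, under stated assumptions: this is not how Yabman works, the names below are hypothetical, and names are assumed to be written as "Given Family". Authors are entities with their own ids, so two people named John Smith stay distinct, while a separate check says whether two written names could refer to the same person.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    """An author entity; identity comes from author_id, not from the name."""
    author_id: int
    name_variants: set = field(default_factory=set)


def split_name(name):
    """Split 'Given [Middle] Family' into (given names, family name)."""
    parts = name.replace(".", " ").split()
    if not parts:
        raise ValueError("empty name")
    return [p.lower() for p in parts[:-1]], parts[-1].lower()


def names_compatible(a, b):
    """Return True if two written names could belong to the same person.

    Family names must match exactly. Given names are compared pairwise,
    and an initial matches any given name starting with that letter.
    """
    given_a, family_a = split_name(a)
    given_b, family_b = split_name(b)
    if family_a != family_b:
        return False
    for x, y in zip(given_a, given_b):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


# Two different people who happen to share a name remain separate records.
smith_a = Author(1, {"John Smith", "J. Smith"})
smith_b = Author(2, {"John Smith"})
```

Compatibility is only a candidate test: deciding that a given "J. Smith" really is author 1 and not author 2 would still need other evidence, such as coauthors or affiliation.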
Thoughts?
Reid
p.s. Chitu, do you subscribe to this list? If so, we'll stop CC'ing you;
if not, I encourage you to do so - it's pretty low traffic and certainly
relevant to your work.
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l