Hi Brian,
On 20 Jul 2010, at 18:02, Brian J Mingus wrote:
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen
<fn(a)imm.dtu.dk> wrote:
Hi Brian and others,
I also think that some bibliographic support would be interesting, for two-way
citation tracking and commenting on articles (for example). Furthermore, I find that
science articles in particular often contain data that is worth structuring and putting in a
database or a structured wiki, so that we can extract the data for meta-analysis and
specialized information retrieval. That is what I also do in the Brede Wiki. I use the
templates to store such data. So if a system such as yours is implemented, we should not
just think of it as a bibliographic database but in broader terms: a data wiki.
Although the technology required to make a WikiCite happen will be applicable to a more
generalized wiki for storing data I think that is too broad for the current proposal. A
WMF analogue to Google Base is an entirely new beast that has its own requirements. I
certainly think it's an interesting and worthwhile idea, but I don't feel that we
are there yet.
As the 'key' (the wiki page title) I use the (lowercase) title of the article.
That might be more reader friendly - but usually longer. I think that
KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be
unique, so we need some predictable disambig.
I noticed that AcaWiki is using the title, but I am personally not a fan of it. The
motivation for using a key comes from BibTeX. When you cite an entry in a publication in
LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key.
The idea is that there is a universal token that you can type into Google that will lead
you to the right item. The predictable disambig is in the format I sent out (which likely
needs modification for other kinds of sources). The format is
Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific
scientists, Deco & Rolls, who published at least three papers together in 2005. In our
lab we have really come to love these keys - they are very memorable tokens that you can
verbally pass on to other scientists in the midst of a discussion. Eventually, if they
enter the key you have given them into Google, they will get the right entry at
"WikiCite".
DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the
orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.
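A minimal Python sketch of how keys in the Author1Author2Author3EtAlYYb format could be generated. The function names, the three-author cutoff, and the rule that suffixes are handed out in order of arrival are assumptions for illustration, not a description of how WikiPapers does it.

```python
import re
import string

def base_key(surnames, year, max_authors=3):
    """Build AuthorAuthorYY; 'EtAl' is appended when authors are truncated."""
    # Keep letters only, so spaces and hyphens in surnames disappear
    # (an illustrative choice, not part of the proposed format).
    cleaned = [re.sub(r"[\W\d_]", "", s) for s in surnames]
    key = "".join(cleaned[:max_authors])
    if len(cleaned) > max_authors:
        key += "EtAl"
    return key + f"{year % 100:02d}"

def assign_key(surnames, year, taken):
    """Return the first free key: no suffix, then b, c, ... up to z."""
    base = base_key(surnames, year)
    for suffix in [""] + list(string.ascii_lowercase[1:]):
        candidate = base + suffix
        if candidate not in taken:
            taken.add(candidate)  # remember the key so the next call skips it
            return candidate
    raise ValueError("no free suffix left for " + base)

taken = set()
for _ in range(3):
    print(assign_key(["Deco", "Rolls"], 2005, taken))
```

Under this scheme the suffix depends only on the order in which entries are added, which sidesteps the publication-order question but makes the key depend on the wiki's own history.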
Citation keys of this sort work, but they have to be decided on by some external system.
Who decides which paper is -, b, and c? Publication order would be one way to do it -- but
that's complicated, especially with online first publication, or overlapping
conferences.
I think whether they're memorable tokens might vary by person... Sure, the author and
year will be identifiable, even memorable. But the a, b, c?
If you want to support more than recent works, I'd urge YYYY instead of YY. Then we
only have an issue for pre-0 stuff. :)
Also consider differentiating authors from title and year, perhaps with slashes.
author1-author2-author3-etal/YYYY/b
I'm not convinced that -'s are better than capital letters (author last names can
have both)...
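To make the trade-off concrete, here is a rough Python sketch of the slash-separated variant with a four-digit year. The helper names are hypothetical, and lowercasing the surnames is an assumption taken from the example format above.

```python
def slash_key(surnames, year, suffix="", max_authors=3):
    """Build author1-author2-author3-etal/YYYY/b style keys."""
    parts = [s.lower() for s in surnames[:max_authors]]
    if len(surnames) > max_authors:
        parts.append("etal")
    fields = ["-".join(parts), f"{year:04d}"]
    if suffix:
        fields.append(suffix)
    return "/".join(fields)

def split_key(key):
    """Split a key back into (author parts, year, suffix)."""
    authors, year, *rest = key.split("/")
    return authors.split("-"), int(year), (rest[0] if rest else "")

print(slash_key(["Deco", "Rolls"], 2005, "b"))
print(split_key(slash_key(["Smith-Jones", "Lee"], 1999)))
```

The second call shows the ambiguity: a hyphenated surname is split into two author parts, so the key cannot be reliably parsed back into its authors.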
I have one field to each author so that I can automatically link authors.
This is accomplished via Semantic Forms, using the arraymap parser function. You just
provide a comma-separated list of authors, and they each get semantic property definitions
and deep linking to all papers published by that author.
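For readers who have not used arraymap, the Python below imitates what it does: split a delimited list, trim each item, substitute it into a formula, and join the results. This is only an emulation for illustration; the property name "Author" is an assumption, not necessarily the one used in WikiPapers.

```python
def arraymap(value, delimiter=",", var="x",
             formula="[[Author::x]]", new_delimiter=", "):
    """Imitate the arraymap parser function on a delimited string."""
    items = [item.strip() for item in value.split(delimiter)]
    # Substitute each non-empty item for the placeholder variable.
    return new_delimiter.join(
        formula.replace(var, item) for item in items if item
    )

print(arraymap("Gustavo Deco, Edmund T. Rolls"))
```

Each resulting [[Author::...]] annotation both sets the semantic property and links to that author's page.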
Sure -- unless authors have the same name, or use different forms of the name.
One of my coauthors goes by John G. Breslin for disambiguation since his name is common
-- but on the institute website he's credited as John Breslin, since that's the
only name the system recognizes.
In other words, some authority control will be needed. Libraries have a long history with
this. Groups of booklovers do it, too. For instance, here's the LibraryThing page for
John Smith:
http://www.librarything.com/author/smithjohn
Notice that you can split and join authors -- LibraryThing's way of giving users the
ability to join and separate.
Or see
http://www.librarything.com/author/carrolllewis
Sometimes there are difficult questions -- such as "Is Lewis Carroll the same as
Charles Dodgson?" - which depends on what you mean by "same".
For the scope of the potential problem, look at highly published authors -- for instance
the "alternative names" list for Dante:
http://www.worldcat.org/identities/lccn-n78-95495
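A very small Python sketch of what such authority control could look like: a table that maps name forms to one preferred form, with operations to combine and separate them. The class and method names are hypothetical, and real authority files (and LibraryThing's own mechanism) are certainly richer than this.

```python
class AuthorityFile:
    """Maps variant name forms to a preferred form."""

    def __init__(self):
        self._preferred = {}  # variant name -> name it was combined into

    def resolve(self, name):
        """Follow combine links until a name with no further link is found."""
        seen = set()
        while name in self._preferred and name not in seen:
            seen.add(name)
            name = self._preferred[name]
        return name

    def combine(self, name, into):
        """Declare that `name` is another form of the author `into`."""
        target = self.resolve(into)
        if target != name:  # never point a name at itself
            self._preferred[name] = target

    def separate(self, name):
        """Undo a combine, making `name` a distinct author again."""
        self._preferred.pop(name, None)

authors = AuthorityFile()
authors.combine("John Breslin", "John G. Breslin")
authors.combine("Charles Dodgson", "Lewis Carroll")
print(authors.resolve("John Breslin"))
authors.separate("Charles Dodgson")  # if the community decides they differ
print(authors.resolve("Charles Dodgson"))
```

Note that this only handles different forms of one name; two distinct people who share the same name would additionally need identifiers that are not the name string itself.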
I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers
regard the copyright for abstracts. Nor am I sure about the forward cites. Most
commercial publishers hide the cites from unpaid viewing. Including cites in CC-by-sa
material on a large scale may infringe publishers' copyright. Perhaps it is possible
to negotiate with some publishers. We need to talk with 'closed access'
publishers before we add such data.
Yes, I have added many nice features to WikiPapers that can unfortunately not make it
into the proposed WMF project. Some can, some can't. For example, adding papers to the
wiki is via a one click bookmarklet. First, you highlight the title of a paper anywhere on
the web, be it a webpage, e-mail, or journal site. Then, you click your "Add to
wiki" bookmarklet. On my webserver I am running the citation scraping software from
Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper and PubMed importer.
You can choose to use one of those sources, or you can choose to merge all of the metadata
together. It's automatically added to the wiki for you. Additionally, I have written a
bash script that is very adept at getting the pdfs from journals, so it automatically
tries to download the pdf and upload it to the wiki for you. I have also implemented the
ability to compute the articles that an article cites, and vice versa. With respect to
abstracts these scrapers aren't that great. Abstracts usually come from PubMed, whose
database you can license, but you cannot change their metadata IIRC.
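The "cites, and vice versa" computation mentioned above amounts to inverting a mapping. A minimal Python sketch, assuming each entry's reference list is already available as a set of citation keys (the keys below are placeholders, not real citation data):

```python
from collections import defaultdict

def cited_by(cites):
    """Invert {paper: papers it cites} into {paper: papers citing it}."""
    reverse = defaultdict(set)
    for paper, references in cites.items():
        for ref in references:
            reverse[ref].add(paper)
    return dict(reverse)

# Placeholder keys for illustration only.
cites = {
    "PaperC": {"PaperA", "PaperB"},
    "PaperB": {"PaperA"},
    "PaperA": set(),
}
print(cited_by(cites))
```

Papers that nobody cites simply do not appear in the inverted mapping.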
Ultimately, I think the community will have to take a very careful look at what data can
be added to the wiki and design policies accordingly. On Wikipedia I believe copyright
enforcement has largely been up to the community, and it takes a long time to converge on
appropriate policies. Needless to say, many of the technologies I described in the last
paragraph would not be found legal on a public wiki.
I am not sure what 'owner' is in your format. Surely you can't have owners in
a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the
revision history.
The 'owner' field is a misnomer, but in lieu of MySQL support it lets you know
which individuals have that entry in their personal bibliographies. dateadded is needed
due to what at least used to be a bug in Semantic MediaWiki.
We probably need to check the final format of the bibliographic template to make sure
it is easily translatable to the most common bibliographic formats: BibTeX, refman, Z3988
microformat, PubMed, etc.
I have written extensive amounts of Python interchange code between wiki template syntax
and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very
similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular
formats, and vice versa for mass import of bibliographies:
http://www.scripps.edu/~cdputnam/software/bibutils/
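To show why BibTeX and template syntax map onto each other so easily, here is a short Python sketch that renders one record both ways. This is not the interchange code mentioned above; the template name "Paper" and the parameter names "key" and "type" are assumptions for illustration.

```python
def to_bibtex(key, entry_type, fields):
    """Render a record as a BibTeX entry."""
    lines = [f"@{entry_type}{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)

def to_template(key, entry_type, fields, template="Paper"):
    """Render the same record as a wiki template call."""
    lines = ["{{" + template, f"|key={key}", f"|type={entry_type}"]
    lines += [f"|{name}={value}" for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)

fields = {
    "author": "Deco, Gustavo and Rolls, Edmund T.",
    "title": "Sequential memory: a putative neural and synaptic dynamical mechanism",
    "year": "2005",
}
print(to_bibtex("DecoRolls05b", "article", fields))
print(to_template("DecoRolls05b", "article", fields))
```

Both formats are flat lists of name/value pairs under a key, so the conversion is mostly a change of delimiters. A real converter would also need to escape characters that are special in either syntax, such as "|" in template values and unbalanced braces in BibTeX.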
BibTeX is good for backwards compatibility, but I'd urge a richer data format --
probably based on bibo RDF:
http://bibliontology.com/
It's already widely used:
http://bibliontology.com/projects
As I understand it, there are issues with Semantic MediaWiki with respect to performance and
security that need to be resolved before a large-scale deployment within Wikimedia
Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW,
so there might be some changes to SMW in the future. A code audit of SMW is still lacking.
As I was writing a custom Lucene search engine for WikiPapers I realized that it is a
perfect replacement for Semantic MediaWiki. Lucene has fields, it supports boolean
operators and you can format its output. All that is needed is to write the Lucene backend
(perhaps just modifying MWLucene) and write a parser function that supports using
templates for formatting of the output of queries. Lucene is extremely fast and can scale
to whatever we can imagine doing. That's my proposed plan.
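The three ingredients named here (fields, boolean operators, and formatted output) can be illustrated without Lucene at all. The Python below is a toy in-memory stand-in, not the Lucene API; the function names and the output format string are assumptions for illustration.

```python
def search(docs, must=None, must_not=None):
    """Return docs where every `must` field matches and no `must_not` does."""
    must, must_not = must or {}, must_not or {}

    def has(doc, field, term):
        # Case-insensitive substring match on a single field.
        return term.lower() in str(doc.get(field, "")).lower()

    return [
        d for d in docs
        if all(has(d, f, t) for f, t in must.items())
        and not any(has(d, f, t) for f, t in must_not.items())
    ]

def render(docs, template="* {key}: {title} ({year})"):
    """Format each hit with a template, one line per document."""
    return "\n".join(template.format(**d) for d in docs)

docs = [
    {"key": "DecoRolls05", "author": "Deco, Rolls", "year": 2005,
     "title": "Synaptic and spiking dynamics underlying reward reversal "
              "in the orbitofrontal cortex."},
    {"key": "DecoRolls05b", "author": "Deco, Rolls", "year": 2005,
     "title": "Sequential memory: a putative neural and synaptic "
              "dynamical mechanism."},
    {"key": "DecoRolls05c", "author": "Deco, Rolls", "year": 2005,
     "title": "Attention, short-term memory, and action selection: "
              "a unifying theory."},
]
hits = search(docs, must={"author": "Rolls", "title": "memory"},
              must_not={"title": "attention"})
print(render(hits))
```

In the proposed design the query would run against a Lucene index and the format string would be a wiki template, but the shape of the interface would be similar.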
It is not 'necessarily necessary' to make a new Wikimedia project. There has been a
suggestion (in the meta or strategy wiki) just to use a namespace in Wikipedia. You could
then have a page called
http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning
I believe it is necessary. First, the idea is for any MediaWiki anywhere (and any
software with appropriate extensions) to be able to cite the same source. Secondly, the
project would be multilingual.
I think somebody's mentioned OpenLibrary on this thread. In case not:
http://openlibrary.org/
Its scope is limited to books, but their interests are similar.
-Jodi
Cheers,
Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l