[Foundation-l] [Wiki-research-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus Brian.Mingus at Colorado.EDU
Tue Jul 20 17:02:51 UTC 2010


On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen <fn at imm.dtu.dk> wrote:

>
>
> Hi Brian and others,
>
> I also think that some bibliographic support would be interesting, for
> two-way citation tracking and commenting on articles (for example), but I
> furthermore find that, particularly in scientific articles, we often find
> data that is worth structuring and putting into a database or a structured
> wiki, so that we can extract the data for meta-analysis and specialized
> information retrieval. That is what I also do in the Brede Wiki: I use
> templates to store such data. So if a system such as yours is implemented,
> we should not think of it just as a bibliographic database but in broader
> terms: a data wiki.


Although the technology required to make a WikiCite happen will be
applicable to a more generalized wiki for storing data, I think that is too
broad for the current proposal. A WMF analogue to Google Base is an entirely
new beast with its own requirements. I certainly think it's an interesting
and worthwhile idea, but I don't feel that we are there yet.

> As the 'key' (the wiki page title) I use the (lowercase) title of the
> article. That might be more reader-friendly - but usually longer. I think
> that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author
> list + year will be unique, so we need some predictable disambig.


I noticed that AcaWiki is using the title, but I am personally not a fan of
it. The motivation for using a key comes from BibTeX. When you cite an entry
in a publication in LaTeX, you type \cite{key}. Also, I think most
bibliographic formats support such a key. The idea is that there is a
universal token that you can type into Google that will lead you to the
right item. The predictable disambig is in the format I sent out (which
likely needs modification for other kinds of sources). The format is
Author1Author2Author3EtAlYYb. Here is a real-world example from a pair of
very prolific scientists, Deco & Rolls, who published at least three papers
together in 2005. In our lab we have really come to love these keys - they
are very memorable tokens that you can verbally pass on to other scientists
in the midst of a discussion. Later, when they enter the key you gave them
into Google, they will get the right entry at "WikiCite".

DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in
the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical
mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a
unifying theory.
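
To make the scheme concrete, here is a rough Python sketch (not the actual
WikiPapers code) of how such keys could be generated, assuming the author
surnames have already been extracted and that collisions get the suffixes
'b', 'c', and so on in the order entries are added:

def make_key(surnames, year, existing_keys):
    """Build an Author1Author2Author3EtAlYY-style citation key."""
    names = [s.replace(' ', '') for s in surnames[:3]]
    if len(surnames) > 3:
        names.append('EtAl')
    base = ''.join(names) + str(year)[-2:]
    # Append b, c, d, ... until the key is unique on the wiki.
    key, suffix = base, ord('b')
    while key in existing_keys:
        key = base + chr(suffix)
        suffix += 1
    return key

# make_key(['Deco', 'Rolls'], 2005, {'DecoRolls05'}) -> 'DecoRolls05b'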

> I have one field for each author so that I can automatically link authors.


This is accomplished via Semantic Forms, using the arraymap parser function.
You just provide a comma-separated list of authors, and they each get
semantic property definitions and deep linking to all papers published by
that author.
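
For reference, a typical arraymap call in a citation template might look
roughly like this (the property name 'Author' is only a placeholder; the
actual WikiPapers template may differ):

{{#arraymap:{{{authors|}}}|,|x|[[Author::x]]}}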

> I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how
> publishers regard the copyright for abstracts. Nor am I sure about the
> forward cites. Most commercial publishers hide the cites from unpaid
> viewing. Including cites in CC-by-sa material on a large scale may infringe
> publishers' copyright. Perhaps it is possible to negotiate with some
> publishers. We need to talk with 'closed access' publishers before we add
> such data.


Yes, I have added many nice features to WikiPapers, not all of which can
unfortunately make it into the proposed WMF project. Some can, some can't.
For example, adding papers to the wiki is done via a one-click bookmarklet.
First, you highlight the title of a paper anywhere on the web, be it a
webpage, e-mail, or journal site. Then, you click your "Add to wiki"
bookmarklet. On my webserver I am running the citation scraping software
from Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper
and a PubMed importer. You can choose to use one of those sources, or you
can choose to merge all of the metadata together. The entry is automatically
added to the wiki for you. Additionally, I have written a bash script that
is very adept at getting the PDFs from journals, so it automatically tries
to download the PDF and upload it to the wiki for you. I have also
implemented the ability to compute the articles that an article cites, and
vice versa. With respect to abstracts, these scrapers aren't that great.
Abstracts usually come from PubMed, whose database you can license, but you
cannot change their metadata, IIRC.
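
As an illustration of what one of these importers does, PubMed metadata can
be fetched through NCBI's E-utilities. The following is only a minimal
sketch of that idea, not the actual WikiPapers importer, and the field
handling is simplified:

import urllib.request
import xml.etree.ElementTree as ET

EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

def fetch_pubmed(pmid):
    """Return title, year, and author surnames for a PubMed ID."""
    url = '%s?db=pubmed&id=%s&retmode=xml' % (EFETCH, pmid)
    xml = urllib.request.urlopen(url).read()
    article = ET.fromstring(xml).find('.//Article')
    title = article.findtext('ArticleTitle')
    year = article.findtext('.//PubDate/Year')
    authors = [a.findtext('LastName')
               for a in article.findall('.//Author')]
    return title, year, authors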

Ultimately, I think the community will have to take a very careful look at
what data can be added to the wiki and design policies accordingly. On
Wikipedia I believe copyright enforcement has largely been up to the
community, and it takes a long time to converge on appropriate policies.
Needless to say, many of the technologies I described in the last paragraph
would not be legal on a public wiki.

> I am not sure what 'owner' is in your format. Surely you can't have owners
> in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in
> the revision history.


The 'owner' field is a misnomer, but in lieu of MySQL support it lets you
know which individuals have that entry in their personal bibliographies.
The 'dateadded' field is needed due to what is, or at least used to be, a
bug in Semantic MediaWiki.

> We probably need to check on the final format of the bibliographic template
> to make sure it is easily translatable to the most common bibliographic
> formats: BibTeX, RefMan, the Z3988 microformat, PubMed, etc.


I have written extensive amounts of Python interchange code between wiki
template syntax and BibTeX. I chose BibTeX because it is rather standard,
our lab uses it, and it is very similar to template syntax. Also, I use
Bibutils to convert from BibTeX to most popular formats, and vice versa for
mass import of bibliographies:
http://www.scripps.edu/~cdputnam/software/bibutils/
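
As a toy illustration of what the BibTeX-to-template direction involves
(this is not the actual interchange code, and the template and field names
are only placeholders):

def bibtex_to_template(key, fields):
    """Render a parsed BibTeX entry (dict of fields) as wiki template text."""
    lines = ['{{Citation', '|key=%s' % key]
    for name, value in sorted(fields.items()):
        lines.append('|%s=%s' % (name, value))
    lines.append('}}')
    return '\n'.join(lines)

print(bibtex_to_template('DecoRolls05b',
    {'author': 'Deco, G. and Rolls, E. T.',
     'title': 'Sequential memory: a putative neural and synaptic '
              'dynamical mechanism',
     'year': '2005'}))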

> As I understand it, there are issues with Semantic MediaWiki with respect
> to performance and security that need to be resolved before a large-scale
> deployment within Wikimedia Foundation projects. I heard that Markus
> Krötzsch is going to Oxford to work on core SMW, so there might be some
> changes to SMW in the future. A code audit of SMW is lacking.


As I was writing a custom Lucene search engine for WikiPapers, I realized
that it is a perfect replacement for Semantic MediaWiki. Lucene has fields,
it supports Boolean operators, and you can format its output. All that is
needed is to write the Lucene backend (perhaps just by modifying MWLucene)
and a parser function that supports using templates to format the output of
queries. Lucene is extremely fast and can scale to whatever we can imagine
doing. That's my proposed plan.
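
To make that concrete, a purely hypothetical invocation of such a parser
function (neither the function name nor its parameters exist yet) might look
like this, with the query string itself being standard Lucene fielded syntax
with Boolean operators:

{{#lucenequery: author:Deco AND author:Rolls AND year:2005
 | template=CitationRow
 | limit=20
}}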

> It is not 'necessarily necessary' to make a new Wikimedia project. There
> has been a suggestion (on the Meta or Strategy wiki) just to use a namespace
> in Wikipedia. You could then have a page called
> http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning


I believe it is necessary. First, the idea is for any MediaWiki anywhere
(and any software with appropriate extensions) to be able to cite the same
source. Second, the project would be multilingual.

Cheers,

Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder

