On Tue, Jul 20, 2010 at 5:10 AM, Daniel Kinzler <daniel@brightbyte.de> wrote:
Hi all

A central place for managing bibliographic data for use with citations is
something that has been discussed by the German community for a long time. To
me, it consists of two parts: a project for managing the structured data, and a
mechanism for using that data on the wikis.

I have been working on the latter recently, and there's a working prototype: on
 <http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion> you
can see how data records can be included from external sources. A demo for the
actual on-wiki use can be found at
<http://prototype.wikimedia.org/wmde-sandbox-1/Ameisenigel#Literatur>, where
{{ISBN|0868400467}} is used to show the bibliographic info for that book. (side
note: the prototype wikis are slow. sorry about that).

Fetching and showing the data is done using
<http://www.mediawiki.org/wiki/Extension:DataTransclusion>. Care has been taken
to make this secure and scalable.

For a first demo, I'm using the ISBN as the key, but any kind of key could be
used to reference resources other than books.
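
If ISBNs are used as lookup keys, it probably helps to normalize and check them
before querying a data source. The following is only a sketch of the standard
ISBN-10 check-digit rule; the function names are hypothetical and are not part
of the DataTransclusion extension.

```python
def normalize_isbn10(raw: str) -> str:
    """Strip hyphens and spaces, and upper-case a trailing 'x' check character."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(raw: str) -> bool:
    """Return True if `raw` is a well-formed ISBN-10 with a correct check digit."""
    isbn = normalize_isbn10(raw)
    if len(isbn) != 10:
        return False
    # The first nine characters must be plain ASCII digits.
    if not all(c in "0123456789" for c in isbn[:9]):
        return False
    # The last character is a digit, or 'X' standing for the value 10.
    if isbn[9] not in "0123456789X":
        return False
    values = [int(c) for c in isbn[:9]]
    values.append(10 if isbn[9] == "X" else int(isbn[9]))
    # Weights run from 10 down to 1; the weighted sum must be divisible by 11.
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0
```

Both ISBNs mentioned in this mail (0868400467 and 0451526538) pass this check.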

For demoing managing the data by ourselves, I have set up an SMW instance. An
example bib record is at
<http://prototype.wikimedia.org/wmde-bib/ISBN:0451526538>; it's used across
wikis at
<http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion>. Note
that changes will show up with a delay, as the data is cached for a while.
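
To illustrate why changes show up late, here is a minimal sketch of a
time-based cache. It is not how the extension is implemented; the class name,
the `fetch` callback and the 60-second lifetime in the usage note are
assumptions made up for this example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched records for `ttl_seconds`, then fetch them again."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str, fetch: Callable[[str], dict]) -> dict:
        """Return the cached record for `key`, calling `fetch` if it expired."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            # Still fresh: the source is not asked, so edits there stay invisible.
            return hit[1]
        record = fetch(key)
        self._entries[key] = (now, record)
        return record
```

With `TTLCache(60)`, an edit made to the source record right after a fetch
would only become visible on the wiki once the 60 seconds have passed.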


When discussing these things, please keep in mind that there are two components:
fetching and displaying external data records, and managing structured data in a
wiki style. The former is much simpler than the latter. I think we should really
aim at getting both, but we can start off with transcluding external data much
faster, if we allow not-so-wiki data sources. For ISBN-based queries, we could
simply fetch information from http://openlibrary.org - or the open knowledge
foundation's http://bibliographica.org, once it's working.
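
As a rough illustration of such an ISBN-based query, the sketch below assumes
Open Library's Books API (the `api/books` endpoint with the `bibkeys`, `format`
and `jscmd` parameters) and a JSON response keyed by "ISBN:<number>". The
function names and the choice of fields are hypothetical, and this is not the
code used by the DataTransclusion extension.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of Open Library's Books API.
OPENLIBRARY_BOOKS_API = "https://openlibrary.org/api/books"


def build_url(isbn: str) -> str:
    """Build the lookup URL for a single ISBN."""
    query = urlencode({"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"})
    return f"{OPENLIBRARY_BOOKS_API}?{query}"


def extract_record(payload: dict, isbn: str) -> Optional[dict]:
    """Reduce the decoded response to a flat record, or None if the ISBN is unknown."""
    entry = payload.get(f"ISBN:{isbn}")
    if entry is None:
        return None
    return {
        "isbn": isbn,
        "title": entry.get("title"),
        "authors": [author.get("name") for author in entry.get("authors", [])],
        "publishers": [pub.get("name") for pub in entry.get("publishers", [])],
        "publish_date": entry.get("publish_date"),
    }


def fetch_record(isbn: str, timeout: float = 10.0) -> Optional[dict]:
    """Fetch and decode one record over HTTP (requires network access)."""
    with urlopen(build_url(isbn), timeout=timeout) as response:
        payload = json.load(response)
    return extract_record(payload, isbn)
```

A call such as `fetch_record("0451526538")` would then yield a small record
that a template like {{ISBN|...}} could render, provided the source answers in
the assumed format.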

In the context of bibdex, I recommend also having a look at
http://bibsonomy.org - it's a university research project, open source, and is
quite similar to bibdex (and to what citeulike used to be).

As to managing structured data ourselves: I have talked a lot with Erik Möller
and Markus Krötzsch about this, and I'm in touch with the people who make DBpedia
and OntoWiki. Everyone wants this. But it's not simple at all to get it right
(efficient versioning of multilingual data in a document oriented database,
anyone? want inference? reasoning, even? yay...). So the plan is currently to
hatch a concrete plan for this. And I imagine that bibliographical and
biographical info will be among the first use cases.


Hi Daniel, 

Have you considered that Lucene is the perfect backend for this kind of project? What kinds of faults do you see with it? At least in my mind, we can mold it to our needs here. It has the core capabilities found in Semantic MediaWiki, and it is fast and scalable.

I say this as a serious user of Semantic MediaWiki. I have seen that it can't scale well without an alternate backend, and I wonder what kind of monumental effort will be required to make it scale to tens or hundreds of millions of documents, each of which contains 20-50 properties. Lucene can already do this; SMW, not so much ;-)

Brian

 
cheers,
daniel

