In the latest release (2015-10), DBpedia started exploring the citation and reference data from Wikipedia, and we were pleasantly surprised by the rich data we managed to extract (sample: http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2 ).
- citation_data_en.ttl.bz2: http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_data_en.ttl.bz2 (sample: http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_data_en.ttl.bz2 )
- citation_links_en.ttl.bz2: http://downloads.dbpedia.org/2015-10/core-i18n/en/citation_links_en.ttl.bz2 (sample: http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_citation_links_en.ttl.bz2 )
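Both dumps are bzip2-compressed Turtle files. As a first step in exploring them, the Python sketch below streams a downloaded dump and counts how often each predicate occurs. It assumes the file is line-based, with one complete triple per line and full IRIs (as in N-Triples); lines that do not fit that shape, including prefixed or multi-line Turtle, are skipped rather than parsed. The function names are hypothetical, and a full RDF parser such as rdflib would be the safer choice for arbitrary Turtle.

```python
import bz2
import re
from collections import Counter
from typing import Iterable

# Assumption: one complete triple per line, "<subject> <predicate> object ."
# with full IRIs (N-Triples style). Prefixed or multi-line Turtle is not handled.
TRIPLE_RE = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.*?)\s*\.\s*$")


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count predicate IRIs in line-based triples, skipping comments and non-matching lines."""
    counts: Counter = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        match = TRIPLE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def count_predicates_in_dump(path: str) -> Counter:
    """Stream a .ttl.bz2 dump, decompressing on the fly instead of to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return count_predicates(dump)
```

For example, after downloading citation_data_en.ttl.bz2, `count_predicates_in_dump("citation_data_en.ttl.bz2").most_common(10)` lists the ten most frequent properties, which gives a quick overview of what the raw data contains.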
This data holds great potential, especially for the Wikidata challenge of providing a reference source for every statement. It covers not only a large amount of bibliographic data but also many web pages and other sources across the web.
The data we currently extract is quite raw and can be improved in many ways. Potential improvements include:
- Extend the citation extractor to handle other Wikipedia language editions (https://github.com/dbpedia/extraction-framework/issues/451); currently only English Wikipedia is supported.
- Map the data to a relevant bibliographic ontology (https://github.com/dbpedia/mappings-tracker/issues/79). There are many candidates, and although BIBO received the most votes, we are open to other ontologies.
- Map the data to existing bibliographic LOD (e.g. TEL has 100M records, WorldCat 300M) or to online books (e.g. Google Books). See the citationIri issue (https://github.com/dbpedia/extraction-framework/issues/452).
- Find ways to merge/fuse identical citations that appear in multiple articles.
- Use the citation data in the Wikidata primary sources tool (https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool).
- Surprise us with your ideas!
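The idea of merging identical citations from multiple articles can be illustrated with a small Python sketch. It assumes each extracted citation has already been turned into an (article, fields) pair with hypothetical field names such as `isbn`, `url` and `title`; the record shape, these names and the matching rule (same normalized ISBN, otherwise same normalized URL) are illustrative choices, not the structure of the DBpedia dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit, urlunsplit


@dataclass
class FusedCitation:
    """One merged citation: its first-seen field values and the citing articles."""
    fields: Dict[str, str] = field(default_factory=dict)
    articles: Set[str] = field(default_factory=set)


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the #fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def citation_key(cit: Dict[str, str]) -> Optional[str]:
    """Hypothetical matching rule: prefer a normalized ISBN, else a normalized URL."""
    isbn = cit.get("isbn")
    if isbn:
        return "isbn:" + "".join(ch for ch in isbn.upper() if ch.isdigit() or ch == "X")
    url = cit.get("url")
    if url:
        return "url:" + normalize_url(url)
    return None  # no identifier: skipped by fuse_citations


def fuse_citations(records: Iterable[Tuple[str, Dict[str, str]]]) -> Dict[str, FusedCitation]:
    """Group (article, citation fields) pairs that share a key; keyless citations are skipped."""
    fused: Dict[str, FusedCitation] = {}
    for article, cit in records:
        key = citation_key(cit)
        if key is None:
            continue
        entry = fused.setdefault(key, FusedCitation())
        entry.articles.add(article)
        for name, value in cit.items():
            entry.fields.setdefault(name, value)  # keep the first value seen per field
    return fused
```

Keying on identifiers keeps false merges rare but misses some duplicates, for instance the ISBN-10 and ISBN-13 forms of the same book, or citations that carry only a title; catching those would need fuzzier matching or the links to external bibliographic LOD mentioned above.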
We welcome contributions that improve the existing citation dataset in any way, and we are open to collaborating and helping. Results will be presented at the next DBpedia meeting on 15 September 2016 in Leipzig, co-located with SEMANTiCS 2016. Each participant should submit a short description of their contribution by Monday, 12 September 2016, and present their work at the meeting. Comments and questions can be posted to the DBpedia discussion and developer mailing lists or on our new DBpedia ideas page http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challenge/ .
Submissions will be judged by the Organizing Committee and the best two will receive a prize.
Organizing Committee
- Vladimir Alexiev, Ontotext and DBpedia BG
- Anastasia Dimou, Ghent University, iMinds
- Dimitris Kontokostas, KILT/AKSW, DBpedia Association