* Data Update for http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challenge/ *

Thanks to your feedback (especially from the WikiCite community), we managed to fix a few bugs and extend the coverage of the extracted citations.
The new citation dumps come from the upcoming 2016-04 release and provide *14x more citation data* (from 7.1M triples to 97.5M triples).

We are sharing the results early for the DBpedia challenge here:
http://downloads.dbpedia.org/temporary/citations/

For those still not sure what they can do with our data, here is what we managed to calculate at the airport while travelling. Imagine what you can do with more time and a normal desk ;)

Did you know that, in Wikipedia, the most cited...

books are about football, WW2 and British songs?
 * (4853 articles) SEN Encyclopedia of AFL Footballers: Every AFL/VFL Player Since 1897 -> http://books.google.com/books?vid=ISBN978-1-921496-32-5
 * (3191 articles) Die Ritterkreuzträger: 1939 - 1945 -> http://books.google.com/books?vid=ISBN978-3-938845-17-2
 * (2927 articles) Die Träger des Ritterkreuzes des Eisernen Kreuzes -> http://books.google.com/books?vid=ISBN978-3-7909-0284-6
 * (1958 articles) British Hit Singles & Albums -> http://books.google.com/books?vid=ISBN1-904994-10-5
 * (1694 articles) Das Deutsche Kreuz -> http://books.google.com/books?vid=ISBN978-3-931533-45-8

scientific articles are about biology & astronomy?
 * 5210 http://doi.org/10.1073/pnas.242603899
 * 3757 http://doi.org/10.1101/gr.2596504
 * 2449 http://doi.org/10.1038/ng1285
 * 1667 http://doi.org/10.1051/0004-6361:20078357
 * 1445 http://doi.org/10.1007/bf00171763

websites are mostly about census data?
 * 51328 http://www.stat.gov.pl/broker/access/prefile/listPreFiles.jspa
 * 21758 http://www.census.gov/geo/www/gazetteer/gazette.html
 * 21741 http://www.census.gov/prod/www/decennial.html
 * 11954 http://www.census.gov/popest/data/cities/totals/2014/SUB-EST2014.html
 * 10680 http://globiz.pyraloidea.org/Pages/Reports/TaxonReport.aspx


Dates (citations with only dates and a reference needed):
 * February 2007, 5463 times
 * October 2010, 5245 times
 * July 2015, 3919 times
 * October 2015, 3916 times
 * August 2015, 3885 times
(these come from the http://citation.dbpedia.org/hash/* IRIs)

See the following files for the complete lists:
http://downloads.dbpedia.org/temporary/citations/results.same-citations.different-articles-no-hash.count (we count only references from different pages)
http://downloads.dbpedia.org/temporary/citations/results.same-citations.all-articles-no-hash.count (we count all references, even from same page)
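
To make the difference between the two counts concrete, here is a minimal Python sketch. It is not the script we used; the input format (a list of (article, citation IRI) pairs) and the function name count_citations are assumptions made only for illustration.

```python
from collections import Counter

def count_citations(pairs):
    """Count citations two ways.

    pairs: iterable of (article, citation_iri) tuples, one per reference.
    This input shape is an assumption for illustration, not the dump format.
    """
    pairs = list(pairs)
    # "all-articles": every reference counts, even repeats within one article.
    all_refs = Counter(citation for _, citation in pairs)
    # "different-articles": each (article, citation) pair counts only once,
    # so the result is the number of distinct articles citing each source.
    different_articles = Counter(citation for _, citation in set(pairs))
    return all_refs, different_articles

# Hypothetical sample: Article_A cites the same DOI twice.
sample = [
    ("Article_A", "http://doi.org/10.1038/ng1285"),
    ("Article_A", "http://doi.org/10.1038/ng1285"),
    ("Article_B", "http://doi.org/10.1038/ng1285"),
    ("Article_B", "http://doi.org/10.1101/gr.2596504"),
]
all_refs, different_articles = count_citations(sample)
print(all_refs.most_common())
print(different_articles.most_common())
```

In this sample the first DOI is referenced three times in total but by only two different articles, which is the distinction between the two files above.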


The top 10 domains in Wikipedia references are:
 * 1561315 books.google.com
 * 1540250 citation.dbpedia.org
 *  836371 doi.org
 *  154664 news.bbc.co.uk
 *  132997 nytimes.com
 *  129410 bbc.co.uk
 *  101807 census.gov
 *  101125 worldcat.org
 *   89082 news.google.com
 *   76503 ncbi.nlm.nih.gov
See the complete lists in:
http://downloads.dbpedia.org/temporary/citations/results.domains.count 
http://downloads.dbpedia.org/temporary/citations/results.domains-distinct.count (counts distinct citations)
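
A per-domain count of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a plain list of citation URLs, the function name domain_counts is hypothetical, and stripping a leading "www." while keeping other subdomains is our guess at a normalisation that matches the list above (census.gov, but news.bbc.co.uk separate from bbc.co.uk).

```python
from collections import Counter
from urllib.parse import urlsplit

def domain_counts(citation_urls):
    """Return (total, distinct) Counters keyed by domain.

    total counts every reference; distinct counts each citation URL once.
    """
    total = Counter()
    distinct = Counter()
    seen = set()
    for url in citation_urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # skip values that do not parse as a URL with a host
        # Assumed normalisation: drop a leading "www." only.
        if host.startswith("www."):
            host = host[4:]
        total[host] += 1
        if url not in seen:
            seen.add(url)
            distinct[host] += 1
    return total, distinct

# Sample built from URLs listed in this mail; the repetition is made up.
urls = [
    "http://www.census.gov/prod/www/decennial.html",
    "http://www.census.gov/prod/www/decennial.html",
    "http://www.census.gov/geo/www/gazetteer/gazette.html",
    "http://doi.org/10.1038/ng1285",
]
total, distinct = domain_counts(urls)
print(total.most_common())
print(distinct.most_common())
```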

The articles with the most "citation needed" tags are:
 * Football_records_in_Spain (41 citations needed)
 * Ahmed_Belbachir_Haskouri (29 citations needed)
 * Tree_model (24 citations needed)
 * Immigration_to_Chile (21 citations needed)
 * Larry_Ryckman (18 citations needed)
See here for the full list: http://downloads.dbpedia.org/temporary/citations/results.articles-with-citations-neededd.count


We extract data from many templates. Here is the top 10; the complete list can be found here:
http://downloads.dbpedia.org/temporary/citations/results.template.count
 * 9348109 Cite_web
 * 2821628 Cite_news
 * 1958270 Cite_book
 * 1294760 Cite_journal
 *  467933 Citation
 *  317309 Citation_needed
 *   46264 Cite_press_release
 *   37315 Cn
 *   36258 Cite_encyclopedia
 *   33754 Cite_episode

We also have some basic statistics for templates with properties and for properties alone:
http://downloads.dbpedia.org/temporary/citations/results.template.count
http://downloads.dbpedia.org/temporary/citations/results.template-property.count

Note that the statistics we provide are meant only as a proof of concept and are based on the enwiki-20160305 dump.
You can regenerate them using this shell script: http://downloads.dbpedia.org/temporary/citations/generate-basic-citation-stats.bash


Cheers,
Dimitris on behalf of the OC



On Tue, Jun 7, 2016 at 10:51 AM, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de> wrote:

In the latest release (2015-10) DBpedia started exploring the citation and reference data from Wikipedia and we were pleasantly surprised by the rich data we managed to extract.


This data holds huge potential, especially for the Wikidata challenge of providing a reference source for every statement. It describes not only a lot of bibliographical data, but also a lot of web pages and many other sources around the web.


The data we extract at the moment is quite raw and can be improved in many different ways. Some of the potential improvements are:


We welcome contributions that improve the existing citation dataset in any way, and we are open to collaborating and helping. Results will be presented at the next DBpedia meeting: 15 September 2016 in Leipzig, co-located with SEMANTiCS 2016. Each participant should submit a short description of his/her contribution by Monday 12 September 2016 and present his/her work at the meeting. Comments and questions can be posted on the DBpedia discussion & developer lists or on our new DBpedia ideas page.

Submissions will be judged by the Organizing Committee and the best two will receive a prize.


Organizing Committee

  • Vladimir Alexiev, Ontotext and DBpedia BG

  • Anastasia Dimou, Ghent University, iMinds

  • Dimitris Kontokostas, KILT/AKSW, DBpedia Association


--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Research Group: AKSW/KILT http://aksw.org/Groups/KILT



