On Mon, Aug 29, 2016 at 11:09 PM, Dimitris Kontokostas <jimkont@gmail.com> wrote:

You can have a look here:
http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2
It is a quad file that contains DBpedia facts. I replaced the context with the citation whenever the citation is on the exact same line as the extracted fact, e.g.

<http://dbpedia.org/resource/An_American_in_Paris> <http://dbpedia.org/property/work> "An American in Paris"@en <https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .

It is based on a complete English dump from ~April and contains roughly 1M cited facts.
This is more of a proof of concept, and there are many ways to improve it and make it more usable for Wikidata.
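
For readers who want to inspect the file, here is a minimal sketch of how one line of such a quad file could be split into subject, predicate, object, and context. The names `parse_quad` and `iter_cited_facts` and the regular expression are assumptions made for illustration, not part of the DBpedia tooling. The pattern only covers IRIs and simple literals, so a real pipeline would probably use a proper N-Quads parser.

```python
import bz2
import re

# Simplified pattern for one N-Quads line (an illustrative assumption):
# IRI subject, IRI predicate, IRI-or-literal object, IRI context, final dot.
# Blank nodes are not handled.
QUAD_RE = re.compile(
    r'^(<[^>]*>)\s+'                                   # subject
    r'(<[^>]*>)\s+'                                    # predicate
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"'                      # object: IRI or literal
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s+'             # optional lang tag or datatype
    r'(<[^>]*>)\s*\.\s*$'                              # context (here: the citation)
)

def parse_quad(line):
    """Return (subject, predicate, object, context) or None if the line does not match."""
    m = QUAD_RE.match(line.strip())
    if m is None:
        return None
    s, p, o, c = m.groups()
    # Strip the angle brackets from the IRIs; keep the object exactly as written.
    return s[1:-1], p[1:-1], o, c[1:-1]

def iter_cited_facts(path):
    """Yield parsed quads from a bz2-compressed quad file, skipping unparsable lines."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            quad = parse_quad(line)
            if quad is not None:
                yield quad
```

Applied to the example line above, the fourth element of the returned tuple is the citation URL that replaced the usual DBpedia context.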

Let me know what you think.

On Mon, Aug 29, 2016 at 1:38 AM, Brill Lyle <wp.brilllyle@gmail.com> wrote:
Yes? I think so. Except I would like to see fuller citations extracted / sampled. I don't have the technical skill to understand the extraction completely, but yes, I think there is very rich data in Wikipedia that is very extractable.

Could this approach be a good source of candidate reference suggestions for Wikidata?
(This particular one is already a reference, but the anthem and GDP in the attachment, for example, are not.)

- Erika

Erika Herzog
Wikipedia User:BrillLyle

On Sat, Aug 27, 2016 at 9:37 AM, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de> wrote:
Hi,

I have had this idea for some time now but never got around to testing it or writing it down.
DBpedia extracts detailed context information in quads (where possible) about where each triple came from, including the line number in the wikitext.
Although each DBpedia extractor is independent, this context offers a small window for combining output from different extractors, such as the infobox statements we extract from Wikipedia and the very recent citation extractors we announced [1].

I attach a very small sample from the article about Germany, where I selected the related triples and ordered them by the line number they were extracted from, e.g.

dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66&template=Infobox_country&property=population_estimate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .
<https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile> dbp:isCitedBy dbr:Germany <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66> .
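
The matching idea can be sketched roughly as follows: read the `absolute-line` value from the fragment of each context IRI and pair facts with citations that share the same page revision and line. The function names `line_key` and `match_citations` are hypothetical, and the input is assumed to be already parsed into 4-tuples. This illustrates the join only, not the actual DBpedia code.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urldefrag

def line_key(context):
    """Return (page_url, absolute_line) for a context IRI, or None if no line is given."""
    page, fragment = urldefrag(context)
    values = parse_qs(fragment).get("absolute-line")
    if not values:
        return None
    return page, int(values[0])

def match_citations(facts, citations):
    """Pair each fact with the citations extracted from the same wikitext line.

    facts:     iterable of (subject, predicate, object, context)
    citations: iterable of (cited_url, predicate, article, context)
    Yields (subject, predicate, object, cited_url).
    """
    by_line = defaultdict(list)
    for cited_url, _predicate, _article, context in citations:
        key = line_key(context)
        if key is not None:
            by_line[key].append(cited_url)
    for subject, predicate, obj, context in facts:
        # Facts without an absolute-line value get key None and match nothing.
        for cited_url in by_line.get(line_key(context), []):
            yield subject, predicate, obj, cited_url
```

Keying on the page URL as well as the line number keeps quads from different articles or revisions apart when they happen to share a line number.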

Looking at the Wikipedia article we see:
|population_estimate = 82,175,700<ref>{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}</ref>
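
To see why the fact and the citation end up on the same line, here is a small sketch that pulls the `url=` parameter out of each `<ref>...</ref>` on a single wikitext line. The helper `cited_urls` and both regular expressions are illustrative assumptions. The real citation extractor handles many more template variants than this.

```python
import re

# Body of each non-self-closing <ref ...>...</ref> element.
REF_RE = re.compile(r'<ref[^>/]*>(.*?)</ref>', re.DOTALL)
# Value of the "url" template parameter, up to the next "|", "}" or whitespace.
URL_PARAM_RE = re.compile(r'\|\s*url\s*=\s*([^|}\s]+)')

def cited_urls(wikitext_line):
    """Return the url= values found inside <ref> elements on one wikitext line."""
    urls = []
    for body in REF_RE.findall(wikitext_line):
        match = URL_PARAM_RE.search(body)
        if match:
            urls.append(match.group(1))
    return urls
```

For the `population_estimate` line above, this returns a single URL, the destatis.de press release.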

Could this approach be a good source of candidate reference suggestions for Wikidata?
(This particular one is already a reference, but the anthem and GDP in the attachment, for example, are not.)

There are many things that can be done to improve the matching, but before getting into details I would like to see whether this idea is worth exploring further.

Cheers,
Dimitris

[1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg07739.html

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas

Research Group: AKSW/KILT http://aksw.org/Groups/KILT

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
Kontokostas Dimitris

Dario Taraborelli Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter