(semi-)automatic statement references for Wikidata from DBpedia


Dimitris Kontokostas

27 Aug 2016

1:37 p.m.

Hi,

I have had this idea for some time now but never got around to testing it or writing it down.

DBpedia extracts detailed context information in quads (where possible) on where each triple came from, including the line number in the wiki text. Although each DBpedia extractor is independent, this context opens a small window for combining output from different extractors, such as the infobox statements we extract from Wikipedia and the very recent citation extractors we announced [1].

I attach a very small sample from the article about Germany, where I filter out the related triples and order them by the line number they were extracted from, e.g.

dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger <http://en.wikipedia.org/wiki/Germany?oldid=736355524#*absolute-line=66*&template=Infobox_country&property=population_estimate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .

<https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile> dbp:isCitedBy dbr:Germany <http://en.wikipedia.org/wiki/Germany?oldid=736355524#*absolute-line=66*> .

Looking at the Wikipedia article we see:

|population_estimate = 82,175,700<ref>{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}</ref>

Could this approach be a good candidate for reference suggestions in Wikidata? (This particular one is already a reference, but the anthem and GDP in the attachment, for example, are not.)

There are many things that can be done to improve the matching, but before getting into details I would like to see whether this idea is worth exploring further.

Cheers,
Dimitris

[1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg07739.html

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT

Attachments:

attachment.htm (text/html — 4.4 KB)
statement-references.txt (text/plain — 3.7 KB)
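
The line-matching idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DBpedia implementation: the prefixes are expanded to the usual DBpedia namespaces, the asterisks around absolute-line=66 are treated as email emphasis and left out of the IRIs, the regular expression handles only simple N-Quads lines, and the names provenance_key and suggest_references are hypothetical.

```python
import re
from collections import defaultdict

# Simplified matcher for one N-Quads line: IRI subject and predicate, an IRI
# or literal object, and an IRI context. Not a complete N-Quads parser.
QUAD = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?P<o><[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s+'
    r'<(?P<g>[^>]*)>\s*\.\s*$'
)
ABSOLUTE_LINE = re.compile(r'(?:^|&)absolute-line=(\d+)')
IS_CITED_BY = "http://dbpedia.org/property/isCitedBy"


def provenance_key(context_iri):
    """Return (page revision IRI, wikitext line), or None if no line is given."""
    page, _, fragment = context_iri.partition("#")
    match = ABSOLUTE_LINE.search(fragment)
    return (page, int(match.group(1))) if match else None


def suggest_references(quad_lines):
    """Pair each extracted fact with citations found on the same wikitext line."""
    facts = defaultdict(list)
    citations = defaultdict(list)
    for text in quad_lines:
        match = QUAD.match(text.strip())
        if match is None:
            continue  # unsupported or malformed line
        key = provenance_key(match.group("g"))
        if key is None:
            continue  # context carries no line number
        if match.group("p") == IS_CITED_BY:
            citations[key].append(match.group("s"))  # the cited URL is the subject
        else:
            facts[key].append((match.group("s"), match.group("p"), match.group("o")))
    return [
        (fact, url)
        for key, same_line_facts in facts.items()
        for fact in same_line_facts
        for url in citations.get(key, [])
    ]


# The two quads from the message, with prefixes expanded (an assumption).
PAGE = "http://en.wikipedia.org/wiki/Germany?oldid=736355524"
SOURCE = ("https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/"
          "PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2"
          "?__blob=publicationFile")
SAMPLE = [
    '<http://dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/populationTotal> '
    '"82175700"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> '
    f'<{PAGE}#absolute-line=66&template=Infobox_country&property=population_estimate'
    '&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .',
    f'<{SOURCE}> <{IS_CITED_BY}> <http://dbpedia.org/resource/Germany> '
    f'<{PAGE}#absolute-line=66> .',
]

pairs = suggest_references(SAMPLE)
for fact, url in pairs:
    print(fact, "<- candidate reference:", url)
```

Matching on the exact line is deliberately strict. A line that carries several facts and several citations would produce every combination, so the pairs are best read as suggestions for human review rather than confirmed references.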


Brill Lyle

28 Aug

10:38 p.m.

Yes? I think so. Except I would like to see fuller citations extracted / sampled from / to? I don't have the technical skill to understand the extraction completely, but yes. I think there is very rich data in Wikipedia that is very extractable.

> Could this approach be a good candidate for reference suggestions in Wikidata?
> (This particular one is already a reference, but the anthem and GDP in the attachment, for example, are not.)

- Erika

Erika Herzog
Wikipedia User:BrillLyle <https://en.wikipedia.org/wiki/User:BrillLyle>

Dimitris Kontokostas

30 Aug

6:09 a.m.

You can have a look here:
http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2

It is a quad file that contains DBpedia facts, where I replaced the context with the citation whenever the citation is on the exact same line as the extracted fact, e.g.

<http://dbpedia.org/resource/An_American_in_Paris> <http://dbpedia.org/property/work> "An American in Paris"@en <https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .

It is based on a complete English dump from ~April and contains roughly 1M cited facts.

This is more of a proof of concept, and there are many ways to improve it and make it more usable for Wikidata. Let me know what you think.

-- Kontokostas Dimitris
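
For readers who want to inspect a cited-facts dump of this shape, a minimal reading sketch might look like the following. It assumes a locally available bz2-compressed quad file whose fourth element is the citation URL, as described in the message. The names CitedFact and read_cited_facts are hypothetical, the regular expression covers only simple N-Quads lines, and the demo writes its own one-line sample file rather than relying on the download link still being available.

```python
import bz2
import os
import re
import tempfile
from typing import Iterator, NamedTuple

# Simplified matcher for one N-Quads line; not a complete N-Quads parser.
QUAD = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?P<o><[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)\s+'
    r'<(?P<g>[^>]*)>\s*\.\s*$'
)


class CitedFact(NamedTuple):
    subject: str
    predicate: str
    obj: str            # IRI in angle brackets, or a literal with its quotes
    reference_url: str  # the citation that replaced the usual context IRI


def read_cited_facts(path: str) -> Iterator[CitedFact]:
    """Stream facts from a bz2-compressed quad file without loading it whole."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for raw in handle:
            match = QUAD.match(raw.strip())
            if match is None:
                continue  # comments, blank lines, or shapes this sketch skips
            yield CitedFact(match.group("s"), match.group("p"),
                            match.group("o"), match.group("g"))


# Demo on a one-line sample file holding the quad quoted in the message.
sample_quad = (
    '<http://dbpedia.org/resource/An_American_in_Paris> '
    '<http://dbpedia.org/property/work> "An American in Paris"@en '
    '<https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .\n'
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample-citedFacts.tql.bz2")
    with bz2.open(sample_path, "wt", encoding="utf-8") as out:
        out.write(sample_quad)
    cited_facts = list(read_cited_facts(sample_path))

for fact in cited_facts:
    print(fact.subject, fact.predicate, fact.obj, "cited in", fact.reference_url)
```

Turning such rows into Wikidata statements would still require mapping DBpedia properties and resources to Wikidata properties and items, which this sketch does not attempt.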

Dario Taraborelli

6:16 p.m.

cc'ing wikicite-discuss, this is going to be of relevance to many people there too.

--
Dario Taraborelli
Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>

Dimitris Kontokostas

1 Sep

2:53 p.m.

New subject: [wikicite-discuss] Re: (semi-)automatic statement references for Wikidata from DBpedia

Hmm, it is hard to interpret no feedback at all here. It could be that
a) the data is not usable for Wikidata,
b) this is not an interesting idea for Wikidata (now), or
c) this is not a good place to ask.

Based on the very high activity on this list I could only guess (b), even though this idea came from the Wikidata community 1+ year ago. This is probably not relevant now.
https://lists.wikimedia.org/pipermail/wikidata/2015-June/006366.html

For reference, this is the prototype extractor that generated the cited facts, which can be run on newer dumps:
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/CitedFactsExtractor.scala

Best,
Dimitris

-- Kontokostas Dimitris

Benjamin Good

4:23 p.m.

Dimitris,

This seems like a good way to seed a large-scale data and reference import process. The trouble here is that Wikidata already has large amounts of such potentially useful data (e.g. most of Freebase, the results of the StrepHit NLP system, etc.), but the processes for moving it in have thus far gone slowly. In fact, the author of the StrepHit system for mining facts/references for Wikidata is shifting his focus entirely to improving that part of the pipeline (known currently as the 'primary sources' tool), as it is the bottleneck. It would be great to see you get involved there:
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_…

Once we have a good technical and social pattern for verifying predicted claims and references at scale, we can get to the business of loading that system up with good input.

my two cents..
-ben

Marco Fossati

8 Sep

11:57 a.m.

Thanks Ben for reading my mind, I was about to provide the same pointer. :-)

Let's try to keep the discussion on the primary sources tool in one place as much as possible.

Cheers,
Marco

On 9/1/16 18:23, Benjamin Good wrote:

> Dimitris,
>
> This seems like a good way to seed a large-scale data and reference import process. The trouble here is that Wikidata already has large amounts of such potentially useful data (e.g. most of Freebase, the results of the StrepHit NLP system, etc.), but the processes for moving it in have thus far gone slowly. In fact, the author of the StrepHit system for mining facts/references for Wikidata is shifting his focus entirely to improving that part of the pipeline (known currently as the 'primary sources' tool), as it is the bottleneck. It would be great to see you get involved there:
> https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_…
>
> Once we have a good technical and social pattern for verifying predicted claims and references at scale, we can get to the business of loading that system up with good input.
>
> my two cents..
> -ben
