Re: [Wikidata] [wikicite-discuss] Re: (semi-)automatic statement references for Wikidata from DBpedia

1 Sep 2016


Dimitris,
This seems like a good way to seed a large-scale data and reference import
process.  The trouble here is that Wikidata already has large amounts of
such potentially useful data (e.g. most of Freebase, the results of the
StrepHit NLP system, etc.), but the processes for moving it in have thus far
gone slowly.  In fact, the author of the StrepHit system for mining
facts/references for Wikidata is shifting his focus entirely to improving
that part of the pipeline (currently known as the 'primary sources' tool),
as it is the bottleneck.  It would be great to see you get involved there:
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_A...
Once we have a good technical and social pattern for verifying predicted
claims and references at scale, we can get to the business of loading that
system up with good input.
my two cents..
-ben
On Thu, Sep 1, 2016 at 7:53 AM, Dimitris Kontokostas jimkont@gmail.com
wrote:
...
Hmm, it is hard to interpret no feedback at all here. It could be that
a) the data is not usable for Wikidata,
b) this is not an interesting idea for Wikidata (now), or
c) this is not a good place to ask.
Given the very high activity on this list I can only guess (b), even
though this idea came from the Wikidata community 1+ year ago. It is
probably not relevant now.
https://lists.wikimedia.org/pipermail/wikidata/2015-June/006366.html
For reference, this is the prototype extractor that generated the cited
facts; it can be run on newer dumps:
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/CitedFactsExtractor.scala
Best,
Dimitris
On Tue, Aug 30, 2016 at 9:16 PM, Dario Taraborelli <
dtaraborelli@wikimedia.org> wrote:
...
cc'ing wikicite-discuss, this is going to be of relevance to many people
there too.
On Mon, Aug 29, 2016 at 11:09 PM, Dimitris Kontokostas <jimkont@gmail.com>
wrote:
...
You can have a look here:
http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2
It is a quad file that contains DBpedia facts. I replaced the context
with the citation whenever the citation is on the exact same line as the
extracted fact, e.g.
<http://dbpedia.org/resource/An_American_in_Paris> <http://dbpedia.org/property/work> "An American in Paris"@en <https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=e...> .
It is based on a complete English dump from ~April and contains roughly
1M cited facts.
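For anyone who wants to poke at the file, here is a minimal sketch of how such a quad file could be streamed in Python. It is not part of the DBpedia tooling: `Quad`, `parse_quad` and `read_quads` are hypothetical names, and the regular expression only covers the simple one-quad-per-line shape shown in the example above (IRI subject, predicate and context, with an IRI or literal object), not the full N-Quads grammar.

```python
import bz2
import re
from typing import Iterator, NamedTuple, Optional


class Quad(NamedTuple):
    subject: str    # IRI without angle brackets
    predicate: str  # IRI without angle brackets
    obj: str        # kept raw: "<iri>" or a literal such as "text"@en
    context: str    # IRI without angle brackets (here: the citation)


# Simplified pattern (an assumption, not the full N-Quads grammar):
# <subject> <predicate> <iri-or-literal> <context> .
QUAD_RE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s+<([^>]*)>\s*\.\s*$'
)


def parse_quad(line: str) -> Optional[Quad]:
    """Return the parsed quad, or None if the line does not match."""
    match = QUAD_RE.match(line.strip())
    if match is None:
        return None
    return Quad(*match.groups())


def read_quads(path: str) -> Iterator[Quad]:
    """Stream quads from a bz2-compressed file, skipping unparsable lines."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            quad = parse_quad(line)
            if quad is not None:
                yield quad
```

Streaming line by line keeps memory flat, which matters for a dump with roughly 1M quads; a real RDF parser would be the safer choice for anything beyond a quick look.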
This is more of a proof of concept, and there are many ways to improve it
and make it more usable for Wikidata.
Let me know what you think.
On Mon, Aug 29, 2016 at 1:38 AM, Brill Lyle wp.brilllyle@gmail.com
wrote:
...
Yes? I think so. Except I would like to see fuller citations extracted
/ sampled from / to? I don't have the technical skill to understand the
extraction completely but Yes. I think there is very rich data in Wikipedia
that is very extractable.
Could this approach be a good candidate reference suggestions in
Wikidata?
(This particular one is already a reference but the anthem and GDP in
the attachment are not for example)

Erika

*Erika Herzog*
Wikipedia User:BrillLyle <https://en.wikipedia.org/wiki/User:BrillLyle>
On Sat, Aug 27, 2016 at 9:37 AM, Dimitris Kontokostas <
kontokostas@informatik.uni-leipzig.de> wrote:
...
Hi,
I have had this idea for some time now but never got to test it or write
it down.
DBpedia extracts detailed context information in quads (where
possible) on where each triple came from, including the line number in the
wiki text.
Although each DBpedia extractor is independent, this context opens
a small window for combining output from different extractors,
such as the infobox statements we extract from Wikipedia and the very
recent citation extractors we announced [1].
I attach a very small sample from the article about Germany where I
filter out the related triples and order them by the line number they were
extracted from, e.g.
dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66&template=Infobox_country&property=population_estimate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .
<https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile> dbp:isCitedBy dbr:Germany <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66> .
(Note that both contexts carry absolute-line=66.)
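The matching on the line number can be sketched roughly as follows. This is only an illustration of the idea, not the code of the extractor: `line_key` and `suggest_references` are hypothetical names, quads are plain `(subject, predicate, object, context)` tuples with prefixes already expanded, and the expansion of `dbp:isCitedBy` to `http://dbpedia.org/property/isCitedBy` is an assumption based on the usual `dbp:` prefix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

Quad = Tuple[str, str, str, str]  # subject, predicate, object, context

# Assumed expansion of dbp:isCitedBy (the dbp: prefix is not spelled out above).
IS_CITED_BY = "http://dbpedia.org/property/isCitedBy"


def line_key(context: str) -> Optional[Tuple[str, int]]:
    """Map a context IRI to (page revision URL, absolute line number)."""
    parts = urlsplit(context)
    values = parse_qs(parts.fragment).get("absolute-line")
    if not values or not values[0].isdigit():
        return None
    page = parts._replace(fragment="").geturl()
    return page, int(values[0])


def suggest_references(quads: Iterable[Quad]) -> List[Tuple[Quad, str]]:
    """Pair every fact with each citation found on the same wikitext line."""
    facts: Dict[Tuple[str, int], List[Quad]] = defaultdict(list)
    citations: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for quad in quads:
        key = line_key(quad[3])
        if key is None:
            continue  # no usable line information in the context
        if quad[1] == IS_CITED_BY:
            citations[key].append(quad[0])  # subject is the cited URL
        else:
            facts[key].append(quad)
    pairs: List[Tuple[Quad, str]] = []
    for key in sorted(facts):  # ordered by page, then line number
        for fact in facts[key]:
            for url in citations.get(key, []):
                pairs.append((fact, url))
    return pairs
```

Keying on the revision URL plus the line number, rather than the line number alone, keeps facts from different articles (or different revisions) from being matched to each other.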
Looking at the wikipedia article we see:
|population_estimate = 82,175,700<ref>{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}</ref>
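To make the wikitext side concrete, here is a rough sketch of pulling the cited URL out of such a line. It is a regex-based illustration only, not how the DBpedia citation extractors work internally; `cited_urls` is a hypothetical helper, and it assumes the reference is an inline `<ref>...</ref>` whose template has a `url=` parameter.

```python
import re
from typing import List

# Body of each inline <ref ...>...</ref>; self-closing <ref name=x /> is skipped.
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# Value of a "|url=" template parameter, up to the next pipe, brace or space.
URL_PARAM_RE = re.compile(r"\|\s*url\s*=\s*([^|}\s]+)")


def cited_urls(wikitext_line: str) -> List[str]:
    """Return the url= values of all inline references on one wikitext line."""
    urls: List[str] = []
    for body in REF_RE.findall(wikitext_line):
        urls.extend(URL_PARAM_RE.findall(body))
    return urls
```

Because the pattern requires `url` directly after the pipe, parameters such as `archive-url=` are not picked up by this sketch.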
Could this approach be a good source of candidate reference suggestions
for Wikidata?
(This particular one is already a reference, but the anthem and GDP in
the attachment, for example, are not.)
There are many things that can be done to improve the matching, but
before getting into details I would like to see whether this idea is worth
exploring further.
Cheers,
Dimitris
[1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg07739.html
--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia
Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT

Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

--
Kontokostas Dimitris

--
*Dario Taraborelli*, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
http://twitter.com/readermeter
--
WikiCite 2016 – May 25-26, 2016, Berlin
Meta: https://meta.wikimedia.org/wiki/WikiCite_2016
Twitter: https://twitter.com/wikicite16

You received this message because you are subscribed to the Google Groups
"wikicite-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to wikicite-discuss+unsubscribe@wikimedia.org.
