0305-citedFacts.tql.bz2
It is a quad file that contains DBpedia facts in which I replaced the
context with the citation whenever the citation is on exactly the same
line as the extracted fact, e.g.
<http://dbpedia.org/resource/An_American_in_Paris> <
.
It is based on a complete English dump from ~April and contains roughly
1M cited facts.
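
For anyone who wants to try the matching on their own data, here is a
minimal sketch of the idea in Python. It is not the code that produced
the file, and the exact layout of the file may differ. The assumptions:
facts and citations come as full-IRI N-Quads lines, the context IRI
carries an absolute-line=N fragment as in the Germany sample quoted
below, and dbp:isCitedBy expands to
http://dbpedia.org/property/isCitedBy. The names line_key, parse_quad
and cite_facts are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed expansion of dbp:isCitedBy (not confirmed in this thread).
IS_CITED_BY = "<http://dbpedia.org/property/isCitedBy>"

# Naive N-Quads splitter: subject, predicate, object, <context> .
QUAD_RE = re.compile(r"^(<[^>]*>)\s+(<[^>]*>)\s+(.+?)\s+<([^>]*)>\s*\.\s*$")
LINE_RE = re.compile(r"(?:^|&)absolute-line=(\d+)")


def line_key(context):
    """Return (page IRI without fragment, absolute line), or None."""
    page, _, fragment = context.partition("#")
    match = LINE_RE.search(fragment)
    return (page, int(match.group(1))) if match else None


def parse_quad(line):
    """Split one quad line into (subject, predicate, object, context)."""
    match = QUAD_RE.match(line.strip())
    return match.groups() if match else None


def cite_facts(fact_lines, citation_lines):
    """Yield facts whose context is replaced by a same-line citation."""
    citations = defaultdict(list)
    for line in citation_lines:
        quad = parse_quad(line)
        if quad is None or quad[1] != IS_CITED_BY:
            continue
        key = line_key(quad[3])
        if key is not None:
            # The subject of an isCitedBy quad is the cited source.
            citations[key].append(quad[0])

    for line in fact_lines:
        quad = parse_quad(line)
        if quad is None:
            continue
        subject, predicate, obj, context = quad
        for source in citations.get(line_key(context), []):
            yield f"{subject} {predicate} {obj} {source} ."
```

Calling list(cite_facts(facts, citations)) with two lists of quad lines
returns one output quad per (fact, citation) pair that shares a page
revision and line number. Facts with no citation on their line are
dropped.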
This is more of a proof of concept, and there are many ways to improve
it and make it more usable for Wikidata.
Let me know what you think.
On Mon, Aug 29, 2016 at 1:38 AM, Brill Lyle <wp.brilllyle(a)gmail.com>
wrote:
Yes? I think so. Except I would like to see fuller citations extracted /
sampled from / to? I don't have the technical skill to understand the
extraction completely, but yes. I think there is very rich data in
Wikipedia that is very extractable.
> Could this approach be a good candidate for reference suggestions in
> Wikidata?
> (This particular one is already a reference, but the anthem and GDP in
> the attachment are not, for example.)
- Erika
Erika Herzog
Wikipedia User:BrillLyle <https://en.wikipedia.org/wiki/User:BrillLyle>
On Sat, Aug 27, 2016 at 9:37 AM, Dimitris Kontokostas <
kontokostas(a)informatik.uni-leipzig.de> wrote:
> Hi,
>
> I have had this idea for some time now but never got to test it or
> write it down.
> DBpedia extracts detailed context information in Quads (where
> possible) on where each triple came from, including the line number in the
> wiki text.
> Although each DBpedia extractor is independent, this context opens a
> small window for combining output from different extractors, such as
> the infobox statements we extract from Wikipedia and the very recent
> citation extractors we announced [1].
>
> I attach a very small sample from the article about Germany, where I
> filter for the related triples and order them by the line number they
> were extracted from, e.g.:
>
> dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger
> <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66&template=Infobox_country&property=population_estimate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .
>
> <https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile>
> dbp:isCitedBy dbr:Germany
> <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66> .
>
> Looking at the Wikipedia article we see:
> |population_estimate = 82,175,700<ref>{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}</ref>
>
> Could this approach be a good candidate for reference suggestions in
> Wikidata?
> (This particular one is already a reference, but the anthem and GDP in
> the attachment are not, for example.)
>
> There are many things that can be done to improve the matching, but
> before getting into details I would like to see whether this idea is
> worth exploring further.
>
> Cheers,
> Dimitris
>
> [1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg07739.html
>
> --
> Dimitris Kontokostas
> Department of Computer Science, University of Leipzig & DBpedia
> Association
> Projects: http://dbpedia.org, http://rdfunit.aksw.org,
> http://aligned-project.eu
> Homepage: http://aksw.org/DimitrisKontokostas
> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
>
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
--
Kontokostas Dimitris