Re: [Wikidata] [wikicite-discuss] Re: (semi-)automatic statement references for Wikidata from DBpedia

8 Sep 2016

      Thanks Ben for reading my mind, I was about to provide the same pointer. :-)
Let's try to keep the discussion on the primary sources tool in one 
place as much as possible.
Cheers,
Marco
On 9/1/16 18:23, Benjamin Good wrote:
...
Dimitris,
This seems like a good way to seed a large-scale data and reference import
process.  The trouble here is that Wikidata already has large amounts of
such potentially useful data (e.g. most of Freebase, the results of the
StrepHit NLP system, etc.) but the processes for moving it in have thus
far gone slowly.  In fact the author of the StrepHit system for mining
facts/references for Wikidata is shifting his focus entirely to
improving that part of the pipeline (known currently as the 'primary
sources' tool) as it is the bottleneck.  It would be great to see you
get involved there:
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_A...
Once we have a good technical and social pattern for verifying predicted
claims and references at scale, we can get to the business of loading
that system up with good input.
my two cents..
-ben
On Thu, Sep 1, 2016 at 7:53 AM, Dimitris Kontokostas <jimkont@gmail.com> wrote:
Hmm, it is hard to interpret no feedback at all here. It could be that
a) the data is not usable for Wikidata,
b) this is not an interesting idea for Wikidata (now), or
c) this is not a good place to ask.

Given the very high activity on this list I can only guess (b),
even though this idea came from the Wikidata community more than a year
ago. It is probably no longer relevant.
https://lists.wikimedia.org/pipermail/wikidata/2015-June/006366.html

For reference, this is the prototype extractor that generated the
cited facts; it can be run on newer dumps:
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/CitedFactsExtractor.scala

Best,
Dimitris

On Tue, Aug 30, 2016 at 9:16 PM, Dario Taraborelli
<dtaraborelli@wikimedia.org> wrote:

    cc'ing wikicite-discuss, this is going to be of relevance to
    many people there too.

    On Mon, Aug 29, 2016 at 11:09 PM, Dimitris Kontokostas
    <jimkont@gmail.com> wrote:

        You can have a look here:
        http://downloads.dbpedia.org/temporary/citations/enwiki-20160305-citedFacts.tql.bz2
        It is a quad file that contains DBpedia facts; I replaced
        the context with the citation when the citation is on the
        exact same line as the extracted fact, e.g.

        <http://dbpedia.org/resource/An_American_in_Paris> <http://dbpedia.org/property/work> "An American in Paris"@en <https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .
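To make the quad layout concrete, here is a minimal Python sketch of how a consumer might read such a file and pull out the citation that sits in the context position. It is an illustration under assumptions, not part of the DBpedia tooling: the regular expression only covers the simple shape shown above (IRI subject and predicate, IRI or plain/typed/language-tagged literal object, IRI context), and the names `parse_cited_fact` and `iter_cited_facts` are hypothetical.

```python
import bz2
import re

# Simplified N-Quads pattern (an assumption, not a full parser):
# no blank nodes and no escaped quotes inside literals.
QUAD_RE = re.compile(
    r'^<(?P<s>[^>]*)>\s+'
    r'<(?P<p>[^>]*)>\s+'
    r'(?P<o><[^>]*>|"[^"]*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)\s+'
    r'<(?P<c>[^>]*)>\s*\.\s*$'
)


def parse_cited_fact(line):
    """Return (subject, predicate, object, citation) or None if the line does not match."""
    m = QUAD_RE.match(line.strip())
    if m is None:
        return None
    return m.group("s"), m.group("p"), m.group("o"), m.group("c")


def iter_cited_facts(path):
    """Stream cited facts from a local .tql.bz2 file (path is supplied by the caller)."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for raw in handle:
            fact = parse_cited_fact(raw)
            if fact is not None:
                yield fact


# The example quad from the message above.
line = (
    '<http://dbpedia.org/resource/An_American_in_Paris> '
    '<http://dbpedia.org/property/work> "An American in Paris"@en '
    '<https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .'
)
fact = parse_cited_fact(line)
```

For real use, a proper N-Quads parser would be a safer choice than a regular expression, since the dump may contain escaped characters that this pattern does not handle.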

        It is based on a complete English dump from ~April and
        contains roughly 1M cited facts.
        This is more of a proof of concept, and there are many ways
        to improve it and make it more usable for Wikidata.

        Let me know what you think.

        On Mon, Aug 29, 2016 at 1:38 AM, Brill Lyle
        <wp.brilllyle@gmail.com> wrote:

            Yes? I think so. Except I would like to see fuller
            citations extracted / sampled from / to? I don't have
            the technical skill to understand the extraction
            completely but Yes. I think there is very rich data in
            Wikipedia that is very extractable.

                Could this approach be a good candidate for reference
                suggestions in Wikidata?
                (This particular one is already a reference, but the
                anthem and GDP in the attachment, for example, are not.)

            - Erika
            Erika Herzog
            Wikipedia User:BrillLyle
            <https://en.wikipedia.org/wiki/User:BrillLyle>

            On Sat, Aug 27, 2016 at 9:37 AM, Dimitris Kontokostas
            <kontokostas@informatik.uni-leipzig.de> wrote:

                Hi,

                I have had this idea for some time now but never got
                around to testing it or writing it down.
                DBpedia extracts detailed context information in
                quads (where possible) on where each triple came
                from, including the line number in the wiki text.
                Although each DBpedia extractor is independent,
                this context opens a small window for
                combining output from different extractors, such as
                the infobox statements we extract from Wikipedia and
                the very recent citation extractors we announced [1].

                I attach a very small sample from the article about
                Germany, where I selected the related triples and
                ordered them by the line number they were extracted
                from, e.g.

                dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger
                <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66&template=Infobox_country&property=population_estimate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .

                <https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile>
                dbp:isCitedBy dbr:Germany
                <http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66> .

                Looking at the Wikipedia article we see:
                |population_estimate = 82,175,700<ref>{{cite web|url=https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile|title=Population at 82.2 million at the end of 2015 – population increase due to high immigration|date=26 August 2016|work=destatis.de}}</ref>
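The line-based matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, under assumptions: triples are represented as plain tuples rather than parsed RDF, the function names `absolute_line` and `suggest_references` are hypothetical, and a match is taken to mean "same page revision and same `absolute-line` value in the context IRI fragment", which is how the two sample quads line up.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def absolute_line(context_iri):
    """Return the absolute-line value from a context IRI fragment, or None if absent."""
    fragment = urlsplit(context_iri).fragment
    values = parse_qs(fragment).get("absolute-line")
    return int(values[0]) if values else None


def match_key(context_iri):
    """Key a context by (page revision URL without fragment, line number)."""
    return context_iri.split("#", 1)[0], absolute_line(context_iri)


def suggest_references(facts, citations):
    """Pair each fact with the citations found on the same line of the same revision.

    facts:     iterable of (subject, predicate, object, context_iri)
    citations: iterable of (citation_url, context_iri)
    """
    citations_by_key = defaultdict(list)
    for url, context in citations:
        citations_by_key[match_key(context)].append(url)

    suggestions = []
    for s, p, o, context in facts:
        key = match_key(context)
        if key[1] is None:
            continue  # no line information, nothing to match on
        for url in citations_by_key.get(key, []):
            suggestions.append((s, p, o, url))
    return suggestions


# The two sample quads from the Germany article.
DESTATIS = (
    "https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2016/08/"
    "PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8D510CF89FDCBC74DBAE9F.cae2"
    "?__blob=publicationFile"
)
facts = [(
    "dbr:Germany", "dbo:populationTotal", '"82175700"^^xsd:nonNegativeInteger',
    "http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66"
    "&template=Infobox_country&property=population_estimate&split=1"
    "&wikiTextSize=10&plainTextSize=10&valueSize=8",
)]
citations = [(
    DESTATIS,
    "http://en.wikipedia.org/wiki/Germany?oldid=736355524#absolute-line=66",
)]
suggestions = suggest_references(facts, citations)
```

Matching on the line alone is deliberately coarse: a line holding several values and several citations would pair all of them, which is one of the places where the matching could be tightened.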

                Could this approach be a good candidate for reference
                suggestions in Wikidata?
                (This particular one is already a reference, but the
                anthem and GDP in the attachment, for example, are not.)

                There are many things that can be done to improve
                the matching, but before getting into details I would
                like to see whether this idea is worth exploring further.

                Cheers,
                Dimitris

                [1] http://www.mail-archive.com/dbpedia-discussion%40lists.sourceforge.net/msg07739.html

                --
                Dimitris Kontokostas
                Department of Computer Science, University of
                Leipzig & DBpedia Association
                Projects: http://dbpedia.org,
                http://rdfunit.aksw.org, http://aligned-project.eu
                Homepage: http://aksw.org/DimitrisKontokostas
                Research Group: AKSW/KILT http://aksw.org/Groups/KILT

                _______________________________________________
                Wikidata mailing list
                Wikidata@lists.wikimedia.org
                https://lists.wikimedia.org/mailman/listinfo/wikidata


        --
        Kontokostas Dimitris


    --

    Dario Taraborelli, Head of Research, Wikimedia Foundation
    wikimediafoundation.org
    <http://wikimediafoundation.org/> • nitens.org
    <http://nitens.org/> • @readermeter
    <http://twitter.com/readermeter>

    --
    WikiCite 2016 – May 25-26, 2016, Berlin
    Meta: https://meta.wikimedia.org/wiki/WikiCite_2016
    Twitter: https://twitter.com/wikicite16

--
Kontokostas Dimitris

