[Wikidata] Entity tagging and fact extraction (from a scholarly publisher perspective)

2 Nov 2016

I'm at the Crossref LIVE 16 event
<https://www.eventbrite.com/e/crossref-live16-registration-25928526922> in
London where I just gave a presentation
<https://dx.doi.org/10.6084/m9.figshare.4175343.v2> on WikiCite and
Wikidata targeted at scholarly publishers.

Besides the Crossref and DataCite people, I talked to a bunch of folks
interested in collaborating on Wikidata integration, particularly from
PLOS, Hindawi and Springer Nature. I started an interesting discussion with
Andrew Smeall, who runs strategic projects at Hindawi, and I wanted to open
it up to everyone on the lists.

Andrew asked me whether, aside from efforts like ContentMine and StrepHit,
there are any recommendations for publishers (especially OA publishers) on
how to mark up their content to facilitate information extraction and entity
matching, or even to push triples to Wikidata to be considered for ingestion.

I don't think we have a recommended workflow for data providers for
facilitating triple suggestions to Wikidata, other than leveraging the Primary
Sources Tool <https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool>.
However, aligning keywords and terms with the corresponding Wikidata items
via ID mapping sounds like a good first step. I pointed Andrew to
Mix'n'Match <https://meta.wikimedia.org/wiki/Mix%27n%27match> as a handy
way of mapping identifiers, but if you have other ideas on how best to
support two-way integration of Wikidata with scholarly content, please chime
in.
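
To make the ID-mapping idea a bit more concrete, here is a minimal sketch of
what aligning a publisher's keywords with Wikidata items could look like on
the publisher side. Everything in it is an assumption for illustration: the
lookup table, the function names, and the choice to match on normalized
strings are hypothetical and not an existing Wikidata or Mix'n'Match API. In
practice the table would come from a reviewed mapping, and the item IDs shown
should be verified before use.

```python
import unicodedata

# Hypothetical publisher-side lookup table: normalized keyword -> Wikidata item ID.
# Illustrative entries only; a real table would come from a curated mapping
# and every ID should be checked against Wikidata before use.
KEYWORD_TO_QID = {
    "gene": "Q7187",
    "protein": "Q8054",
    "disease": "Q12136",
}


def normalize(keyword: str) -> str:
    """Apply Unicode NFKC, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", keyword)
    return " ".join(text.lower().split())


def align_keywords(keywords):
    """Split keywords into those with a known item ID and those without.

    Returns (matched, unmatched), where matched maps each original keyword
    to its item ID and unmatched lists keywords left for manual review.
    """
    matched = {}
    unmatched = []
    for keyword in keywords:
        qid = KEYWORD_TO_QID.get(normalize(keyword))
        if qid is None:
            unmatched.append(keyword)
        else:
            matched[keyword] = qid
    return matched, unmatched


def entity_uri(qid: str) -> str:
    """Build the Wikidata entity URI for an item ID."""
    return f"http://www.wikidata.org/entity/{qid}"


if __name__ == "__main__":
    matched, unmatched = align_keywords(["Gene", "  protein ", "open access"])
    for keyword, qid in matched.items():
        print(repr(keyword), "->", entity_uri(qid))
    print("needs review:", unmatched)
```

Keeping the unmatched keywords in a separate list is deliberate: those are
the terms a human (or a tool like Mix'n'Match) would still need to review.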

Dario

-- 

*Dario Taraborelli*  Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
