Dario,
One message you can send is that they can and should use existing
controlled vocabularies and ontologies to construct the metadata they want
to share. For example, MeSH descriptors would be a good way for them to
organize the 'primary topic' assertions for their articles and would make
it easy to find the corresponding items in Wikidata when uploading. Our
group will continue to expand coverage of identifiers and concepts from
such vocabularies in Wikidata, and any help there from publishers would
be appreciated!
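As a minimal sketch of the kind of lookup this enables (assuming the publisher already has MeSH descriptor IDs on hand): a MeSH ID can be resolved to its Wikidata item via the public SPARQL endpoint, since Wikidata stores MeSH descriptor IDs under property P486. The helper names below are illustrative, not an existing tool:

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def mesh_to_wikidata_query(mesh_id: str) -> str:
    """Build a SPARQL query finding the Wikidata item whose
    MeSH descriptor ID (property P486) equals mesh_id."""
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P486 "{mesh_id}" . '
        "}"
    )

def lookup_mesh(mesh_id: str) -> str:
    """Resolve a MeSH descriptor ID to a Wikidata item URI.
    Makes a live call to the Wikidata Query Service."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": mesh_to_wikidata_query(mesh_id), "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "mesh-mapper-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    bindings = data["results"]["bindings"]
    return bindings[0]["item"]["value"] if bindings else ""
```

A publisher could run such a lookup at upload time to attach the matching Wikidata item to each 'primary topic' assertion.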
My view here is that Wikidata can be a bridge to the terminologies and
datasets that live outside it - not really a replacement for them. So, if
they have good practices about using shared vocabularies already, it should
(eventually) be relatively easy to move relevant assertions into the
Wikidata graph while maintaining interoperability and integration with
external software systems.
-Ben
On Wed, Nov 2, 2016 at 8:31 AM, 'Daniel Mietchen' via wikicite-discuss <
wikicite-discuss(a)wikimedia.org> wrote:
I'm traveling (https://twitter.com/EvoMRI/status/793736211009536000), so
just in brief:
In terms of markup, some general comments are in
https://www.ncbi.nlm.nih.gov/books/NBK159964/ , which is not specific
to Hindawi but partly applies to them too.
A problem specific to Hindawi (cf.
https://commons.wikimedia.org/wiki/Category:Media_from_Hindawi) is the
bundling of the descriptions of all supplementary files, which
translates into uploads like
https://commons.wikimedia.org/wiki/File:Evolution-of-Coronary-Flow-in-an-Experimental-Slow-Flow-Model-in-Swines-Angiographic-and-623986.f1.ogv
(with descriptions for nine files) and eight files with no description, e.g.
https://commons.wikimedia.org/wiki/File:Evolution-of-Coronary-Flow-in-an-Experimental-Slow-Flow-Model-in-Swines-Angiographic-and-623986.f2.ogv
There are other problems in their JATS, and it would be good if they
participated in http://jats4r.org/. Happy to dig deeper with Andrew or
whoever is interested.
Where they are ahead of the curve is in licensing information, so they
could help us set up workflows to get that information into Wikidata.
In terms of triple suggestions to Wikidata:
- As far as article metadata is concerned, I would prefer to concentrate
on integrating our workflows with the major repositories of metadata to
which publishers are already posting. Publishers could help us by using
more identifiers (e.g. for authors, affiliations, funders, etc.),
potentially even from Wikidata (e.g. for keywords/P921, for both
journals and articles), and by contributing to the development of tools
(e.g. a bot that goes through the CrossRef database every day and
creates Wikidata items for newly published papers).
- If they have ways to extract statements from their publication corpus,
it would be good if they let us/ContentMine/StrepHit etc. know, so we
could discuss how to move this forward.
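As a rough sketch of the daily CrossRef bot idea above (not an existing tool): the Crossref REST API supports a from-index-date filter and cursor-based paging, which a scheduled job could use to fetch newly indexed works. The mapping below uses real Wikidata property IDs (P356 for DOI, P1476 for title), but the function names are illustrative, and the actual item creation via the Wikidata API is omitted:

```python
import urllib.parse

CROSSREF_API = "https://api.crossref.org/works"

def new_works_url(since: str, rows: int = 100) -> str:
    """Build a Crossref REST API request for works indexed on or
    after a date (YYYY-MM-DD), using the from-index-date filter
    and cursor-based deep paging."""
    params = {
        "filter": f"from-index-date:{since}",
        "rows": str(rows),
        "cursor": "*",  # start of a deep-paging cursor walk
    }
    return CROSSREF_API + "?" + urllib.parse.urlencode(params)

def doi_statements(work: dict) -> dict:
    """Map one Crossref work record to a few Wikidata-style claims:
    P356 (DOI) and P1476 (title). Author, funder, and subject
    mappings would follow the same pattern."""
    claims = {"P356": work.get("DOI", "").upper()}
    titles = work.get("title") or []
    if titles:
        claims["P1476"] = titles[0]
    return claims
```

A daily run would fetch each page of results, skip DOIs that already have a Wikidata item, and submit the remaining claims through the standard Wikidata editing API.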
d.
On Wed, Nov 2, 2016 at 1:42 PM, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
I'm at the Crossref LIVE 16 event in London, where I just gave a
presentation on WikiCite and Wikidata targeted at scholarly publishers.
Besides Crossref and Datacite people, I talked to a bunch of folks
interested in collaborating on Wikidata integration, particularly from
PLOS, Hindawi and Springer Nature. I started an interesting discussion
with Andrew Smeall, who runs strategic projects at Hindawi, and I wanted
to open it up to everyone on the lists.
Andrew asked me if – aside from efforts like ContentMine and StrepHit –
there are any recommendations for publishers (especially OA publishers)
to mark up their contents and facilitate information extraction and
entity matching, or even push triples to Wikidata to be considered for
ingestion.
I don't think we have a recommended workflow for data providers for
facilitating triple suggestions to Wikidata, other than leveraging the
Primary Sources Tool. However, aligning keywords and terms with the
corresponding Wikidata items via ID mapping sounds like a good first
step. I pointed Andrew to Mix'n'Match as a handy way of mapping
identifiers, but if you have other ideas on how best to support two-way
integration of Wikidata with scholarly contents, please chime in.
Dario
--
Dario Taraborelli, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
--
WikiCite 2016 – May 26-26, 2016, Berlin
Meta: https://meta.wikimedia.org/wiki/WikiCite_2016
Twitter: https://twitter.com/wikicite16
---
You received this message because you are subscribed to the Google
Groups "wikicite-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to wikicite-discuss+unsubscribe(a)wikimedia.org.