Re: [Wikidata] Documentation sprint for Wikidata during the Wikimedia Hackathon

17 Mar 2017

On 3/15/17 12:15 AM, Rick Labs wrote:
...

 Kingsley,

 Wanted to thank you very much for your valuable post! It's a great
 introduction to making the transition from a table/Excel/spreadsheet
 view of data over to, as you say, "a collection of RDF statements
 grouped by statement Predicate".

 Those of us working on the Company Data project typically come with
 that table-oriented background. Having a "learning path" laid out for
 transitioning to the SPARQL world is very helpful.

 I'm very fuzzy on basic "inheritance" here at Wikidata.

 For example Company->Financial Statements->Income Statement for
 2016Q4->total revenue->some number

   * Total revenue needs the /*time period*/ attached to it (here, start
     and end dates for the quarter); others need point-in-time
     measurements (e.g. as of 12/31/2016).
   * The total revenue needs to have an associated /*currency*/
     attached to it.
   * The Income Statement for 2016Q4 needs to have a specific
     /*accounting standard*/ attached to it (for example US GAAP 2017 or
     IFRS 2016; more at
     https://www.sec.gov/info/edgar/edgartaxonomies.shtml, and more
     outside the U.S.). The accounting standard followed in preparing
     the numbers must be very specific to help with concordance across
     different standards (especially across countries).
   * The company needs to have a "dominant" or "default" /*industry
     code*/ attached to it. WikiData might best go with 56 industries
     classified according to the '''International Standard Industrial
     Classification revision 4 (ISIC Rev. 4)'''. This is the set used
     by the World Input-Output tables http://www.wiod.org/home. They
     take data from all 28 EU countries and 15 other major countries in
     the world and transform it to be comparable using these
     industries. It's the broadest "nearly global" coverage I can find.
     It would also be advisable to accommodate multiple industry
     assignments per entity / establishment, each with the standard and
     year which were followed, applied from a specifically enumerated
     list. For example, in North America data will often be available
     according to the most current and highly granular 2017 NAICS
     system https://www.census.gov/eos/www/naics/ and there are
     concordances between versions; see
     https://www.census.gov/eos/www/naics/concordances/concordances.html
     and https://unstats.un.org/unsd/cr/registry/isic-4.asp. Looking
     towards a future where large amounts of company data are machine
     imported, it would be best to preserve the original, most detailed
     industry codes available (such as the 6-digit NAICS code) and
     preserve the standard and year associated with the assigned
     code(s). Given the year and the detail, the concordances can later
     be used to machine-add different codes as needed. Granular users
     are then accommodated, and people looking to do cross-country /
     global analysis (at the 56-industry level) are also accommodated.
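
 One way to picture the requirements in the list above is a single
 revenue figure that carries its own context. The Python sketch below is
 only an illustration: the class names, field names and sample values
 (IndustryCode, RevenueStatement, "Example Corp" and so on) are
 hypothetical and are not Wikidata's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class IndustryCode:
    """An industry assignment that keeps its classification standard and year."""
    standard: str  # e.g. "NAICS" or "ISIC Rev. 4"
    year: int      # edition of the standard that was followed
    code: str      # most detailed code available, stored as text


@dataclass
class RevenueStatement:
    """A total-revenue value plus the context needed to interpret it."""
    company: str
    amount: Decimal
    currency: str
    period_start: date
    period_end: date
    accounting_standard: str
    industry_codes: List[IndustryCode] = field(default_factory=list)

    def default_industry(self) -> IndustryCode:
        # Assumption for this sketch: the first listed code is the default one.
        return self.industry_codes[0]


# Illustrative values only; none of this is real company data.
statement = RevenueStatement(
    company="Example Corp",
    amount=Decimal("1250000.00"),
    currency="USD",
    period_start=date(2016, 10, 1),
    period_end=date(2016, 12, 31),
    accounting_standard="US GAAP 2017",
    industry_codes=[
        IndustryCode("NAICS", 2017, "511210"),
        IndustryCode("ISIC Rev. 4", 2008, "5820"),
    ],
)

print(statement.default_industry())
```

 Keeping the standard and year next to each code is what would let a
 concordance be applied later without losing the original assignment.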

 When I look at the above challenge I think of your prescription of how
 to make RDF collections easier to read.

     1. Addition of annotation relations, esp. the likes of rdfs:label,
     skos:prefLabel, skos:altLabel, schema:name, foaf:name, rdfs:comment,
     schema:description etc.

     2. Addition (where possible) of relations such as foaf:depiction,
     schema:image etc.

     Adhering to the above *leads to RDF statement collections that are
     easier to read*, without the confusing nature of the term "graph"
     getting in the way. At the end of the day, RDF is simply an
     abstract language for creating structured data using a variety of
     notations (RDF-Turtle, RDF-NTriples, JSON-LD, RDF-XML etc.). *It
     isn't a format, but sadly that's how it is still perceived* by
     most circa 2017 (even though the initial RDF definition snafu on
     this front occurred around 2000).

 And I can't help but be intensely curious as to what happened in that
 2000 initial RDF definition snafu?

Creating and perpetuating the misconception that RDF/XML == RDF. That
was compounded by a Layer Cake diagram that actually depicted the
misconception that RDF was built atop XML.

Today folks still get distracted by JSON-LD vs RDF-Turtle vs RDF-XML vs
RDFa vs Microdata notations for constructing RDF Language
sentences/statements. Net effect, unleashing the real power behind a
Semantic Web continues to hit unnecessary hiccups.
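
To make the notation point concrete, here is a minimal Python sketch that
writes one and the same statement in three notations. It uses only the
standard library; the helper names (to_ntriples, to_turtle, to_jsonld) are
made up for this illustration, and the string escaping is deliberately
simplified, so treat it as a sketch rather than a conforming serializer.

```python
import json

# One statement: subject, predicate, object (a plain literal).
SUBJECT = "http://www.wikidata.org/entity/Q42"
PREDICATE = "http://www.w3.org/2000/01/rdf-schema#label"
OBJECT = "Douglas Adams"

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text):
    # Simplified: only backslashes and double quotes are escaped here.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(s, p, o):
    """One line per statement, full IRIs in angle brackets."""
    return f'<{s}> <{p}> "{escape_literal(o)}" .'


def to_turtle(s, p, o):
    """Same statement, with the rdfs: prefix declared to shorten the predicate."""
    local_name = p[len(RDFS):]
    return (
        f"@prefix rdfs: <{RDFS}> .\n"
        f'<{s}> rdfs:label "{escape_literal(o)}" .'
        if local_name == "label"
        else to_ntriples(s, p, o)
    )


def to_jsonld(s, p, o):
    """Same statement as a JSON-LD node object, using the full IRI as the key."""
    return json.dumps({"@id": s, p: o}, indent=2)


print(to_ntriples(SUBJECT, PREDICATE, OBJECT))
print(to_turtle(SUBJECT, PREDICATE, OBJECT))
print(to_jsonld(SUBJECT, PREDICATE, OBJECT))
```

All three outputs carry the same subject, predicate and object; only the
notation differs.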

-- 
Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/dataspace/person/kidehen#this
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
