[Wikidata] Re: History of some original Wikidata design decisions?

26 Jul 2021

Wow :)  Thanks for that, Dan!

On Mon, Jul 26, 2021 at 11:43 AM Dan Brickley <danbri(a)danbri.org> wrote:

...

 On Mon, 26 Jul 2021 at 11:58, Jan Dittrich <jan.dittrich(a)wikimedia.de>
 wrote:

  I would be very interested in Wikidata's relation to Cyc
 <https://en.wikipedia.org/wiki/Cyc> on the one hand and the Semantic Web on
 the other.

 This isn’t written down well in any one place yet.

 Here is one strand of history, emphasising the path from Cyc via Guha’s later
 work on MCF.

 CycL inspired Apple MCF, which got XMLified by Tim Bray when Guha took it
 to Netscape. In June ‘97 it was submitted to W3C by Netscape. It combined with
 requirements from W3C content labeling work (PICS), where there was
 interest in adding more decentralized expressivity (eg to support Dublin
 Core and other schemas being combined in one “label”), complex structures
 and datatyped property values, aka Signed PICS labels and PICS-NG. While
 PICS and PICS-NG had an s-expression based syntax, RDF (like the 1997
 iteration of MCF) went with XML. At the time XML was being invented by
 stripping SGML down into something that might suit the Web. Microsoft
 submitted XML-Data to W3C mid 97 too (as well as later a revision, breaking
 W3C etiquette). XML-Data shared some goals with RDF but not its graph data
 model. RDF and other usecases led to XML Namespaces being an important
 thing. As XML popularity grew, RDF was under pressure since it didn’t
 engage much with the SGML heritage. The RDFS WG launched just after the RDF
 Model + Syntax spec was announced at Dublin Core’s conference in Finland.
 This being the “browser wars” era both RDF and RDFS were under huge
 pressure to be completed quickly. RDFS included a small subset of the
 schema-defining machinery from MCF. The RDF M+S WG produced an RDF
 recommendation in Feb 1999 but RDFS was left in limbo, in part because the
 XML community were wary of being forced to build XML Schema on top of it.
 Meanwhile from 1998 a small but enthusiastic community started to build
 around RDF - experimenting with query languages, databases, integration
 with inference engines, APIs etc., alongside continued support from
 Netscape who used the technology heavily for everything from RSS feeds,
 sitemaps, “What’s Related” annotation services, open data (dmoz) dumps, to
 their own browser’s internal data source APIs (xul templates, bookmarks,
 mail, ..). On the standards track, W3C management backed off from RDF work
 to reflect the concerns of its membership, who tended to much prefer XML.
 Meanwhile the US military research agency DARPA had been persuaded by an
 academic turned staffer (Jim Hendler) who had worked on similar early
 technology (SHOE, PIQ) that they should fund research to standardize a
 DARPA Agent Markup Language. A DAML / W3C collaboration led to the
 RDF-oriented W3C team at MIT receiving DARPA funding to continue the work
 area that had not engaged the XML-centric interest of W3C’s membership (ie
 Advisory Committee). Alongside this, RDF/S had engaged the interests of
 European researchers working around logic-based KR languages, eg f-logic,
 description logics etc., resulting in DAML (US) and OIL (description logic
 EU research project outcomes) collaborating via an ad hoc transatlantic
 committee to produce DAML+OIL, a first draft of a more complicated language
 that sat on top of RDF. The W3C MIT DARPA funding supported a “Semantic Web
 Advanced Development” activity that operated in the grey area of W3C’s
 “non member-funded activity”, and which served in particular to bring
 DAML+OIL into W3C as a new work item. This next phase of RDF work at W3C was
 broadly in line with the RDF roadmap and expectations from the 1997
 Metadata Activity, but rebranded “Semantic Web” to reflect several
 considerations. Firstly that RDF was clearly more powerful and expressive
 than a simple metadata format might need. Secondly, by this point RDF was
 pretty unpopular in several contexts - and seen as draining staff resources
 and attention from W3C membership priorities (XML, Web Services, etc.).
 Renaming from RDF allowed a fresh start. Calling it Semantic Web tied into
 Tim-BL’s interest and writing in the area, and had a more “visionary” feel,
 allowing for a message that it was a longer term investigation, therefore
 not a competitor to XML Schema, SOAP, XQuery and so on. So now we had PICS
 and MCF having mutated into RDF/S for graph data, and then simultaneously a
 rebranding of the exercise as Semantic Web, with a big dose of “futuristic”
 and “researchy”. Conferences and journals and such started to appear,
 initially with much more focus on the “semantics” part, rather than the
 “web”. This was the cause for the second great half-hearted renaming, which
 grew from the growing split between those of us who were in this for
 web-based data sharing, integration, feeds, sitemaps, RSS, FOAF and so
 on, and those who were more “semantics first”, with a passion for finding
 efficient subsets of Description Logic. Around the mid-2000s the earlier
 experimental RDF query languages solidified into SPARQL, which was broadly
 in the “data access” side of the community. This is another place that the
 Cyc and MCF heritage showed up, since most practical RDF systems had a
 notion of source or context attached at the triple or graph level,
 corresponding to the notion of “layers” in MCF (and very loosely with Cyc
 contexts). So this kind of takes us to the time when we had RDF/S, OWL,
 SKOS, SPARQL … and things like DBpedia and the LOD cloud were refining the
 data-linking “hypertext rdf” work we’d started in the FOAF project, with a
 TimBL-fueled passion for every entity being given a URI that can serve up
 RDF when dereferenced. A good amount of public open datasets were published
 this way, although applications and usage tended to lag. This brings us to
 the era of rich snippets, Google acquiring Freebase, renaming it Knowledge
 Graph and then stepping back from the role that Wikidata was more effective
 at filling…

 Ok that was a giant biased brain dump, but I think mostly true, and about
 25 years of underdocumented history squeezed into a paragraph.

 Dan

 Jan

 On Fri, 23 Jul 2021 at 01:57, Denny Vrandečić <dvrandecic(a)wikimedia.org> wrote:

  Hi Thad,

 Thanks for asking the questions, and thanks Tobi for the pointers. Man,
 what a lengthy post it was.

 I understand that the post answered most of your questions. I think that
 it is entirely possible to layer a prototype semantics over Wikidata, just
 as the DL semantics have been layered over it. I don't remember if such
 work has been done before.

 Regarding ISO 5964, I think I probably have looked through it at some
 point, but I don't remember it anymore. SKOS has certainly been a stronger
 influence, and obviously OWL.

 I hope that helps with the historical deep dive :) Lydia and I really
 should write that book!

 Cheers,
 Denny

 On Sat, Jul 10, 2021 at 3:00 PM Thad Guidry <thadguidry(a)gmail.com>
 wrote:

  *Tobi - *That blog post 3 is very helpful.  It shows that Denny and I
 think alike and agree on everything. :-)  In particular, his dislike for
 strong classification, which is part of my basis: to allow weak relations
 much more, and to use them.  The question is how to allow them, and I think
 the only way currently is through properties based on the Data Model.
 There are many ways, and SKOS is one way to allow expressing weak
 relations and we already have some good support with existing properties
 like P4390 mapping relation type
 <https://www.wikidata.org/entity/P4390> and a host of others.

 Denny and I also fear the same things, like not having a flexible
 enough system to describe our complex world that doesn't always fit into
 strict rules.  Which is kinda why I've always liked
 https://www.w3.org/TR/skos-primer/#secassociative
 because of its non-transitivity, which allows much flexibility and, as
 he and I would say... avoids "Barbara". :-)
 Which is pretty much summarized in
 https://www.w3.org/TR/skos-primer/#secadvanced
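
 A rough sketch of the non-transitivity point, assuming only what the SKOS
 Primer says: skos:related is symmetric but not transitive, while
 skos:broaderTransitive is the transitive counterpart of skos:broader. The
 concept names and the two helper functions below are made up for
 illustration; nothing here comes from SKOS tooling or from Wikidata.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        # Chain every (a, b) with every (b, d) to derive (a, d).
        derived = {
            (a, d)
            for (a, b), (c, d) in product(closure, repeat=2)
            if b == c
        }
        if derived <= closure:
            return closure
        closure |= derived


def symmetric_closure(pairs):
    """Return `pairs` plus the reverse of every pair."""
    return set(pairs) | {(b, a) for a, b in pairs}


# Hypothetical concepts, chosen only for the example.
broader = {("poodle", "dog"), ("dog", "mammal")}
related = {("dog", "leash"), ("leash", "rope")}

# Hierarchical links read as transitive: the "Barbara"-style chain is inferred.
broader_transitive = transitive_closure(broader)

# Associative links read as symmetric only: no chaining is performed.
related_links = symmetric_closure(related)

print(("poodle", "mammal") in broader_transitive)  # True
print(("dog", "rope") in related_links)            # False
```

 The weak relation keeps "dog" and "rope" unconnected unless someone states
 that link explicitly, which is the flexibility being described.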

 Sorry for all the SKOS links but semantic relations help to describe
 human knowledge.  How a system represents or portrays semantic relations is
 where choices are made or have been made.  *And I think the right
 choices were definitely made.*
 Overlaying SKOS and the Wikidata properties that sprinkle it into the
 data model is useful, but I've always been kind of reluctant to do
 that...probably for the same reasons Denny might give?  Choices between
 allowing "semantic accuracy" versus "semantic flexibility".  But I think
 systems like SKOS provide both.  Perhaps it could be argued that OWL
 provides much less. :-)  Still all KOSs provide great use when they fit
 well.  How they can fit over Wikidata, as I said, is probably only through
 properties at this late stage of design and that's fine with me!

 Still, my main focus is and always will be trying to add human
 knowledge about concept relations into Wikidata to help machines, to help
 us.  (the "edges" that humans quickly can deduce in seconds, but still to
 this day can sometimes take machines days or weeks to figure out).

 My usage and help to Abstract Wikipedia and Wikidata later on will
 primarily be around the mapping of relations ... where a lot of the
 possibilities have already been described years and years ago at the very
 bottom of this long page:
 *inter-KOS mapping relationships  <-- *very last row, 3rd column
 https://www.w3.org/TR/skos-primer/#seccorrespondencesISO

 *Denny - * were you part of, or lightly influenced by, ISO 5964 through
 Germany's ISO/DIN, or not? That would also be good to know.

 Thad
 https://www.linkedin.com/in/thadguidry/
 https://calendly.com/thadguidry/

 On Sat, Jul 10, 2021 at 3:17 PM Tobi Gritschacher <
 tobias.gritschacher(a)wikimedia.de> wrote:

> Hi,
>
> It would be nice to have a place to look with a link to a page in the
>> Community portal that says "History of Wikidata's design and early
>> collected meetings, notes, design documents, recordings"
>>
>
> Might not answer your concrete question, but here are some (very)
> early blog posts by Denny. They are still a nice read. :)
>
> 1/3
> https://blog.wikimedia.de/2013/02/22/restricting-the-world/
>
> 2/3
> https://newwwblog.wikimedia.de/2013/06/04/on-truths-and-lies/
>
> 3/3
> https://blog.wikimedia.de/2013/09/12/a-categorical-imperative/
>
> Cheers, Tobi
> _______________________________________________
> Wikidata mailing list -- wikidata(a)lists.wikimedia.org
> To unsubscribe send an email to wikidata-leave(a)lists.wikimedia.org
>

 --
 Jan Dittrich
 UX Design/ Research

 Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin

 Tel. (030) 219 158 26-0
 https://wikimedia.de

 Our vision is a world in which all people can share in, use, and add to
 the knowledge of humanity. Help us make it happen!
 https://spenden.wikimedia.de

 Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens e. V.
 Registered in the register of associations of the Amtsgericht
 Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
 organization by the Finanzamt für Körperschaften I Berlin, tax number
 27/029/42207.

-- 
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
