I'd like to restate the initial question: why did Wikidata choose ShEx
instead of other approaches? (Thank you, Andra!) I could see arguments in
both directions. I'm curious to know what swayed the Wikidata software
team, as my group is currently grappling with the same decision.
On Thu, May 30, 2019 at 7:55 AM Peter F. Patel-Schneider <pfpschneider(a)gmail.com> wrote:
The history of ShEx is quite complex.

I don't think that one can say that there were complete and conforming
implementations of ShEx in 2017 because the main ShEx specification,
http://shex.io/shex-semantics-20170713/, was ill-founded. I pointed this
out in
https://lists.w3.org/Archives/Public/public-shex/2018Mar/0008.html

There were several quite different semantics proposed for ShEx somewhat
earlier, all with significant problems.

peter
On 5/30/19 12:34 AM, Andra Waagmeester wrote:
I really don't see the issue here. SHACL, like ShEx, is a language to
express data shapes. I adopted ShEx in a Wikidata context in 2016, when it
was presented at the SWAT4HCLS conference [1] in Amsterdam, where it was
discussed in both a tutorial and a hackathon topic. At that conference, I
was convinced that ShEx is helpful in maintaining quality in Wikidata.
ShEx offers not only the means to validate data shapes in Wikidata, but it
also provides a way to document how primary data is expressed in Wikidata.

In 2016 I joined the ShEx community group [2]. Since then I have been
actively using ShEx in defining shapes in various projects on Wikidata
(e.g. Gene Wiki and WikiCite). It is not that this happened in secrecy. On
the contrary, it was discussed at both Wikimedia [3,4] and non-Wikimedia
events [5,6,7].

It is also not the case that SHACL has not been discussed in this context;
on the contrary, I have very good memories of a workshop where both were
debated (see page 24 ;) ) [8]
IMHO the statement that we all should adhere to one standard, simply
because it is a standard, is not a valid argument. Imagine having to
dictate that we all should speak English because it is the standard
language. In every single talk that I have given since 2016, proponents of
SHACL have been very vocal in asking the same question over and over
again, "why not SHACL?", where the discussion never went beyond "You
should because it is a standard". It is also a bit disingenuous to suggest
we all should adhere to SHACL because it is the standard, while in the
same sentence calling it a "Recommendation".
Although initially I was open to SHACL as well (I use both Mac and Linux,
so why not open up to different alternatives in data shapes), (some)
arguments for me to prefer ShEx over SHACL are:

1. Already in 2017 there were different (open) implementations. At the
   time SHACL didn't have much tooling to choose from, other than one
   JavaScript implementation and a proprietary software package.
2. ShEx has a more intuitive way of describing shapes, which is the
   compact syntax (ShExC). SHACL seems to have adopted a compact syntax as
   well, but only yesterday [9].
3. The culture in the Shape Expressions community group aligns well with
   the culture in Wikidata.
4. I don't want to be shackled to one standard (pun intended). I assume
   the name was chosen with a shackle in mind, which puts constraints at
   the core of the language. Wikidata already has different methods in
   place to deal with constraints and constraint violations. In the
   context of Wikidata, ShEx should specifically not be intended to impose
   constraints; on the contrary, it allows expressing disagreement or
   variants of different shapes, whether in conflict or not. This fits
   well with the NPOV concept. Symbols do matter.
For a less personal comparison, I refer to the "Validating RDF Data" book,
which describes both ShEx and SHACL and has a specific chapter on how they
compare and differ [10].
Up until now, I have been using ShEx in repositories outside the Wikidata
ecosystem (e.g. GitHub), but I am really excited about the release of this
extension. I am curious about how the wiki extension will influence the
maintenance of schemas. Schemas are currently often expressed as static
images, while in practice the schemas are as fluid as the underlying data
itself. Being able to document these changes dynamically (the wiki way)
can be very interesting. One specific expectation I have is that it might
make it easier to write federated SPARQL queries. Currently, when writing
these federated queries, we often have to rely on either a set of example
queries or a one-time schema description, which makes it hard to write
those queries because the schemas change constantly. Writing federated
SPARQL queries is now really a process of "slot machine" querying, where
one has to explore the underlying schema query by query. With a wiki in
place and a community maintaining these ever-changing schemas, I expect
better documentation.
The data shape community, instead of adhering to one language, should
really be proud to have produced two very helpful languages. ShEx and
SHACL are similar but do have differences, so both have merit to exist,
and I wish we could steer away from this ShEx vs SHACL feud. It really
isn't helping the cause, i.e. being able to express schemas in a formal
language. Honestly, this feud really reminds me of the famous Monty Python
sketch, "The machine that says Bing". Let us focus on the patient and not
on the "Bing".

Just my 2ct.
[1] http://www.swat4ls.org/workshops/amsterdam2016/
[2] https://www.w3.org/community/shex/
[3] https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/Using_S…
https://figshare.com/articles/Using_Shape_Expressions_ShEx_to_model_validat…
[6] https://2017.semantics.cc/satellite-events/linked-data-quality-assessment-a…
https://upload.wikimedia.org/wikipedia/commons/d/d6/WikiCite_2017_report.pdf
[9] https://lists.w3.org/Archives/Public/public-shacl/2019May/0012.html
[10] http://book.validatingrdf.com/
On Wed, May 29, 2019 at 10:05 PM Antoine Zimmermann <antoine.zimmermann(a)emse.fr> wrote:
Hello,

Could you explain why the non-standard ShEx has been chosen rather than
the W3C Recommendation SHACL?

I would assume that if one has several options for bringing a
functionality to something that largely promotes interoperability (like
Wikidata), the default choice should be a standard, and /only if/ one has
a carefully crafted argumentation to reject it, one would opt for
something else.

For those who may not know, the W3C RDF Data Shapes Working Group worked
between 2014 and 2017 on defining a standard for describing data shapes in
RDF. ShEx existed already and was a candidate for standardisation.
Eventually, another standard emerged, Shapes Constraint Language (SHACL,
see https://www.w3.org/TR/shacl/).

Disclaimer: I did not contribute to either SHACL or ShEx, and I do not
know them enough to judge which one is better.

Best,
--AZ
On 19/05/2019 15:32, Léa Lacroix wrote:
> Hello all,
>
> After several months of development and testing together with the
> WikiProject ShEx
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx>, Shape
> Expressions are about to be enabled on Wikidata.
>
> *First of all, what are Shape Expressions?*
>
> ShEx (Q29377880) <https://www.wikidata.org/wiki/Q29377880> is a concise,
> formal modeling and validation language for RDF structures. Shape
> Expressions can be used to define shapes within the RDF graph. In the
> case of Wikidata, this would be sets of properties, qualifiers and
> references that describe the domain being modeled.
>
> See also:
>
> * a short video about ShEx
>   <https://www.youtube.com/watch?v=AR75KhEoRKg> made by community
>   members during the Wikimedia hackathon 2019
> * introduction to ShEx <http://shex.io/shex-primer/>
> * more details about the language <http://shex.io/shex-semantics/>
>
> *What can it be used for?*
>
> On Wikidata, the main goal of Shape Expressions would be to describe
> what the basic structure of an item would be. For example, for a human,
> we probably want to have a date of birth, a place of birth, and many
> other important statements. But we would also like to make sure that if
> a statement with the property “children” exists, the value(s) of this
> property should be humans as well. Schemas will describe in detail what
> is expected in the structure of items, statements and values of these
> statements.
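
To make the "human" example above concrete, here is a minimal sketch in
plain Python of the kind of check such a schema expresses. It is not ShEx
and not how the EntitySchema extension works; the item records, the item
IDs other than Q5 (human), the property names, and the function
check_human are all made up for illustration.

```python
# Illustrative only: items are modeled as plain dicts, not RDF graphs.
# All item IDs except Q5 ("human" on Wikidata) are hypothetical.
HUMAN = "Q5"

items = {
    "Q_alice": {
        "instance_of": HUMAN,
        "date_of_birth": "1950-03-01",
        "place_of_birth": "Q_city",
        "children": ["Q_bob", "Q_rex"],
    },
    "Q_bob": {"instance_of": HUMAN, "date_of_birth": "1980-07-15"},
    "Q_rex": {"instance_of": "Q_some_other_class"},
}


def check_human(item_id, items):
    """Return a list of problems; an empty list means the item conforms."""
    item = items[item_id]
    problems = []
    # A human is expected to have a date of birth and a place of birth.
    for prop in ("date_of_birth", "place_of_birth"):
        if prop not in item:
            problems.append(f"{item_id}: missing {prop}")
    # If "children" statements exist, every value should itself be a human.
    for child_id in item.get("children", []):
        child = items.get(child_id, {})
        if child.get("instance_of") != HUMAN:
            problems.append(f"{item_id}: child {child_id} is not a human")
    return problems


if __name__ == "__main__":
    for item_id in ("Q_alice", "Q_bob"):
        for problem in check_human(item_id, items):
            print(problem)
```

The function only reports problems and never rejects or modifies an item.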
>
> Once Schemas are created for various types of items, it is possible to
> test some existing items against the Schema, and highlight possible
> errors or lack of information. Subsets of the Wikidata graph can be
> tested to see whether or not they conform to a specific shape through
> the use of validation tools. Therefore, Schemas will be very useful to
> help the editors improve the data quality. We imagine this to be
> especially useful for wiki projects to more easily discuss and ensure
> the modeling of items in their domain. In the spirit of Wikidata not
> restricting the world, Shape Expressions are a tool to highlight, not
> prevent, errors.
>
> On top of this, one could imagine other uses of Schemas in the future,
> for example building a tool that would suggest, when creating a new
> item, what would be the basic structure for this item, and help with
> adding statements or values. A bit like this existing tool, Cradle
> <https://tools.wmflabs.org/wikidata-todo/cradle/#/>, that is currently
> not based on ShEx.
>
> *What is going to change on Wikidata?*
>
> * A new extension will be added to Wikidata: EntitySchema
>   <https://www.mediawiki.org/wiki/Extension:EntitySchema>, defining
>   the Schema namespace and its behavior as well as special pages
>   related to it.
> * A new entity type, EntitySchema, will be enabled to store Shape
>   Expressions. Schemas will be identified with the letter E.
> * The Schemas will have multilingual labels, descriptions and aliases
>   (quite similar to the termbox on Items), and the schema text one can
>   fill with a syntax called ShEx Compact Syntax (ShExC)
>   <http://shex.io/shex-semantics/#shexc>. You can see an example here
>   <https://wikidata-shex.wmflabs.org/wiki/EntitySchema:E2>.
> * The external tool shex-simple
>   <https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/she…>
>   is directly linked from the Schema pages in order to check entities
>   of your choice against the schema.
>
> *When is this happening?*
>
> Schemas will be enabled on test.wikidata.org <http://test.wikidata.org>
> on May 21st and on wikidata.org <http://wikidata.org> on May 28th.
> After this release, they will be integrated into the regular maintenance
> just like the rest of Wikidata’s features.
>
> *How can you help?*
>
> * Before the release, you can try to edit or create Shape Expressions
>   on our test system <https://wikidata-shex.wmflabs.org/wiki/Main_Page>
> * If you find any issue or feature you’d like to have, feel free to
>   create a new task on Phabricator with the tag |shape-expressions|
> * Once Schemas are enabled, you can discuss them in your favorite
>   wikiprojects: for example, what types of items would you like to
>   model?
> * You can also get more information about how to create a Schema
>   <https://www.wikidata.org/wiki/Wikidata:WikiProject_ShEx/How_to_get_started%…>
>
> *See also: *
>
> * Main Phabricator board
> <https://phabricator.wikimedia.org/tag/shape_expressions/>
> * Technical documentation of the extension
> <https://meta.wikimedia.org/wiki/Extension:EntitySchema>
> * To enhance the interface, you can use this user script
>   <https://www.wikidata.org/wiki/User:Zvpunry/EntitySchemaHighlighter.js>
>   to highlight items and properties in the schema code and turn the
>   IDs into links
>
> If you have any questions, feel free to reach out to me. Cheers,
>
> --
> Léa Lacroix
> Project Manager Community Communication for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
>
> www.wikimedia.de <http://www.wikimedia.de>
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata