Hi David,
Peter brings up some very valid points and I agree with him. I don't
really like how you present this as a done deal to the community. Right
now it looks as if you have a software performance problem, think you
have found a solution, and are pushing it through without any community
consultation.
Maarten
On 17-04-20 16:11, David Causse wrote:
Thanks for the feedback,
just a note to say that I responded via
https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_S…
David Causse
On Thu, Apr 16, 2020 at 8:16 PM Peter F. Patel-Schneider
<pfpschneider@gmail.com> wrote:
I am taking the liberty of replying to the list because of the problems
with the supplied justification for this change that are part of the
original message.
I believe that https://phabricator.wikimedia.org/T244341#5889997 is
inadequate for determining that blank nodes are problematic. First, the
fact that determining isomorphism in RDF graphs with blank nodes is
non-polynomial is a red herring. If the blank nodes participate in only
one triple then isomorphism remains easy. Second, the query given to
remove a some-value SNAK is incorrect in general - it will remove all
triples with the blank node as object. (Yes, if the blank nodes found are
leaves then no extra triples are removed.) A simpler DELETE WHERE will
have the seemingly-desired result.
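As an illustration of the leaf-node case, here is a minimal Python sketch.
It is not taken from the ticket: the tuple representation of triples, the
"_:" label convention, and the names canonical_form and isomorphic are
assumptions made for the example. When every blank node occurs in exactly
one triple, as its object, two graphs are isomorphic exactly when they
agree after each blank node is replaced by the same placeholder, which
takes linear time:

```python
from collections import Counter

BLANK = "_:"  # assumed convention: blank node labels start with "_:"


def is_blank(term):
    """True if the term is written as a blank node label."""
    return term.startswith(BLANK)


def canonical_form(triples):
    """Multiset of triples with every leaf blank node anonymised.

    Only valid when each blank node appears in exactly one triple, as the
    object. Anything else raises ValueError, because the shortcut no
    longer decides isomorphism in that case.
    """
    seen = set()
    canon = Counter()
    for s, p, o in set(triples):  # an RDF graph is a set of triples
        if is_blank(s) or is_blank(p):
            raise ValueError("blank node outside object position")
        if is_blank(o):
            if o in seen:
                raise ValueError("blank node used in more than one triple")
            seen.add(o)
            o = BLANK  # the label itself carries no meaning
        canon[(s, p, o)] += 1
    return canon


def isomorphic(g1, g2):
    """Compare two graphs whose blank nodes are all leaves."""
    return canonical_form(g1) == canonical_form(g2)
```

Graphs in which a blank node is shared between triples still need a
general isomorphism algorithm; this sketch only covers the restricted
case described above.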
This is not to say that blank nodes do not cause problems. According to
the semantics of both RDF and SPARQL, blank nodes are anonymous, so to
repeatedly access the same blank node in a graph one has to access the
stored graph using an interface that exposes the retained identity of
blank nodes. It looks as if the WDQS is built on a system that has such
an interface. As the WDQS already uses user-visible features that are not
part of SPARQL, adding (or maybe even only utilizing) a non-standard
interface that is only used internally would not be a problem.
One problem when using generated URLs to replace blank nodes is that
these generated URLs have to be guaranteed stable and unique (not just
stable) for the lifetime of the query service. Another problem is that
yet another non-standard function is being introduced, pulling the RDF
dump of Wikidata yet further from RDF.
So this is a significant change as far as users are concerned that also
has potential implementation issues. Why not just use an internal
interface that exposes a retained identity for blank nodes?
Peter F. Patel-Schneider
On 4/16/20 8:34 AM, David Causse wrote:
Hi,
This message is relevant for people writing SPARQL queries and using the
Wikidata Query Service:
As part of the work of redesigning the WDQS updater[0] we identified that
blank nodes[1] are problematic[2] and we plan to deprecate their usage in
the wikibase RDF model[3]. To ease the deprecation process we are
introducing the new function wikibase:isSomeValue() that can be used in
place of isBlank() when it was used to filter SomeValue[4].
What does this mean for you: nothing will change for now, we are only
interested to know if you encounter any issues with the
wikibase:isSomeValue() function when used as a replacement for the
isBlank() function. More importantly, if you used the isBlank() function
for purposes other than identifying SomeValue (unknown values in the UI),
please let us know as soon as possible.
The current plan is as follows:
1. Introduce a new wikibase:isSomeValue() function
We are at this step. You can already use wikibase:isSomeValue() in the
Query Service. Here’s an example query (Humans whose gender we know we
don't know):
SELECT ?human WHERE {
  ?human wdt:P21 ?gender
  FILTER wikibase:isSomeValue(?gender) .
}
You can also search the wikis[8] to find all the pages where the function
isBlank is referenced in a SPARQL query.
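For queries kept outside the wikis, a similar check can be sketched in
Python. This is only a rough helper under stated assumptions: the function
names uses_is_blank and rewrite_is_blank are made up for this example, the
regular expression does not skip comments or string literals, and the
rewrite is only appropriate where isBlank() was used to detect SomeValue:

```python
import re

# Matches a call to the SPARQL built-in isBlank (case-insensitive), but not
# a variable such as ?isBlank or a prefixed name such as ex:isBlank.
IS_BLANK_CALL = re.compile(r"(?<![\w?$:])isBlank\s*\(", re.IGNORECASE)


def uses_is_blank(query):
    """Return True if the query text appears to call isBlank()."""
    return IS_BLANK_CALL.search(query) is not None


def rewrite_is_blank(query):
    """Replace isBlank( with wikibase:isSomeValue( everywhere it is called.

    Review the result by hand: the replacement is wrong for queries that
    used isBlank() for something other than SomeValue.
    """
    return IS_BLANK_CALL.sub("wikibase:isSomeValue(", query)
```

For example, rewrite_is_blank applied to a filter written as
FILTER(isBlank(?gender)) yields FILTER(wikibase:isSomeValue(?gender)).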
2. Generate stable labels for blank nodes in the wikibase RDF output
Instead of "autogenerated" blank node labels, Wikidata will now provide a
stable label for blank nodes. In other words, the wikibase triples using
blank nodes such as:
s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:genid2 ;
will become
s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:1668ace9a6860f7b32569c45fe5a5c0d ;
This is not a breaking change.
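To make the idea of a stable label concrete, here is a small Python
sketch. The message does not say how the real label is computed, so
everything specific here is an assumption: the inputs (a statement ID and
a property ID), the separator, the use of MD5 (chosen only because it
yields 32 hexadecimal characters like the example above), and the function
names. The only point is that the same input always gives the same label:

```python
import hashlib


def stable_blank_label(statement_id, property_id):
    """Derive a deterministic 32-character hex label.

    Hypothetical scheme: the real derivation is not described in the
    message. MD5 is used as a plain fingerprint, not for security.
    """
    key = f"{statement_id}|{property_id}".encode("utf-8")
    return hashlib.md5(key).hexdigest()


def blank_node(statement_id, property_id):
    """Format the label as a blank node, e.g. _:<32 hex characters>."""
    return "_:" + stable_blank_label(statement_id, property_id)
```

Regenerating the RDF for the same statement then reuses the same blank
node label instead of a fresh _:genidN on every export.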
3. [BREAKING CHANGE] Convert blank nodes to IRIs in the WDQS updater
At this point some WDQS servers will start returning IRIs such as
http://www.wikidata.org/somevalue/1668ace9a6860f7b32569c45fe5a5c0d (the
exact form of the IRI is still under discussion) instead of blank node
labels like t1514691780 auto-generated by Blazegraph. Queries still using
isBlank() will stop functioning. Tools explicitly relying on the presence
of blank nodes (t1514691780) in the query results will also be affected.
We don’t have a defined date for this change yet, but we will follow the
Wikidata breaking change process (announcing the change 4 weeks in
advance).
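For tools that inspect query results, one way to prepare is to accept both
shapes during the transition. The Python sketch below assumes results in
the standard SPARQL JSON results format, where each bound value carries a
"type" of "bnode", "uri" or "literal". The IRI prefix is copied from the
example above and may change, since the exact form is still under
discussion; the function name is made up for the example:

```python
# Prefix taken from the example IRI in this message; the final form of the
# IRI is still under discussion, so treat this constant as provisional.
SOMEVALUE_PREFIX = "http://www.wikidata.org/somevalue/"


def is_unknown_value(binding):
    """True if a SPARQL JSON result binding represents a SomeValue.

    Accepts the current shape (a blank node) and the planned shape
    (an IRI under the somevalue prefix).
    """
    kind = binding.get("type")
    if kind == "bnode":
        return True
    if kind == "uri":
        return binding.get("value", "").startswith(SOMEVALUE_PREFIX)
    return False
```

A tool written this way keeps working whether a given WDQS server has
already switched to IRIs or still returns blank nodes.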
4. [BREAKING CHANGE] Change the RDF model and remove blank nodes completely from the RDF dumps
Instead of doing the conversion and blank node removal in the
WDQS updater
we will do it at RDF generation.
This is a breaking change to the somevalue section of the RDF model[5]
and the no value owl constraint for properties[6].
We don’t have a defined date for this change yet, but we will follow the
Wikidata breaking change process (announcing the change 4 weeks in
advance).
If you encounter issues using wikibase:isSomeValue() or if you have
questions about the process, feel free to write a comment on the
Phabricator ticket[3] or on the Contact the development team (query
service and search) page[7].
https://www.mediawiki.org/wiki/Wikibase/DataModel#PropertySomeValueSnak
5: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Somevalue
6: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Novalue
7: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_S…
8: https://www.wikidata.org/w/index.php?search=all%3Ainsource%3A%2FisBlank+%2A…
_______________________________________________
Wikidata-tech mailing list
Wikidata-tech(a)lists.wikimedia.org
<mailto:Wikidata-tech@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech