I am taking the liberty of replying to the list because there are problems with
the justification for this change supplied in the original message.
I believe that https://phabricator.wikimedia.org/T244341#5889997 is inadequate
for determining that blank nodes are problematic. First, the fact that
determining isomorphism of RDF graphs with blank nodes is not known to be
possible in polynomial time is a red herring. If each blank node participates
in only one triple then checking isomorphism remains easy. Second, the query
given to remove a some-value SNAK
is incorrect in general - it will remove all triples with the blank node as
object. (Yes, if the blank nodes found are leaves then no extra triples are
removed.) A simpler DELETE WHERE will have the seemingly-desired result.
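
The first point can be illustrated with a minimal Python sketch. It is not
taken from WDQS or from any RDF library: triples are plain tuples of strings,
blank nodes are assumed to be written with a "_:" prefix, and the function
names are hypothetical. Under the assumption that every blank node occurs in
exactly one triple, isomorphism reduces to comparing multisets of triples in
which blank nodes are renamed locally, which takes linear time.

```python
from collections import Counter


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def triple_shape(triple):
    """Rename blank nodes by order of first occurrence inside one triple."""
    local = {}
    shape = []
    for term in triple:
        if is_blank(term):
            local.setdefault(term, f"_:b{len(local)}")
            shape.append(local[term])
        else:
            shape.append(term)
    return tuple(shape)


def blank_nodes_in_one_triple_each(graph):
    """Check the precondition: each blank node occurs in exactly one triple."""
    counts = Counter()
    for triple in graph:
        for node in {t for t in triple if is_blank(t)}:
            counts[node] += 1
    return all(n == 1 for n in counts.values())


def isomorphic_single_use(g1, g2):
    """Isomorphism test valid only when the precondition above holds."""
    if not (blank_nodes_in_one_triple_each(g1)
            and blank_nodes_in_one_triple_each(g2)):
        raise ValueError("a blank node occurs in more than one triple")
    # Compare multisets, not sets: two triples that differ only in their
    # blank node have the same shape but must both be matched.
    return Counter(map(triple_shape, g1)) == Counter(map(triple_shape, g2))


# Same statement, different blank node labels (statement id abbreviated).
g1 = {("s:Q2-stmt", "ps:P576", "_:genid2")}
g2 = {("s:Q2-stmt", "ps:P576", "_:t1514691780")}
print(isomorphic_single_use(g1, g2))  # True
```

The sketch deliberately refuses graphs where a blank node is shared between
triples, since that is exactly the case where the general, harder problem
arises.
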
This is not to say that blank nodes do not cause problems. According to the
semantics of both RDF and SPARQL, blank nodes are anonymous, so to repeatedly
access the same blank node in a graph one has to access the stored graph using
an interface that exposes the retained identity of blank nodes. It looks as
if the WDQS is built on a system that has such an interface. As the WDQS
already uses user-visible features that are not part of SPARQL, adding (or
maybe even only utilizing) a non-standard interface that is only used
internally would not be a problem.
One problem when using generated URLs to replace blank nodes is that these
generated URLs have to be guaranteed stable and unique (not just stable) for
the lifetime of the query service. Another problem is that yet another
non-standard function is being introduced, pulling the RDF dump of Wikidata
yet further from RDF.
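
The difference between "stable" and "stable and unique" can be shown with a
small Python sketch. The hashing scheme, the helper name somevalue_iri, and the
statement identifiers are purely hypothetical, since the announcement does not
say how the labels are generated; only the IRI prefix comes from the quoted
message, where its exact form is described as still under discussion.

```python
import hashlib

# Prefix taken from the quoted announcement (exact form still under discussion).
SOMEVALUE_PREFIX = "http://www.wikidata.org/somevalue/"


def somevalue_iri(*parts):
    """Hypothetical generator: hash the identifying parts into an IRI."""
    key = "\x1f".join(parts).encode("utf-8")
    return SOMEVALUE_PREFIX + hashlib.sha256(key).hexdigest()[:32]


# Stable but not unique: hashing only the property gives every some-value
# snak for that property the same IRI, conflating distinct unknown values.
conflated_a = somevalue_iri("P576")
conflated_b = somevalue_iri("P576")

# Stable and unique (up to hash collisions), assuming statement ids are
# themselves unique: include the statement id in the hashed key.
iri_a = somevalue_iri("hypothetical-statement-A", "P576")
iri_b = somevalue_iri("hypothetical-statement-B", "P576")

print(conflated_a == conflated_b)  # True: two unknowns became one node
print(iri_a == iri_b)              # False: the unknowns stay distinct
```

Whatever scheme is actually chosen, the second property is the one that
matters: two some-value snaks must never end up sharing an IRI.
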
So this is a significant change as far as users are concerned that also has
potential implementation issues. Why not just use an internal interface that
exposes a retained identity for blank nodes?
Peter F. Patel-Schneider

On 4/16/20 8:34 AM, David Causse wrote:
> Hi,
>
> This message is relevant for people writing SPARQL queries and using the
> Wikidata Query Service:
>
> As part of the work of redesigning the WDQS updater[0] we identified that
> blank nodes[1] are problematic[2] and we plan to deprecate their usage in
> the wikibase RDF model[3]. To ease the deprecation process we are
> introducing the new function wikibase:isSomeValue() that can be used in
> place of isBlank() when it was used to filter SomeValue[4].
>
> What does this mean for you: nothing will change for now, we are only
> interested to know if you encounter any issues with the
> wikibase:isSomeValue() function when used as a replacement of the isBlank()
> function. More importantly, if you used the isBlank() function for other
> purposes than identifying SomeValue (unknown values in the UI), please let
> us know as soon as possible.
>
> The current plan is as follows:
>
> 1. Introduce a new wikibase:isSomeValue() function
> We are at this step. You can already use wikibase:isSomeValue() in the Query
> Service. Here’s an example query (Humans whose gender we know we don't know):
> SELECT ?human WHERE {
>   ?human wdt:P21 ?gender
>   FILTER wikibase:isSomeValue(?gender) .
> }
> You can also search the wikis[8] to find all the pages where the function
> isBlank is referenced in a SPARQL query.
>
> 2. Generate stable labels for blank nodes in the wikibase RDF output
> Instead of "autogenerated" blank node labels wikidata will now provide a
> stable label for blank nodes. In other words the wikibase triples using
> blank nodes such as:
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:genid2 ;
> will become
> s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576
> _:1668ace9a6860f7b32569c45fe5a5c0d ;
> This is not a breaking change.
>
> 3. [BREAKING CHANGE] Convert blank nodes to IRIs in the WDQS updater
> At this point some WDQS servers will start returning IRIs such
> as http://www.wikidata.org/somevalue/1668ace9a6860f7b32569c45fe5a5c0d (the
> exact form of the IRI is still under discussion) instead of blank node
> literals like t1514691780 auto-generated by blazegraph. Queries still using
> isBlank() will stop functioning. Tools explicitly relying on the presence of
> blank nodes (t1514691780) in the query results will also be affected.
> We don’t have a defined date for this change yet, but we will follow the
> Wikidata breaking change process (announcing the change 4 weeks in advance).
>
> 4. [BREAKING CHANGE] Change the RDF model and remove blank nodes completely
> from the RDF dumps
> Instead of doing the conversion and blank node removal in the WDQS updater
> we will do it at RDF generation.
> This is a breaking change of the somevalue section of the RDF model[5] and
> the no value owl constraint for properties[6].
> We don’t have a defined date for this change yet, but we will follow the
> Wikidata breaking change process (announcing the change 4 weeks in advance).
>
> If you encounter issues using wikibase:isSomeValue() or if you have
> questions about the process, feel free to write a comment on the Phabricator
> ticket[3] or the Contact the development team (query service and search)
> wiki page[7].
>
> Thanks!
>
> --
> David Causse
>
> 0: https://phabricator.wikimedia.org/T244590
> 1: https://en.wikipedia.org/wiki/Blank_node
> 2: https://phabricator.wikimedia.org/T244341#5889997
> 3: https://phabricator.wikimedia.org/T244341
> 4: https://www.mediawiki.org/wiki/Wikibase/DataModel#PropertySomeValueSnak
> 5: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Somevalue
> 6: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Novalue
> 7: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search
> 8: https://www.wikidata.org/w/index.php?search=all%3Ainsource%3A%2FisBlank+%2A%5C%28+%2A%5C%3F%2Fi&title=Special:Search&profile=default&fulltext=1
>
>
>
> _______________________________________________
> Wikidata-tech mailing list
> Wikidata-tech@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech