Hi,
This message is relevant for people writing SPARQL queries and using the
Wikidata Query Service:
As part of the work of redesigning the WDQS updater[0], we identified that
blank nodes[1] are problematic[2], and we plan to deprecate their usage in
the Wikibase RDF model[3]. To ease the deprecation process, we are
introducing the new function wikibase:isSomeValue(), which can be used in
place of isBlank() where it was used to filter SomeValue[4].
What does this mean for you? Nothing will change for now; we are only
interested in knowing whether you encounter any issues with the
wikibase:isSomeValue() function when it is used as a replacement for the
isBlank() function. More importantly, if you have used the isBlank()
function for purposes other than identifying SomeValue (unknown values in
the UI), please let us know as soon as possible.
The current plan is as follows:
1. Introduce a new wikibase:isSomeValue() function
We are at this step. You can already use wikibase:isSomeValue() in the
Query Service. Here’s an example query (Humans whose gender we know we
don't know):
SELECT ?human WHERE {
  ?human wdt:P21 ?gender .
  FILTER wikibase:isSomeValue(?gender)
}
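For comparison, here is a sketch of the same query written with the
standard isBlank() function, i.e. the pattern that wikibase:isSomeValue()
is meant to replace:

SELECT ?human WHERE {
  ?human wdt:P21 ?gender .
  # Legacy approach: relies on SomeValue being exposed as a blank node,
  # so it will stop working once blank nodes are converted (step 3 below).
  FILTER isBlank(?gender)
}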
You can also search the wikis[8] to find all the pages where the function
isBlank is referenced in a SPARQL query.
2. Generate stable labels for blank nodes in the wikibase RDF output
Instead of "autogenerated" blank node labels, Wikidata will now provide a
stable label for blank nodes. In other words, Wikibase triples using
blank nodes such as:
s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:genid2 ;
will become
s:Q2-6657d0b5-4aa4-b465-12ed-d1b8a04ef658 ps:P576 _:1668ace9a6860f7b32569c45fe5a5c0d ;
This is not a breaking change.
3. [BREAKING CHANGE] Convert blank nodes to IRIs in the WDQS updater
At this point some WDQS servers will start returning IRIs such as
http://www.wikidata.org/somevalue/1668ace9a6860f7b32569c45fe5a5c0d (the
exact form of the IRI is still under discussion) instead of blank node
labels like t1514691780 auto-generated by Blazegraph. Queries still using
isBlank() will stop functioning. Tools explicitly relying on the presence
of blank node labels (t1514691780) in the query results will also be affected.
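For illustration only (the exact IRI form is still under discussion), a
query that wanted to match these converted SomeValue IRIs directly could
filter on the IRI prefix instead of calling isBlank(); wikibase:isSomeValue()
remains the recommended replacement:

SELECT ?human WHERE {
  ?human wdt:P21 ?gender .
  # Hypothetical prefix taken from the example IRI above; the final form
  # of these IRIs has not been decided yet.
  FILTER STRSTARTS(STR(?gender), "http://www.wikidata.org/somevalue/")
}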
We don’t have a defined date for this change yet, but we will follow the
Wikidata breaking change process (announcing the change 4 weeks in advance).
4. [BREAKING CHANGE] Change the RDF model and remove blank nodes completely
from the RDF dumps
Instead of doing the conversion and blank node removal in the WDQS updater,
we will do it at RDF generation time.
This is a breaking change to the SomeValue section of the RDF model[5] and
to the novalue OWL constraint for properties[6].
We don’t have a defined date for this change yet, but we will follow the
Wikidata breaking change process (announcing the change 4 weeks in advance).
If you encounter issues using wikibase:isSomeValue() or if you have
questions about the process, feel free to write a comment on the
Phabricator ticket[3] or on the "Contact the development team (query service
and search)" wiki page[7].
Thanks!
--
David Causse
0: https://phabricator.wikimedia.org/T244590
1: https://en.wikipedia.org/wiki/Blank_node
2: https://phabricator.wikimedia.org/T244341#5889997
3: https://phabricator.wikimedia.org/T244341
4: https://www.mediawiki.org/wiki/Wikibase/DataModel#PropertySomeValueSnak
5: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Somevalue
6: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Novalue
7: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_S…
8: https://www.wikidata.org/w/index.php?search=all%3Ainsource%3A%2FisBlank+%2A…
I recently asked
<https://twitter.com/thadguidry/status/1247635376094707712> an old Freebase
colleague, @narphorium, to graciously spend some time writing up what was
once a very cool Freebase app. Many of us used it in Freebase to see the
overlap of Freebase Properties (like WD statements) between 1-5 Freebase
Topics (like WD Items); it then ran a query to find other Topics that
matched that overlapping set.
I thought that sharing this knowledge would allow others to get inspired,
learn, and possibly build a similarly shaped tool for Wikidata, if one does
not already exist. It is just a small README.md in the repo below.
Freebase Sets
https://github.com/narphorium/freebase-sets
Thad
https://www.linkedin.com/in/thadguidry/
Hello all,
COVID19 Dashboard (https://sites.google.com/view/covid19-dashboard/), a
one-stop information/visualization service for COVID19-related topics, is
out now!
The dashboard's data is pulled from Wikidata, and it displays COVID19's:
- Factbox
- Map
- Cases
- Deaths
- Victims
- Symptoms
- Possible Treatments
- Health Specialties
- Taxonomy
- Images
- Publications
Take a look: https://sites.google.com/view/covid19-dashboard/
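For those curious, here is a rough sketch of how symptom data can be pulled
from Wikidata (taking COVID-19 to be item Q84263196 and "symptoms" to be
property P780):

SELECT ?symptom ?symptomLabel WHERE {
  # COVID-19 (assumed item) -> symptoms (assumed property)
  wd:Q84263196 wdt:P780 ?symptom .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}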
Feedback is welcome, thanks!
Regards,
Fariz
Hopefully this is the right mailing list for my topic.
The German Verein für Computergenealogie is the largest genealogical
society in Germany with more than 3,700 members. We are currently
considering whether Wikibase is a suitable system for us. Most
interesting is its use for our *prosopographical data*.
Prosopographical data can be divided into three classes:
a) well-known and well-studied personalities, typically authors
b) lesser-known but well-studied personalities that can be clearly and
easily identified in historical sources
c) persons whose identifiability in various sources (such as church
records, civil records, city directories) has to be established using
(mostly manual) record linkage
Data from class (a) can be found in the GND of the German National Library.
For data from class (b), systems such as FactGrid exist. The Verein für
Computergenealogie mostly works with data from class (c). We have a huge
amount of that kind of data, more than 40 million records. Currently it
is stored in several MySQL and MongoDB databases.
This leads me to the crucial question: Is the performance of Wikibase
sufficient for such an amount of data? One record for a person will
typically result in maybe ten statements in Wikibase. Using
QuickStatements or the WDI library I have not been able to insert more
than two or three statements per second. It would take months to import
the data.
Another question is whether the edit history of the entries can be
preserved. For some data sets, the edit history goes back to 2004.
I hope someone can give me hints on these questions.
Best wishes
Jesper
There are many highly used templates on WP with time-series data about
COVID spread: cases, tests, and health outcomes, by region and per day. Each
cell has a source and some context (caveats, multiple slightly conflicting
or time-offset sources, commentary about that data point), and would
benefit from being explicitly versioned in Wikidata.
What's the right way to capture this in Wikidata - currently, and in the
future? EN Wikipedia tends to have one footnote about sourcing per
geography, with occasional footnotes about how some of those sources have
changed over time. I don't know of any of these templates that are drawing
from Wikidata.
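For reference, one way such per-day values can already be modelled in
Wikidata is as qualified statements; a rough sketch of a query reading them
back (taking "number of cases" to be P1603, "point in time" to be P585, and
the COVID-19 pandemic item to be Q81068910) might look like:

SELECT ?date ?cases WHERE {
  # One statement per reported count, qualified with the date it applies to.
  wd:Q81068910 p:P1603 ?statement .
  ?statement ps:P1603 ?cases ;
             pq:P585 ?date .
}
ORDER BY ?date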
SJ