Hello all!
The feedback period for our WDQS Graph Split proposal
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_spli…>
has come to an end. Many thanks to everyone who sent comments; your
contributions are invaluable!
We’ve incorporated most comments and proposals into our final set of rules
for the graph split
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_spli…>.
The main proposals (including some that were rejected) were:
- Duplicating properties (wd:P*) in both graphs does not seem necessary
and won't be done
- The list of publication types that identify what is a scholarly
article has been improved; see the final list of items here
<https://docs.google.com/spreadsheets/d/1eKX_2Z1rXj1s_zOapQvn_0uD6MVhc-qyqqx…>
- It was discussed whether sitelinks should inform the nature of the
split or not; this idea was not incorporated because it might make it
harder to understand what is where
- Items that define multiple instance of (P31)
<https://www.wikidata.org/wiki/Property:P31> values, which might be
ambiguous, were discussed and investigated. It appears that this does not
affect many items and that the solution might be to disambiguate these
instances by creating separate entities (see the Clinical Trials section
<https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service/WDQS_graph…>
of the Talk Page).
- Rethinking how scholarly articles are modelled was raised, especially
identifying the nature of the publication with a separate property
rather than with instance of (P31)
<https://www.wikidata.org/wiki/Property:P31>. This idea should probably
be explored and discussed by the WikiCite community, since it does affect
the nature of the split but could be a useful criterion to take into
consideration in the future.
We are now working on implementing the appropriate tooling to manage this
split, including a new way of processing the Wikidata dumps for an initial
load, modifications to the update pipeline to support the graph split, and
additional automation. We hope to have new, live-updated SPARQL endpoints
serving the split graphs by the end of June. This timeline is probably
slightly optimistic; we’ll let you know when those are ready.
Once the new live-updated SPARQL endpoints for the split graphs are
available, we will provide a 6-month transition period, during which
the current endpoint (query.wikidata.org/sparql) will keep serving the full
graph. Once that transition is over, query.wikidata.org will only serve the
main graph. Queries that need data from both graphs will need to be
rewritten to use federation. You can ask for help rewriting queries
<https://www.wikidata.org/wiki/Wikidata:Request_a_query_rewrite>.
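To give a rough idea of what such a rewrite could look like, here is a
minimal Python sketch that builds a federated query using the standard
SPARQL 1.1 SERVICE keyword. Several things in it are assumptions made for
illustration only: the scholarly endpoint URL is a placeholder (the new
endpoints are not named in this message), the helper names
build_federated_query and build_request_url are hypothetical, and the
sketch assumes that author labels stay in the main graph while author (P50)
statements on articles end up in the scholarly graph.

```python
import re
from urllib.parse import urlencode

# Current endpoint, as named in the announcement.
MAIN_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder only: the announcement does not name the new endpoints.
SCHOLARLY_ENDPOINT = "https://scholarly-endpoint.example/sparql"


def build_federated_query(author_qid, scholarly_endpoint=SCHOLARLY_ENDPOINT, limit=10):
    """Build a query meant to be sent to the main graph endpoint.

    The author's label is matched locally, while the article pattern is
    delegated to the (assumed) scholarly graph through a SERVICE clause.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", author_qid):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?article ?authorLabel WHERE {{
  # Assumed to be answered by the main graph.
  wd:{author_qid} rdfs:label ?authorLabel .
  FILTER(LANG(?authorLabel) = "en")
  # Assumed to live in the scholarly graph, so it is federated.
  SERVICE <{scholarly_endpoint}> {{
    ?article wdt:P50 wd:{author_qid} .
  }}
}}
LIMIT {int(limit)}"""


def build_request_url(query, endpoint=MAIN_ENDPOINT):
    """Encode the query as a GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Q42 is used purely as an example item ID.
    print(build_federated_query("Q42", limit=5))
```

The sketch only builds the query text and the request URL; it sends
nothing over the network. Which triples actually live in which graph is
defined by the final split rules linked above, so the placement of the
SERVICE clause would need to be checked against them.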
Thank you all for your help and support!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, June 5, 2024
Time: 15:00-16:00 UTC / 08:00 PDT / 11:00 EDT / 17:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>