Some Scholia query rewriting discussion is here: https://github.com/WDscholia/scholia/issues/2423

Egon

On Tue, 11 Jun 2024 at 18:02, Samuel Klein <meta.sj@gmail.com> wrote:
It would be helpful to see how the standard Scholia queries work under federation.  (those that need it)

Are there evals for other graph dbs on how they handle federation?

On Tue, Jun 11, 2024 at 10:39 AM Egon Willighagen <egon.willighagen@gmail.com> wrote:

Hi, thank you for the update.

The email says that "Queries that need federation will need to be rewritten. You can ask for help to rewrite queries".

Do you have guidelines on how to do this? It took quite some effort to make some of the (I thought simple) queries work, though later improvements turned out to be more workable. How were those developed? How do people rewrite SPARQL queries when two or more triple patterns are distributed over the two SPARQL endpoints, particularly when they depend on each other?

Egon


On Tue, 11 Jun 2024 at 16:17, Guillaume Lederrey <glederrey@wikimedia.org> wrote:

Hello all!

The feedback period for our WDQS Graph Split proposal has come to an end. Many thanks to everyone who sent comments; your contributions are invaluable!

We’ve incorporated most comments and proposals into our final set of rules for the graph split. The main proposals (including some that were rejected) were:

  • Duplicating properties (wd:P*) in both graphs does not seem necessary and won't be done
  • The list of publication types that identify what is a scholarly article has been improved; see the final list of items here
  • It was discussed whether sitelinks should inform the nature of the split or not; this idea was not incorporated because it might make it harder to understand what is where
  • There were discussions and investigations regarding items with multiple instance of (P31) values, which might be ambiguous. It appears that this might not affect many items, and that the solution might be to disambiguate these instances by creating separate entities (see the Clinical Trials section of the Talk Page).
  • Re-thinking how scholarly articles are modelled was raised, especially identifying the nature of the publication with a separate property rather than with instance of (P31). This idea should probably be explored and discussed by the WikiCite community, since it does affect the nature of the split, but it could be a nice criterion to take into consideration in the future.

We are now working on implementing the appropriate tooling to manage this split, including a new way of processing the Wikidata dumps for an initial load, modifications to the update pipeline to support the graph split, and additional automation. We hope to have new SPARQL endpoints that are live updated with the graph split by the end of June. This timeline is probably slightly optimistic; we’ll let you know when those are ready.

Once the new SPARQL endpoints that are live updated with the graph split are available, we will provide a 6-month transition period, during which the current endpoint (query.wikidata.org/sparql) will keep serving the full graph. Once that transition is over, query.wikidata.org will only serve the main graph. Queries that need federation will need to be rewritten. You can ask for help to rewrite queries.
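
For illustration only, a rewrite of this kind could look roughly like the sketch below, which moves the scholarly-article triple patterns into a SPARQL 1.1 SERVICE clause while the rest of the query runs against the main graph. This is a minimal sketch and not official guidance: the scholarly endpoint URL, the helper name federated_author_query, and the choice of properties (P50 author, P1476 title) are assumptions made for the example, and a real Scholia query may need a different shape.

```python
# Hypothetical endpoint URL for the scholarly-articles graph (an assumption,
# not stated in this thread); replace it with the real one once announced.
SCHOLARLY_ENDPOINT = "https://query-scholarly.wikidata.org/sparql"


def federated_author_query(author_qid: str, limit: int = 10) -> str:
    """Build a SPARQL query meant to be sent to the main-graph endpoint.

    Article triples are fetched through a SERVICE clause, while the
    author's label is resolved locally against the main graph.
    """
    # Reject anything that is not a plain item ID such as "Q42".
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?work ?title ?authorLabel WHERE {{
  # Evaluated remotely: assumed to live in the scholarly graph after the split.
  SERVICE <{SCHOLARLY_ENDPOINT}> {{
    ?work wdt:P50 wd:{author_qid} ;
          wdt:P1476 ?title .
  }}
  # Evaluated locally: the author item is assumed to stay in the main graph.
  wd:{author_qid} rdfs:label ?authorLabel .
  FILTER(LANG(?authorLabel) = "en")
}}
LIMIT {limit}
"""


if __name__ == "__main__":
    # Q42 is used purely as an example item ID.
    print(federated_author_query("Q42", limit=5))
```

Keeping the patterns that depend on each other inside the same SERVICE block (here, the author and title of the same ?work) means the join between them is evaluated on the remote endpoint, so only matching rows come back to the main endpoint.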

Thank you all for your help and support!


Guillaume


--
Guillaume Lederrey (he/him)
Engineering Manager
Wikimedia Foundation
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/YS26TSGY3YRSJADWAE3DXSVQR43FNK4K/
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org


--
Some nanomaterials stress our cells and cause key event, some towards adverse outcomes. Read about it in our new paper "From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials", https://doi.org/10.1186/s13321-024-00833 

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Blog: https://chem-bla-ics.linkedchemistry.info/
Mastodon: https://social.edu.nl/@egonw
PubList: https://orcid.org/0000-0001-7542-0286
_______________________________________________
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/UMSJIAK5BFLGIBRJP6IVY572G4D64QCK/


--
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
_______________________________________________
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/PONNUDTGWRDMOFKOTGIKIVHLYLIAY7G2/

