Please help out a sparql newbie!
I have the following query to select identifiers for BHL Titles:
SELECT DISTINCT ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI
WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P4327 ?statement0.
      ?statement0 ps:P4327 _:anyValueP4327.
    }
  }
  BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata)
  OPTIONAL { ?item wdt:P4327 ?TitleID. }
  OPTIONAL { ?item wdt:P243 ?OCLC. }
  OPTIONAL { ?item wdt:P236 ?ISSN. }
  OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
  OPTIONAL { ?item wdt:P212 ?ISBN13. }
  OPTIONAL { ?item wdt:P957 ?ISBN10. }
  OPTIONAL { ?item wdt:P1159 ?Coden. }
  OPTIONAL { ?item wdt:P244 ?DLC. }
  OPTIONAL { ?item wdt:P1055 ?NLM. }
  OPTIONAL { ?item wdt:P5878 ?TL2. }
  OPTIONAL { ?item wdt:P356 ?DOI. }
}
It seems to work, as it returns about 170K unique identifiers for about 64K titles.
However, after adding those identifiers to a BHL database table where I could compare the data to what is already in BHL, I found that some items are missing from the SPARQL result.
For example, Q51512257 is associated with BHL Title 88, Q54792313 with BHL Title 100, and Q51399752 with BHL Title 1231. The query above does not return any of those Wikidata items. I modified the query to return just one of those Wikidata items...
SELECT DISTINCT ?item ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI
WHERE {
  BIND(wd:Q51512257 AS ?item)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata)
  OPTIONAL { ?item wdt:P4327 ?TitleID. }
  OPTIONAL { ?item wdt:P243 ?OCLC. }
  OPTIONAL { ?item wdt:P236 ?ISSN. }
  OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
  OPTIONAL { ?item wdt:P212 ?ISBN13. }
  OPTIONAL { ?item wdt:P957 ?ISBN10. }
  OPTIONAL { ?item wdt:P1159 ?Coden. }
  OPTIONAL { ?item wdt:P244 ?DLC. }
  OPTIONAL { ?item wdt:P1055 ?NLM. }
  OPTIONAL { ?item wdt:P5878 ?TL2. }
  OPTIONAL { ?item wdt:P356 ?DOI. }
}
... and it did return the item, but no identifier values. Viewing https://www.wikidata.org/wiki/Q51512257 shows that the item does in fact have a DOI, BHL Bibliography ID (Title ID), and OCLC Control Number.
So, what is going wrong? Is there something wrong with my query, or is there something unusual about those items? (Or, I guess... both... is there something unusual that my query is not accounting for?)
Thanks in advance!
MIKE
Hi Mike,
Looks like this is a problem with the “graph split”: Wikidata now has two triple stores. In theory all the scholarly articles moved to the second graph, but what does “scholarly” mean? I think BHL content is now split across the two graphs.
The second graph can be queried at https://query-scholarly.wikidata.org; your second query works fine there: https://w.wiki/EUAc
Regards, Rod
bhl-wiki mailing list -- bhl-wiki@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/bhl-wiki.lists.wikimedia.org/
On Fri, 13 Jun 2025 at 21:12, Roderic D. M. Page rdmpage@gmail.com wrote:
I think BHL content is now split across the two graphs.
This would be worthy of further investigation: are some BHL items miscategorised? Or is the graph split missing one or more categories that it should include?
Or is this by design?
-- Andy Mabbett @pigsonthewing https://pigsonthewing.org.uk
Andy,
Running the query on the two graphs suggests that the BHL items on the scholarly graph are instances of “scholarly articles”, whereas those on the main graph are other publication types (books or equivalent terms).
Regards, Rod
On Sat, 14 Jun 2025 at 00:15, Roderic D. M. Page rdmpage@gmail.com wrote:
Running the query on the two graphs suggests that the BHL items on the scholarly graph are instances of “scholarly articles”, whereas those on the main graph are other publication types (books or equivalent terms).
Thank you. I am reminded that this is an unresolved issue which I raised in September 2024:
https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service/WDQS_graph_...
Of course, that’s not much help. What you need is to do a federated query that spans both graphs. These have caused some heartache. I’ll try and do one for the original query.
Regards, Rod
Oh, yes, a complete pain.
We are trying to document some of these things in parallel here: https://meta.wikimedia.org/wiki/WikiCite/WDQS_graph_split (this is kind of my job until August!)
Quickest way: use https://query-legacy-full.wikidata.org/ until November; here it is: https://w.wiki/EUAp
Here is an attempt at the federated version (it seems to be relatively simple for this case). It gives *67858* results, same as legacy, so it may be fine:
query link here
Best, Tiago
*Tiago Lubiana*
*tiago.bio.br https://tiago.bio.br*
Tiago and Rod,
Thanks for the responses. Looks like the two of you have provided me two federated options (one to submit from the new scholarly endpoint and one for the OG endpoint), as well as an endpoint (query-legacy-full) where I can submit my original query.
I suppose another option is to submit the original query twice... once to the scholarly endpoint and once to the original endpoint... and perform the union of the results on the client side.
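A minimal sketch of that client-side option, in Python with only the standard library. The `/sparql` endpoint paths, the User-Agent string, and the function names are assumptions made for illustration and are not confirmed anywhere in this thread. `union_rows` only drops rows that are identical in both result sets; it does not try to reconcile conflicting values.

```python
import json
import urllib.parse
import urllib.request

# Hosts are the ones named in this thread; the "/sparql" path is an assumption.
MAIN_ENDPOINT = "https://query.wikidata.org/sparql"
SCHOLARLY_ENDPOINT = "https://query-scholarly.wikidata.org/sparql"


def run_query(endpoint, query, user_agent="bhl-identifier-sync-example/0.1"):
    """Send a SPARQL query over HTTP GET and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,  # hypothetical value, replace with your own
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def flatten(binding):
    """Turn one SPARQL JSON binding into a hashable, order-independent row."""
    return tuple(sorted((name, cell["value"]) for name, cell in binding.items()))


def union_rows(*result_sets):
    """Union several lists of bindings, dropping exact duplicate rows."""
    seen = set()
    rows = []
    for bindings in result_sets:
        for binding in bindings:
            row = flatten(binding)
            if row not in seen:
                seen.add(row)
                rows.append(dict(row))
    return rows
```

With a query string `q`, the call would look like `union_rows(run_query(MAIN_ENDPOINT, q), run_query(SCHOLARLY_ENDPOINT, q))`. A result set of this size may run into endpoint timeouts, so this is a starting point rather than a finished client.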
Questions:
* Tiago, you suggested using query-legacy-full until November. What happens after November? Will the federated queries change after November?
* Is there a way to avoid the federated query by submitting the original query via an API? (I didn't see a way, but want to make sure I'm not overlooking something.)
I am asking those questions because what I am doing is putting together a way to automatically
1. extract Title and Author identifiers from Wikidata and add them to BHL, and then
2. identify discrepancies within the data and make those available to the communities for investigation and correction.
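As a rough illustration of step 2, the comparison could be sketched in Python as below. The data layout (a mapping from BHL Title ID to identifier type to a set of values) and the function name `find_discrepancies` are hypothetical and chosen only for this example; the real BHL tables will look different.

```python
def find_discrepancies(bhl, wikidata):
    """Compare two {title_id: {identifier_type: set_of_values}} mappings.

    Returns the title IDs that Wikidata did not return at all, plus the
    (title_id, identifier_type, bhl_values, wikidata_values) tuples where both
    sides have values for an identifier type but share none of them.
    """
    report = {"missing_from_wikidata": [], "conflicts": []}
    for title_id, bhl_ids in sorted(bhl.items()):
        wd_ids = wikidata.get(title_id)
        if wd_ids is None:
            report["missing_from_wikidata"].append(title_id)
            continue
        # Only identifier types present on both sides can conflict.
        for id_type in sorted(bhl_ids.keys() & wd_ids.keys()):
            if bhl_ids[id_type].isdisjoint(wd_ids[id_type]):
                report["conflicts"].append(
                    (title_id, id_type, sorted(bhl_ids[id_type]), sorted(wd_ids[id_type]))
                )
    return report
```

Titles listed under `missing_from_wikidata` are the ones worth re-checking against both graphs before treating them as genuine gaps.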
After learning about the split graph and the need for federated queries, my concern is that if I build something now it will need to be revisited later in the year as things evolve.
Thanks,
MIKE
Mike,
My understanding is that a query written today that federates over query.wikidata.org and query-scholarly.wikidata.org will work for the foreseeable future. Once query-legacy-full stops working, federation will be the only way to reach the whole graph.
So my sense is that if you write a federated query now and run it on either query or query-scholarly (federating with query-scholarly or query, respectively), you will be fine.
Regards, Rod
Hi, Rod, Andy, and Mike,
A set of "P31" values characterizes something as *scholarly* (see here <www.wikidata.org/w/index.php?title=Wikidata:SPARQL_query_service/WDQS_graph_split/Rules>). Arguably, some items in the *main* graph should be in the *scholarly* graph, but the team opted for a lean set of values, trying to optimize for simplicity.
There is also a property, P13046 ("publication type of scholarly work"), that moves any item carrying it to the scholarly graph. So if, say, there is a scholarly book in *main* that may belong in the *scholarly* graph, you could use P13046.
The team expects that eventually the presence of P13046 will be used as the sole marker of "scholarliness". The exact rules for the split may change *slightly* in the future.
But I agree with Rod: either federated version is very likely to work for the next few years. Eventually, the Query Service will move off the Blazegraph backend https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update, which may cause new issues, but I would guess this is more in the 5-10 year range.
With regards to an alternative API, you can use:
* https://qlever.cs.uni-freiburg.de, unofficial, very fast, updated every couple of days, to run some SPARQL queries. They may need some changes, as QLever and WDQS each have their own non-standard details.
* There is a Wikidata REST API https://www.wikidata.org/wiki/Wikidata:REST_API, but I think it does not cover your use case.
If you don't need live updates, I would say running on QLever is a good choice.
Best,
*Tiago Lubiana*
*tiago.bio.br https://tiago.bio.br*
On Fri, Jun 13, 2025 at 8:20 PM Roderic D. M. Page rdmpage@gmail.com wrote:
Mike,
My understanding is that a query written today that federates over query.wikidata.org and query-scholarly.wikidata.org will work for the foreseeable future. Once query-legacy-full stops working federation will be the only way to reach the whole graph.
So my sense is if you write a federated query now and run it on either query or query-scholarly (and federate with query-scholarly and query, respectively) you will be fine.
Regards, Rod On 13 Jun 2025 at 21:56 +0100, Lichtenberg, Mike LichtenbergM@si.edu, wrote:
Tiago and Rod,
Thanks for the responses. Looks like the two of you have provided me two federated options (one to submit from the new scholarly endpoint and one for the OG endpoint), as well as an endpoint (query-legacy-full) where I can submit my original query.
I suppose another option is to submit the original query twice... once to the scholarly endpoint and once to the original endpoint... and perform the union of the results on the client side.
Questions:
- Tiago, you suggested using query-legacy-full *until November*. What
happens after November? Will the federated queries change after November?
- Is there a way to avoid the federated query by submitting the
original query via an API? (I didn't see a way, but want to make sure I'm not overlooking something.)
I am asking those questions because what I am doing is putting together a way to automatically
- extract Title and Author identifiers from Wikidata and add them to
BHL, and then 2. identify discrepancies within the data and make those available to the communities for investigation and correction.
After learning about the split graph and the need for federated queries, my concern is that if I build something now it will need to be revisited later in the year as things evolve.
Thanks,
MIKE
*From:* Tiago Lubiana tiagolubiana@gmail.com *Sent:* Friday, June 13, 2025 3:24 PM *To:* rdmpage rdmpage@gmail.com *Cc:* bhl-wiki@lists.wikimedia.org bhl-wiki@lists.wikimedia.org; Lichtenberg, Mike LichtenbergM@si.edu *Subject:* Re: [bhl-wiki] Re: Help with wikidata query
*External Email - Exercise Caution* Oh, yes, a complete pain.
We are trying to do a parallel documentation for some of the things here: https://meta.wikimedia.org/wiki/WikiCite/WDQS_graph_split (this is kind of my job until August!)
Quickest way: use https://query-legacy-full.wikidata.org/ until November, here is it https://w.wiki/EUAp
Now is an attempt at the federated version (it seems to be relatively simple for this case). It gives *67858* results, same as legacy, so it may be fine:
Best, Tiago
*Tiago Lubiana* *tiago.bio.br https://tiago.bio.br/*
On Fri, Jun 13, 2025 at 5:14 PM Roderic D. M. Page rdmpage@gmail.com wrote:
Of course, that’s not much help. What you need is to do a federated query that spans both graphs. These have caused some heartache. I’ll try and do one for the original query.
Regards, Rod On 13 Jun 2025 at 19:04 +0100, Lichtenberg, Mike LichtenbergM@si.edu, wrote:
Please help out a sparql newbie!
I have the following query to select identifiers for BHL Titles:
SELECT DISTINCT ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI WHERE { SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". } { SELECT DISTINCT ?item WHERE { ?item p:P4327 ?statement0. ?statement0 ps:P4327 _:anyValueP4327. } } BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata ) OPTIONAL { ?item wdt:P4327 ?TitleID. } OPTIONAL { ?item wdt:P243 ?OCLC. } OPTIONAL { ?item wdt:P236 ?ISSN. } OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. } OPTIONAL { ?item wdt:P212 ?ISBN13. } OPTIONAL { ?item wdt:P957 ?ISBN10. } OPTIONAL { ?item wdt:P1159 ?Coden. } OPTIONAL { ?item wdt:P244 ?DLC. } OPTIONAL { ?item wdt:P1055 ?NLM. } OPTIONAL { ?item wdt:P5878 ?TL2. } OPTIONAL { ?item wdt:P356 ?DOI. } }
It seems to work, as it returns about 170K unique identifiers for about 64K titles.
However, after adding those identifiers to a BHL database table where I could compare the data to what is already in BHL, I found that some items are missing from the sparql result.
For example, Q51512257 is associated with BHL Title 88, Q54792313 with BHL TItle 100, and Q51399752 with BHL Title 1231. The query above does not return any of those wikidata items. I modified the query to return just one of those wikidata items...
SELECT DISTINCT ?item ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI WHERE { BIND(wd:Q51512257 AS ?item) SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". } BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata ) OPTIONAL { ?item wdt:P4327 ?TitleID. } OPTIONAL { ?item wdt:P243 ?OCLC. } OPTIONAL { ?item wdt:P236 ?ISSN. } OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. } OPTIONAL { ?item wdt:P212 ?ISBN13. } OPTIONAL { ?item wdt:P957 ?ISBN10. } OPTIONAL { ?item wdt:P1159 ?Coden. } OPTIONAL { ?item wdt:P244 ?DLC. } OPTIONAL { ?item wdt:P1055 ?NLM. } OPTIONAL { ?item wdt:P5878 ?TL2. } OPTIONAL { ?item wdt:P356 ?DOI. } }
... and it did return the item, but no identifier values. Viewing https://www.wikidata.org/wiki/Q51512257 shows that the item does in fact have a DOI, BHL Bibliography ID (Title ID), and OCLC Control Number.
*So, what is going wrong? Is there something wrong with my query, or is there something unusual about those items? (Or, I guess... both... is there something unusual that my query is not accounting for?)*
Thanks in advance!
MIKE
bhl-wiki mailing list -- bhl-wiki@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/bhl-wiki.lists.wikimedia.org/
OK, try this:
SELECT DISTINCT * WHERE {
  {
    SERVICE wdsubgraph:wikidata_main {
      ?item wdt:P4327 ?TitleID.
      OPTIONAL { ?item wdt:P243 ?OCLC. }
      OPTIONAL { ?item wdt:P236 ?ISSN. }
      OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
      OPTIONAL { ?item wdt:P212 ?ISBN13. }
      OPTIONAL { ?item wdt:P957 ?ISBN10. }
      OPTIONAL { ?item wdt:P1159 ?Coden. }
      OPTIONAL { ?item wdt:P244 ?DLC. }
      OPTIONAL { ?item wdt:P1055 ?NLM. }
      OPTIONAL { ?item wdt:P5878 ?TL2. }
      OPTIONAL { ?item wdt:P356 ?DOI. }
    }
  }
  UNION
  {
    ?item wdt:P4327 ?TitleID.
    OPTIONAL { ?item wdt:P243 ?OCLC. }
    OPTIONAL { ?item wdt:P236 ?ISSN. }
    OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
    OPTIONAL { ?item wdt:P212 ?ISBN13. }
    OPTIONAL { ?item wdt:P957 ?ISBN10. }
    OPTIONAL { ?item wdt:P1159 ?Coden. }
    OPTIONAL { ?item wdt:P244 ?DLC. }
    OPTIONAL { ?item wdt:P1055 ?NLM. }
    OPTIONAL { ?item wdt:P5878 ?TL2. }
    OPTIONAL { ?item wdt:P356 ?DOI. }
  }
}
You’ll need to run this on the scholarly graph endpoint, https://query-scholarly.wikidata.org. The link to the query is: https://w.wiki/EUAs
Note the SERVICE wdsubgraph:wikidata_main bit, which means that the first part of the query runs on the main graph, the second part runs on scholarly, and the results get merged… fingers crossed.
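To sanity-check a single item with the same pattern, something like the sketch below might help. It is only an illustration, not a tested fix: it assumes it is run on https://query-scholarly.wikidata.org like the query above, and the ?graph column with its "main" / "scholarly" labels is a hypothetical addition that just shows which branch produced each row. Only the three identifiers reported on the item page for Q51512257 are fetched.

```sparql
# Sketch only: run on the scholarly endpoint, as with the full query.
# ?graph is an illustrative label, not part of the original query.
SELECT DISTINCT ?graph ?TitleID ?OCLC ?DOI WHERE {
  {
    # Branch 1: ask the main graph via federation.
    SERVICE wdsubgraph:wikidata_main {
      wd:Q51512257 wdt:P4327 ?TitleID.
      OPTIONAL { wd:Q51512257 wdt:P243 ?OCLC. }
      OPTIONAL { wd:Q51512257 wdt:P356 ?DOI. }
    }
    BIND("main" AS ?graph)
  }
  UNION
  {
    # Branch 2: ask the local (scholarly) graph.
    wd:Q51512257 wdt:P4327 ?TitleID.
    OPTIONAL { wd:Q51512257 wdt:P243 ?OCLC. }
    OPTIONAL { wd:Q51512257 wdt:P356 ?DOI. }
    BIND("scholarly" AS ?graph)
  }
}
```

If the rows come back with only one of the two labels, that would suggest which graph holds the item's statements, which is presumably why a query against a single endpoint returned the item without any identifier values.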
Regards, Rod
On 13 Jun 2025 at 21:13 +0100, Roderic D. M. Page rdmpage@gmail.com, wrote:
Of course, that’s not much help. What you need is to do a federated query that spans both graphs. These have caused some heartache. I’ll try and do one for the original query.
Regards, Rod
On 13 Jun 2025 at 19:04 +0100, Lichtenberg, Mike LichtenbergM@si.edu, wrote:
Please help out a sparql newbie!
I have the following query to select identifiers for BHL Titles:
SELECT DISTINCT ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P4327 ?statement0.
      ?statement0 ps:P4327 _:anyValueP4327.
    }
  }
  BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata )
  OPTIONAL { ?item wdt:P4327 ?TitleID. }
  OPTIONAL { ?item wdt:P243 ?OCLC. }
  OPTIONAL { ?item wdt:P236 ?ISSN. }
  OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
  OPTIONAL { ?item wdt:P212 ?ISBN13. }
  OPTIONAL { ?item wdt:P957 ?ISBN10. }
  OPTIONAL { ?item wdt:P1159 ?Coden. }
  OPTIONAL { ?item wdt:P244 ?DLC. }
  OPTIONAL { ?item wdt:P1055 ?NLM. }
  OPTIONAL { ?item wdt:P5878 ?TL2. }
  OPTIONAL { ?item wdt:P356 ?DOI. }
}
It seems to work, as it returns about 170K unique identifiers for about 64K titles.
However, after adding those identifiers to a BHL database table where I could compare the data to what is already in BHL, I found that some items are missing from the SPARQL result.
For example, Q51512257 is associated with BHL Title 88, Q54792313 with BHL Title 100, and Q51399752 with BHL Title 1231. The query above does not return any of those Wikidata items. I modified the query to return just one of those Wikidata items...
SELECT DISTINCT ?item ?TitleID ?Wikidata ?OCLC ?ISSN ?Linking_ISSN ?ISBN13 ?ISBN10 ?Coden ?DLC ?NLM ?TL2 ?DOI WHERE {
  BIND(wd:Q51512257 AS ?item)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  BIND(REPLACE(STR(?item), "http://www.wikidata.org/entity/", "") AS ?Wikidata )
  OPTIONAL { ?item wdt:P4327 ?TitleID. }
  OPTIONAL { ?item wdt:P243 ?OCLC. }
  OPTIONAL { ?item wdt:P236 ?ISSN. }
  OPTIONAL { ?item wdt:P7363 ?Linking_ISSN. }
  OPTIONAL { ?item wdt:P212 ?ISBN13. }
  OPTIONAL { ?item wdt:P957 ?ISBN10. }
  OPTIONAL { ?item wdt:P1159 ?Coden. }
  OPTIONAL { ?item wdt:P244 ?DLC. }
  OPTIONAL { ?item wdt:P1055 ?NLM. }
  OPTIONAL { ?item wdt:P5878 ?TL2. }
  OPTIONAL { ?item wdt:P356 ?DOI. }
}
... and it did return the item, but no identifier values. Viewing https://www.wikidata.org/wiki/Q51512257 shows that the item does in fact have a DOI, BHL Bibliography ID (Title ID), and OCLC Control Number.
So, what is going wrong? Is there something wrong with my query, or is there something unusual about those items? (Or, I guess... both... is there something unusual that my query is not accounting for?)
Thanks in advance!
MIKE