Hello all!
We are happy to announce the availability of Wikimedia Commons Query Service (WCQS): https://wcqs-beta.wmflabs.org/.
This is a beta SPARQL endpoint exposing the Structured Data on Commons (SDoC) dataset. This endpoint can federate with WDQS. More work is needed as we iterate on the service, but feel free to begin using the endpoint. Known limitations are listed below:
* The service is a beta endpoint, updated via weekly dumps. Caveats include limited performance, expected downtime, and no stability guarantees for the interface, naming, or backward compatibility.
  * The service is hosted on Wikimedia Cloud Services, with limited resources and limited monitoring, so there may be random unplanned downtime. The data will be reloaded weekly from dumps, and the service will be down during each reload. With the current amount of SDoC data, a reload takes approximately 4 hours, but this may increase as SDoC data grows.
  * Due to an issue with the dump format, the data currently only dates back to July 5th. We’re working on getting more up-to-date data and hope to have a solution soon (https://phabricator.wikimedia.org/T258507 and https://phabricator.wikimedia.org/T258474).
  * The MediaInfo concept URIs (e.g. http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may change these to HTTPS in the near future. Please comment on T258590 if you have concerns about this change.
* The service is restricted behind OAuth authentication, backed by Commons. You will need a Commons account to access the service. This is so that we can contact abusive bots and/or users and, as a last resort, block them selectively if needed.
  * Please note that to log out of the service correctly, you need to use the logout link in WCQS; logging out of Wikimedia Commons alone will not log you out of WCQS. This limitation will be lifted once we move to production.
* No documentation on the service is available yet; in particular, no examples are provided. You can add your own examples at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exam... following the format at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples.
  * Please use the SPARQL template. Note that while a current bug prevents us from changing the “Try it!” link endpoint, the examples will display correctly in the WCQS GUI.
* WCQS is a work in progress and some bugs are to be expected, especially around generalizing WDQS to fit SDoC data. Current bugs include:
  * URI prefixes specific to SDoC data don’t work yet; to query SDoC entities you need to use their full URIs (see the sketch after this list). Relations and Q items use Wikidata’s URI prefixes, so they work correctly.
  * Autocomplete for SDoC items doesn’t work. Without prefixes it would be of limited use anyway, and additional work will be required after we inject the SDoC URI prefixes into the WCQS GUI.
  * If you find any additional bugs or issues, please report them via Phabricator with the tag wikidata-query-service.
* We do plan to move the service to production, but we don’t have a timeline for that yet. We want to emphasize that while we expect a SPARQL endpoint to be part of a medium- to long-term solution, it will only be part of that solution. Even once the service is production-ready, it will still have limitations around timeouts, expensive queries, and federation. Some use cases will need to migrate, over time, to better solutions, once those solutions exist.
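As a quick illustration of the prefix and federation notes above, a minimal sketch of a query that should work against the beta endpoint (assuming the standard wd:/wdt:/rdfs: prefixes are preconfigured as they are on WDQS; a specific MediaInfo entity would still need its full URI, e.g. <http://commons.wikimedia.org/entity/M37200540>):

  SELECT ?file ?label WHERE {
    ?file wdt:P180 wd:Q146 .                      # files whose "depicts" (P180) value is house cat (Q146)
    SERVICE <https://query.wikidata.org/sparql> { # federated call to WDQS
      wd:Q146 rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }
  }
  LIMIT 10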
Have fun!
Guillaume
-- Guillaume Lederrey Engineering Manager, Search Platform Wikimedia Foundation UTC+1 / CET
As a followup, I'll be posting about this beta launch on various places across the wikis over the next few days. Please keep an eye out for discussions around the tool and opportunities to write up documentation on Commons.
This is amazing. Thank you. Where can I send congratulatory fruit baskets?
Seconded. I assume the kiwikiwi remains the preferred fruit? [image: kiwi.png]
I'm not sure how convenient it is to ship a crate of kiwikiwi halfway across the world, but in any case, all the credit for WCQS goes to Maryum, Zbyszko, David, Erik and Carly! I hope we'll have the opportunity to meet in person again in the somewhat near future. If we do, I'm sure they'll be happy to receive a few fruits at the next Wikidata Conf!
Congratulations and innumerable thanks to the team!
Amazing indeed. Many congratulations.
Thanks Tito Dutta
Congrats!
Goran S. Milovanović, PhD Data Scientist, Software Department Wikimedia Deutschland
------------------------------------------------ "It's not the size of the dog in the fight, it's the size of the fight in the dog." - Mark Twain ------------------------------------------------
Congrats! Thank you!
I have updated the documentation page of the Wikibase RDF mapping to add MediaInfo entities: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#MediaInfo
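For orientation, a minimal sketch of a query using that mapping, assuming (per the discussion later in this thread) that each MediaInfo entity carries a schema:contentUrl triple; the full-URI form sidesteps the SDoC prefixes that aren't configured on the beta endpoint yet:

  SELECT ?file ?url WHERE {
    ?file <http://schema.org/contentUrl> ?url .  # full URI, since the schema: prefix may not be set up yet
  }
  LIMIT 5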
Thomas
"Try it" issue worked around.
Include the line |project=sdc when calling the SPARQL template to have the query go to the WCQS beta service.
The template https://commons.wikimedia.org/wiki/Template:SPARQL will need to be updated to point to the final endpoint when that is available.
The "Try it" links at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exam... should now all work.
-- James.
Awesome, I'm really happy we finally have at least the start of a functioning query service.
For now, the two things that I guess would be most helpful for query writers are:
1) A way to make ImageGrid work without resorting to the clunky Special:FilePath hack (see the sketch below)
2) A nicer way to query Wikidata information without using federation
I guess 2) might be a bit more difficult, but it is definitely something to consider.
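For anyone who hasn't seen the hack in 1): queries currently rebuild a Special:FilePath IRI by string surgery on schema:contentUrl so the ImageGrid view will render it. A rough sketch (not Hay's exact query; percent-encoding of filenames may need extra care):

  SELECT ?file ?image WHERE {
    ?file <http://schema.org/contentUrl> ?url .
    # rebuild a Special:FilePath IRI from the last path segment of the upload URL
    BIND(IRI(CONCAT("https://commons.wikimedia.org/wiki/Special:FilePath/",
                    REPLACE(STR(?url), "^.*/", ""))) AS ?image)
  }
  LIMIT 20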
Kind regards, -- Hay
On 23/07/2020 22:26, Hay (Husky) wrote:
IMO, the best way to avoid the hack of having to construct the Special:FilePath URLs would be to have these simply available as triples, so that the standard way to get an image with this sort of URL would just be
sdc:M1234567 ?relation commons:filename.jpg
for some ?relation to be determined.
Also, as User:Mfchris84 has noted on the talk page, another reason to want such triples is to make it possible to map from the values of Wikidata properties like P18 ("image") and its friends, which are of the form commons:filename.jpg, to the corresponding M-IDs, so that structured data about the files can be accessed.
At the moment, going from commons:filename.jpg -> M-ID -> SDoC data is not straightforward.
Is there a particular reason that schema:contentUrl was chosen to point to URLs of the form https://upload.wikimedia.org/wikipedia/commons/q/q1/filename.jpg rather than the http://commons.wikimedia.org/wiki/Special:FilePath/filename.jpg form that WDQS uses?
Is the specific location useful? (Is it even stable?) Or would it make more sense to use the established URL form for these links, as used in WDQS?
As to Hay's (2), being able to use the SERVICE for label lookup in WCQS without requiring explicit federation to WDQS would be a nice step forward.
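For concreteness, label lookup today seems to require wrapping the label service inside an explicit federation call, roughly like the sketch below (assuming the standard WDQS wikibase:/bd: prefixes resolve on the beta endpoint):

  SELECT ?file ?depicts ?depictsLabel WHERE {
    ?file wdt:P180 ?depicts .
    SERVICE <https://query.wikidata.org/sparql> {  # explicit federation to WDQS
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
  }
  LIMIT 10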
But overall I am *hugely* impressed by what the team has rolled out. Thank you so much!
-- James.
On Thu, Jul 23, 2020 at 11:50 PM, James Heald jpm.heald@gmail.com wrote:
Is there a particular reason that schema:contentUrl was chosen to point to URLs of the form https://upload.wikimedia.org/wikipedia/commons/q/q1/filename.jpg rather than the http://commons.wikimedia.org/wiki/Special:FilePath/filename.jpg form that WDQS uses? Is the specific location useful? (Is it even stable?) Or would it make more sense to use the established URL form for these links, as used in WDQS?
These URLs are the URLs that the Wikimedia servers use to send back the files; Special:FilePath URLs redirect to them. I guess that they are mostly stable: they are the ones Wikipedia/Commons have used for at least 15 years. These URLs were not used in the Wikidata RDF dumps because it would have required additional database queries to fetch them while generating the dumps. But the SDoC dump generator already fetches the file metadata to output the file type and dimensions, so it is cheap for it to also output these URLs. But, anyway, it should be fairly easy to change them to point to Special:FilePath instead and, indeed, make interoperability with Wikidata easier. I could easily write a patch for it if the Search team is OK with it. Could you open a Phabricator ticket about it?
Another option is to directly add the M-IDs to Wikidata RDF representation. I have just opened a ticket about it: https://phabricator.wikimedia.org/T258776
Cheers,
Thomas
On Fri, Jul 24, 2020 at 10:52 AM Thomas Pellissier Tanon thomas@pellissier-tanon.fr wrote:
But, anyway, it should be fairly easy to change them to point to Special:FilePath instead and, indeed, make interoperability with Wikidata easier. I could easily write a patch for it if the Search team is OK with it. Could you open a Phabricator ticket about it?
No objections from me, a ticket would be nice indeed so that we can continue the discussion there, thanks!
-- David C.
On Thu, Jul 23, 2020 at 11:26 PM Hay (Husky) huskyr@gmail.com wrote:
1) A way to make ImageGrid work without resorting to the clunky Special:FilePath hack
I've created https://phabricator.wikimedia.org/T258769 to track this request. Feel free to add any details on the task. No promise on when we'll have time to work on it. We now need to focus again on the new streaming updater for WDQS (and eventually for WCQS) and improving the stability / scaling of WDQS.
2) A nicer way to query Wikidata information without using federation.
It is unlikely that we'll be able to remove federation in this case. At least it is unlikely that we'll merge those 2 graphs in a single Blazegraph instance. Given the concerns we have about scaling this service, splitting into more federated graphs seems a better option than merging into larger graphs. There might be ways to pull a subset of the Wikidata dataset into WCQS, but that looks like a complex problem.
On Fri, Jul 24, 2020 at 10:39 AM Guillaume Lederrey glederrey@wikimedia.org wrote:
I've created https://phabricator.wikimedia.org/T258769 to track this request.
Great, thanks for doing that. I've added an example query (https://tinyurl.com/y26mtate) of how things are currently done.
It is unlikely that we'll be able to remove federation in this case. At least it is unlikely that we'll merge those 2 graphs in a single Blazegraph instance. Given the concerns we have about scaling this service, splitting into more federated graphs seems a better option than merging into larger graphs. There might be ways to pull a subset of the Wikidata dataset into WCQS, but that looks like a complex problem.
Yeah, I get that. I think in the short term it might be more interesting to figure out how to add some extra features to the query UX that make writing federated SPARQL easier.
-- Hay
Hi all,
We are looking for someone with data visualisation experience to help us with a new pilot research project investigating gaps in Wikimedia coverage of Australia (see the ad below). We have received a small amount of money from UTS to run this pilot while we wait on the results of a proposal for a much larger project. The pilot will probably only be a few days' work, but there may be more work down the line. We would love to hear from anyone interested in working with us on this! Please email me if interested.
Best,
Heather.
Data visualisations expert
Heather Ford (http://hblog.org), Head of Digital and Social Media at UTS (https://www.uts.edu.au/future-students/communication/digital-and-social-media) in Sydney, and Tamson Pietsch, Head of the Centre for Public History at UTS (https://www.uts.edu.au/research-and-teaching/our-research/australian-centre-public-history), along with collaborators at Wikimedia Australia (https://wikimedia.org.au/wiki/Wikimedia_Australia), including Pru Mitchell and [[User:99of9|Toby Hudson]], are involved in a project to analyse Wikipedia's scope and progress over the past twenty years (relating to Australia and possibly globally). We are looking for someone to help us develop a series of visualisations for a pilot project. This will involve extracting data about en.wp.org articles (either from Wikipedia or via Wikidata) and comparing it to another dataset (possibly https://honours.pmc.gov.au/honours/search), cleaning and coding the data and, importantly, visualising it using mapping and other visualisation tools. This is a pilot project with resources for a few days' work, which we would ideally like to happen over the next month. Experience with Wikimedia data analysis is a plus. Please contact Heather for more information or to express interest.
---------------------------
Dr Heather Ford Associate Professor and Head of Discipline (Digital and Social Media https://www.uts.edu.au/future-students/communication/digital-and-social-media ) School of Communication https://www.uts.edu.au/future-students/communication/about-communication/welcome-school-communication, University of Technology, Sydney https://www.uts.edu.au/ (UTS)
w: hblog.org / t: @hfordsa http://www.twitter.com/hfordsa