Hello all!
We are happy to announce the availability of Wikimedia Commons Query Service (WCQS): https://wcqs-beta.wmflabs.org/.
This is a beta SPARQL endpoint exposing the Structured Data on Commons (SDoC) dataset. This endpoint can federate with WDQS. More work is needed as we iterate on the service, but feel free to begin using the endpoint. Known limitations are listed below:
* The service is a beta endpoint, updated via weekly dumps. Caveats include limited performance, expected downtime, and no stability guarantees for the interface, naming, or backward compatibility.
  * The service is hosted on Wikimedia Cloud Services, with limited resources and limited monitoring, so there may be random unplanned downtime. The data will be reloaded weekly from dumps, and the service will be down during each reload. With the current amount of SDoC data, a reload takes approximately 4 hours, but this may increase as SDoC data grows.
  * Due to an issue with the dump format, the data currently only dates back to July 5th. We’re working on getting more up-to-date data and hope to have a solution soon (https://phabricator.wikimedia.org/T258507 and https://phabricator.wikimedia.org/T258474).
  * The MediaInfo concept URIs (e.g. http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may change these to HTTPS in the near future. Please comment on T258590 if you have concerns about this change.
* The service is restricted behind OAuth authentication, backed by Commons. You will need a Commons account to access the service. This is so that we can contact abusive bots and/or users and, as a last resort, block them selectively if needed.
  * Please note that to log out of the service correctly, you need to use the logout link in WCQS; logging out of Wikimedia Commons alone will not log you out of WCQS. This limitation will be lifted once we move to production.
* No documentation on the service is available yet; in particular, no examples are provided. You can add your own examples at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exam... following the format at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples.
  * Please use the SPARQL template. Note that while a current bug prevents us from changing the “Try it!” link endpoint, the examples will display correctly in the WCQS GUI.
* WCQS is a work in progress and some bugs are to be expected, especially around generalizing WDQS to fit SDoC data. Current bugs include:
  * URI prefixes specific to SDoC data don’t work yet; to query SDoC entities you need to use their full URIs (see the sketch after this list). Relations and Q items use Wikidata’s URI prefixes, so they work correctly.
  * Autocomplete for SDoC items doesn’t work. Without prefixes it would be of limited use anyway, and additional work will be required after we inject the SDoC URI prefixes into the WCQS GUI.
  * If you find any additional bugs or issues, please report them via Phabricator with the tag wikidata-query-service.
* We do plan to move the service to production, but we don’t have a timeline for that yet. We want to emphasize that while we expect a SPARQL endpoint to be part of a medium- to long-term solution, it will only be part of that solution. Even once the service is production-ready, it will still have limitations around timeouts, expensive queries, and federation. Some use cases will need to migrate, over time, to better solutions, once those solutions exist.
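As a quick illustration of the prefix and federation notes above, a minimal sketch of a query that should work against the beta endpoint (assuming the standard wd:/wdt:/rdfs: prefixes are preconfigured as they are on WDQS; a specific MediaInfo entity would still need its full URI, e.g. <http://commons.wikimedia.org/entity/M37200540>):

  SELECT ?file ?label WHERE {
    ?file wdt:P180 wd:Q146 .                      # files whose "depicts" (P180) value is house cat (Q146)
    SERVICE <https://query.wikidata.org/sparql> { # federated call to WDQS
      wd:Q146 rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }
  }
  LIMIT 10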
Have fun!
Guillaume
-- Guillaume Lederrey Engineering Manager, Search Platform Wikimedia Foundation UTC+1 / CET
As a followup, I'll be posting about this beta launch on various places across the wikis over the next few days. Please keep an eye out for discussions around the tool and opportunities to write up documentation on Commons.
This is amazing. Thank you. Where can I send congratulatory fruit baskets?
Seconded. I assume the kiwikiwi remains the preferred fruit? [image: kiwi.png]
I'm not sure how convenient it is to ship a crate of kiwikiwi halfway across the world, but in any case, all the credit for WCQS goes to Maryum, Zbyszko, David, Erik and Carly! I hope we'll have the opportunity to meet in person again in the somewhat near future. If we do, I'm sure they'll be happy to receive a few fruits at the next Wikidata Conf!
Congratulations and innumerable thanks to the team!
Amazing indeed. Many congratulations.
Thanks Tito Dutta
Congrats!
Goran S. Milovanović, PhD Data Scientist, Software Department Wikimedia Deutschland
------------------------------------------------ "It's not the size of the dog in the fight, it's the size of the fight in the dog." - Mark Twain ------------------------------------------------
Congrats! Thank you!
I have updated the documentation page of the Wikibase RDF mapping to add MediaInfo entities: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#MediaInfo
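For orientation, a minimal sketch of a query using that mapping, assuming (per the discussion later in this thread) that each MediaInfo entity carries a schema:contentUrl triple; the full-URI form sidesteps the SDoC prefixes that aren't configured on the beta endpoint yet:

  SELECT ?file ?url WHERE {
    ?file <http://schema.org/contentUrl> ?url .  # full URI, since the schema: prefix may not be set up yet
  }
  LIMIT 5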
Thomas
"Try it" issue worked around.
Include the line |project=sdc when calling the SPARQL template to have the query go to the WCQS beta service.
The template https://commons.wikimedia.org/wiki/Template:SPARQL will need to be updated to point to the final endpoint when that is available.
The "Try it" links at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exam... should now all work.
-- James.
Awesome, I'm really happy we finally have at least the start of a functioning query service.
For now, the two things that I guess would be most helpful for query writers are:
1) A way to make ImageGrid work without resorting to the clunky Special:FilePath hack (see the sketch below)
2) A nicer way to query Wikidata information without using federation
I guess 2) might be a bit more difficult, but it is definitely something to consider.
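For anyone who hasn't seen the hack in 1): queries currently rebuild a Special:FilePath IRI by string surgery on schema:contentUrl so the ImageGrid view will render it. A rough sketch (not Hay's exact query; percent-encoding of filenames may need extra care):

  SELECT ?file ?image WHERE {
    ?file <http://schema.org/contentUrl> ?url .
    # rebuild a Special:FilePath IRI from the last path segment of the upload URL
    BIND(IRI(CONCAT("https://commons.wikimedia.org/wiki/Special:FilePath/",
                    REPLACE(STR(?url), "^.*/", ""))) AS ?image)
  }
  LIMIT 20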
Kind regards, -- Hay
On 23/07/2020 22:26, Hay (Husky) wrote:
IMO, the best way to avoid the hack of having to construct the Special:FilePath URLs would be to have these simply available as triples, so that the standard way to get an image with this sort of URL would just be
sdc:M1234567 ?relation commons:filename.jpg
for some ?relation to be determined.
Also, as User:Mfchris84 has noted on the talk page, another reason to want such triples is to make it possible to map from the values of Wikidata properties like P18 ("image") and its friends, which are of the form commons:filename.jpg, to the corresponding M-IDs, so that structured data about the files can be accessed.
At the moment, going from commons:filename.jpg -> M-ID -> SDoC data is not straightforward.
Is there a particular reason that schema:contentUrl was chosen to point to URLs of the form https://upload.wikimedia.org/wikipedia/commons/q/q1/filename.jpg rather than the http://commons.wikimedia.org/wiki/Special:FilePath/filename.jpg form that WDQS uses?
Is the specific location useful? (Is it even stable?) Or would it make more sense to use the established URL form for these links, as used in WDQS?
As to Hay's (2), being able to use the SERVICE for label lookup in WCQS without requiring explicit federation to WDQS would be a nice step forward.
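For concreteness, label lookup today seems to require wrapping the label service inside an explicit federation call, roughly like the sketch below (assuming the standard WDQS wikibase:/bd: prefixes resolve on the beta endpoint):

  SELECT ?file ?depicts ?depictsLabel WHERE {
    ?file wdt:P180 ?depicts .
    SERVICE <https://query.wikidata.org/sparql> {  # explicit federation to WDQS
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
  }
  LIMIT 10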
But overall I am *hugely* impressed by what the team has rolled out. Thank you so much!
-- James.
On Thu, Jul 23, 2020 at 11:50 PM, James Heald jpm.heald@gmail.com wrote:
Is there a particular reason that schema:contentUrl was chosen to point to URLs of the form https://upload.wikimedia.org/wikipedia/commons/q/q1/filename.jpg rather than the http://commons.wikimedia.org/wiki/Special:FilePath/filename.jpg form that WDQS uses? Is the specific location useful? (Is it even stable?) Or would it make more sense to use the established URL form for these links, as used in WDQS?
These URLs are the URLs that the Wikimedia servers use to send back the files; Special:FilePath URLs redirect to them. I guess that they are mostly stable: they are the ones Wikipedia/Commons have used for at least 15 years. These URLs were not used in the Wikidata RDF dumps because it would have required additional database queries to fetch them while generating the dumps. But the SDoC dump generator already fetches the file metadata to output the file type and dimensions, so it is cheap for it to also output these URLs. But, anyway, it should be fairly easy to change them to point to Special:FilePath instead and, indeed, make interoperability with Wikidata easier. I could easily write a patch for it if the Search team is OK with it. Could you open a Phabricator ticket about it?
Another option is to directly add the M-IDs to Wikidata RDF representation. I have just opened a ticket about it: https://phabricator.wikimedia.org/T258776
Cheers,
Thomas
On Fri, Jul 24, 2020 at 10:52 AM Thomas Pellissier Tanon thomas@pellissier-tanon.fr wrote:
But, anyway, it should be fairly easy to change them to point to Special:FilePath instead and, indeed, make interoperability with Wikidata easier. I could easily write a patch for it if the Search team is OK with it. Could you open a Phabricator ticket about it?
No objections from me, a ticket would be nice indeed so that we can continue the discussion there, thanks!
-- David C.
On Thu, Jul 23, 2020 at 11:26 PM Hay (Husky) huskyr@gmail.com wrote:
1) A way to make ImageGrid work without resorting to the clunky Special:FilePath hack
I've created https://phabricator.wikimedia.org/T258769 to track this request. Feel free to add any details on the task. No promise on when we'll have time to work on it. We now need to focus again on the new streaming updater for WDQS (and eventually for WCQS) and improving the stability / scaling of WDQS.
2) A nicer way to query Wikidata information without using federation.
It is unlikely that we'll be able to remove federation in this case. At least it is unlikely that we'll merge those 2 graphs in a single Blazegraph instance. Given the concerns we have about scaling this service, splitting into more federated graphs seems a better option than merging into larger graphs. There might be ways to pull a subset of the Wikidata dataset into WCQS, but that looks like a complex problem.
On Fri, Jul 24, 2020 at 10:39 AM Guillaume Lederrey glederrey@wikimedia.org wrote:
I've created https://phabricator.wikimedia.org/T258769 to track this request.
Great, thanks for doing that. I've added an example query (https://tinyurl.com/y26mtate) of how things are currently done.
It is unlikely that we'll be able to remove federation in this case. At least it is unlikely that we'll merge those 2 graphs in a single Blazegraph instance. Given the concerns we have about scaling this service, splitting into more federated graphs seems a better option than merging into larger graphs. There might be ways to pull a subset of the Wikidata dataset into WCQS, but that looks like a complex problem.
Yeah, I get that. I think in the short term it might be more interesting to figure out how to add some extra features to the query UX that make writing federated SPARQL easier.
-- Hay
Hi all,
We are looking for someone with data visualisation experience to help us with a new pilot research project investigating gaps in Wikimedia coverage of Australia (see the ad below). We have received a small amount of money from UTS to run this pilot while we wait on the results of a proposal for a much larger project. The pilot will probably only be a few days' work, but there may be more work down the line. We would love to hear from anyone interested in working with us on this! Please email me if interested.
Best,
Heather.
Data visualisations expert
Heather Ford (http://hblog.org), Head of Digital and Social Media at UTS (https://www.uts.edu.au/future-students/communication/digital-and-social-media) in Sydney, and Tamson Pietsch, Head of the Centre for Public History at UTS (https://www.uts.edu.au/research-and-teaching/our-research/australian-centre-public-history), along with collaborators at Wikimedia Australia (https://wikimedia.org.au/wiki/Wikimedia_Australia), including Pru Mitchell and [[User:99of9|Toby Hudson]], are involved in a project to analyse Wikipedia's scope and progress over the past twenty years (relating to Australia and possibly globally). We are looking for someone to help us develop a series of visualisations for a pilot project. This will involve extracting data about en.wp.org articles (either from Wikipedia or via Wikidata) and comparing it to another dataset (possibly https://honours.pmc.gov.au/honours/search), cleaning and coding the data and, importantly, visualising it using mapping and other visualisation tools. This is a pilot project with resources for a few days' work, which we would ideally like to happen over the next month. Experience with Wikimedia data analysis is a plus. Please contact Heather for more information or to express interest.
---------------------------
Dr Heather Ford Associate Professor and Head of Discipline (Digital and Social Media https://www.uts.edu.au/future-students/communication/digital-and-social-media ) School of Communication https://www.uts.edu.au/future-students/communication/about-communication/welcome-school-communication, University of Technology, Sydney https://www.uts.edu.au/ (UTS)
w: hblog.org / t: @hfordsa http://www.twitter.com/hfordsa