Awesome, I'm really happy we finally have at least a start of a
functioning query service.
For now, the two things that I guess would be helpful for most query
writers:
1) A way to make ImageGrid work without resorting to the clunky
Special:FilePath hack
to track this
request. Feel free to add any details on the task. No promise on when we'll
have time to work on it. We now need to focus again on the new streaming
updater for WDQS (and eventually for WCQS) and improving the stability /
scaling of WDQS.
It is unlikely that we'll be able to remove federation in this case. At
least it is unlikely that we'll merge those 2 graphs in a single Blazegraph
instance. Given the concerns we have about scaling this service, splitting
into more federated graphs seems a better option than merging into larger
graphs. There might be ways to pull a subset of the Wikidata dataset into
WCQS, but that looks like a complex problem.
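For anyone unsure what staying federated means in practice: the two graphs
remain separate and the join happens at query time through a SPARQL SERVICE
clause. Below is a minimal Python sketch that only builds such a query
string; it does not send it, since the beta endpoint sits behind OAuth. The
function name is hypothetical, and it assumes the wd: and wdt: prefixes are
predefined on WCQS as they are on WDQS, and that files carry direct wdt:P180
("depicts") statements.

```python
import re

# Public WDQS SPARQL endpoint, used as the federation target.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def federated_depicts_query(class_qid: str, limit: int = 10) -> str:
    """Build a query for files depicting instances of a Wikidata class.

    Hypothetical helper: the "depicts" triple is matched locally, while the
    "instance of" (P31) check is delegated to WDQS via SERVICE.
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?file ?item WHERE {\n"
        "  ?file wdt:P180 ?item .\n"
        f"  SERVICE <{WDQS_ENDPOINT}> {{\n"
        f"    ?item wdt:P31 wd:{class_qid} .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

Calling federated_depicts_query("Q146", limit=5) returns text that can be
pasted into the WCQS GUI. A federated join like this may well be slow or
time out, so treat it as an illustration rather than a recommended pattern.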
I guess 2) might be a bit more difficult, but it might definitely be
something to consider.
Kind regards,
-- Hay
On Wed, Jul 22, 2020 at 8:03 PM Guillaume Lederrey
<glederrey(a)wikimedia.org> wrote:
Hello all!
We are happy to announce the availability of Wikimedia Commons Query
Service (WCQS): https://wcqs-beta.wmflabs.org/.
This is a beta SPARQL endpoint exposing the Structured Data on Commons
(SDoC) dataset. This endpoint can federate with WDQS. More work is needed
as we iterate on the service, but feel free to begin using the endpoint.
Known limitations are listed below:
* The service is a beta endpoint that is updated via weekly dumps. Some
caveats include limited performance, expected downtimes, and no interface,
naming, or backward compatibility stability guarantees.
* The service is hosted on Wikimedia Cloud Services, with limited
resources and limited monitoring. This means there may be random unplanned
downtime.
* The data will be reloaded weekly from dumps. The service will be down
during data reload. With the current amount of SDoC data, downtime will
last approximately 4 hours, but this may increase as SDoC data grows.
* Due to an issue with the dump format, the data currently only dates back
to July 5th. We’re working on getting more up-to-date data and hope to
have a solution soon. (https://phabricator.wikimedia.org/T258507 and
https://phabricator.wikimedia.org/T258474)
* The MediaInfo concept URIs (e.g.
http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may
change these to HTTPS in the near future. Please comment on T258590 if you
have concerns about this change.
* The service is restricted behind OAuth authentication, backed by
Commons. You will need an account on Commons to access the service. This is
so that we can contact abusive bots and/or users and block them selectively
as a last resort if needed.
* Please note that to correctly log out of the service, you need to use
the logout link in WCQS - logging out of just Wikimedia Commons will not
work for WCQS. This limitation will be lifted once we move to production.
* No documentation on the service is available yet. In particular, no
examples are provided yet. You can add your own examples at
https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exa…
following the format at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
.
* Please use the SPARQL template. Note that while there is currently a bug
that doesn’t allow us to change the “Try it!” link endpoint, the examples
will be displayed correctly on the WCQS GUI.
* WCQS is a work in progress and some bugs are to be expected, especially
related to generalizing WDQS to fit SDoC data. For example, current bugs
include:
  * URI prefixes specific for SDoC data don’t yet work - you need to use
  full URIs if you want to query using them. Relations and Q items are
  defined by Wikidata’s URI prefixes, so they work correctly.
  * Autocomplete for SDoC items doesn’t work - without prefixes they’d be
  unusable anyway, but additional work will be required after we inject
  SDoC URI prefixes into WCQS GUI.
* If you find any additional bugs or issues, please report them via
Phabricator with the tag wikidata-query-service.
* We do plan to move the service to production, but we don’t have a
timeline on that yet. We want to emphasize that while we do expect a SPARQL
endpoint to be part of a medium to long-term solution, it will only be part
of that solution. Even once the service is production-ready, it will still
have limitations in terms of timeouts, expensive queries, and federation.
Some use cases will need to be migrated, over time, to better solutions -
once those solutions exist.
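
As an illustration of the "use full URIs" workaround listed above, here is a
minimal Python sketch that spells out a MediaInfo URI in full while still
relying on the Wikidata wdt: prefix for the property. The helper names are
hypothetical; the base URI is the HTTP form quoted above (it may move to
HTTPS), and the sketch assumes wdt:P180 ("depicts") is stated directly on
the file entity. It only builds the query text and does not contact the
endpoint.

```python
# Base of the MediaInfo concept URIs as quoted in the announcement.
# Assumption: still HTTP; adjust if the switch to HTTPS happens.
MEDIAINFO_BASE = "http://commons.wikimedia.org/entity/"


def mediainfo_uri(m_id: str) -> str:
    """Return the full, angle-bracketed URI for an ID such as 'M37200540'."""
    if not (m_id.startswith("M") and m_id[1:].isdigit()):
        raise ValueError(f"not a MediaInfo ID: {m_id!r}")
    return f"<{MEDIAINFO_BASE}{m_id}>"


def depicts_query(m_id: str) -> str:
    """Build a query listing the items one file depicts (P180).

    Hypothetical helper: the SDoC entity is written as a full URI because
    SDoC-specific prefixes do not work yet; wdt: is a Wikidata prefix.
    """
    return (
        "SELECT ?depicted WHERE {\n"
        f"  {mediainfo_uri(m_id)} wdt:P180 ?depicted .\n"
        "}"
    )
```

For example, depicts_query("M37200540") yields a query whose subject is
<http://commons.wikimedia.org/entity/M37200540>, which can then be pasted
into the WCQS GUI after logging in.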
Have fun!
Guillaume
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET