Dear ones,
Where might I get or mirror a dump of Commons media files? It seems worth
mentioning on the front page of https://dumps.wikimedia.org/

It looks like the compressed XML of the ~50M description pages is ~25GB.

It looks like WikiTeam set up a dump script that posted monthly dumps to
the Internet Archive; in 2013 it stopped including the month+year in the
title, and in 2016 it stopped altogether.
https://archive.org/details/wikimediacommons

This was initially rejected, sorry about that.
---------- Forwarded message ---------
From: Guillaume Lederrey <glederrey(a)wikimedia.org>
Date: Wed, Jul 22, 2020 at 1:03 PM
Subject: [Wikidata] Wikimedia Commons Query Service (WCQS)
To: Discussion list for the Wikidata project. <wikidata(a)lists.wikimedia.org>,
<commons-l(a)lists.wikimedia.org>
Hello all!
We are happy to announce the availability of Wikimedia Commons Query
Service (WCQS): https://wcqs-beta.wmflabs.org/.
This is a beta SPARQL endpoint exposing the Structured Data on Commons
(SDoC) dataset. This endpoint can federate with WDQS. More work is needed
as we iterate on the service, but feel free to begin using the endpoint.
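As a quick illustration, the sketch below shows the kind of query the
endpoint accepts (the "depicts" property P180 and the house cat item Q146
are only illustrative choices, and the SERVICE clause sketches federation
with WDQS; the standard Wikidata prefixes are declared for completeness,
though the query UI predefines them):

  # Files carrying a "depicts" (P180) statement for house cat (Q146),
  # with the English label fetched from WDQS via federation.
  PREFIX wd:   <http://www.wikidata.org/entity/>
  PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

  SELECT ?file ?label WHERE {
    ?file wdt:P180 wd:Q146 .
    SERVICE <https://query.wikidata.org/sparql> {
      wd:Q146 rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }
  }
  LIMIT 10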
Known limitations are listed below:
* The service is a beta endpoint that is updated via weekly dumps. Some
caveats include limited performance, expected downtime, and no stability
guarantees for the interface, naming, or backward compatibility.
* The service is hosted on Wikimedia Cloud Services, with limited
resources and limited monitoring. This means there may be random unplanned
downtime.
The data will be reloaded weekly from dumps. The service will be down
during data reload. With the current amount of SDoC data, downtime will
last approximately 4 hours, but this may increase as SDoC data grows.
* Due to an issue with the dump format, the data is currently only up to
date as of July 5th. We’re working on getting more recent data and hope to
have a solution soon. (https://phabricator.wikimedia.org/T258507 and
https://phabricator.wikimedia.org/T258474)
* The MediaInfo concept URIs (e.g.
http://commons.wikimedia.org/entity/M37200540) are currently HTTP; we may
change these to HTTPS in the near future. Please comment on T258590 if you
have concerns about this change.
* The service is restricted behind OAuth authentication, backed by Commons.
You will need an account on Commons to access the service. This is so that
we can contact abusive bots and/or users and block them selectively as a
last resort if needed.
* Please note that to log out of the service correctly, you need to use
the logout link in WCQS - logging out of Wikimedia Commons alone will not
work for WCQS. This limitation will be lifted once we move to production.
* No documentation on the service is available yet. In particular, no
examples are provided yet. You can add your own examples at
https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exa…
following the format at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
.
* Please use the SPARQL template. Note that while there is currently a
bug that doesn’t allow us to change the “Try it!” link endpoint, the
examples will be displayed correctly on the WCQS GUI.
* WCQS is a work in progress and some bugs are to be expected, especially
related to generalizing WDQS to fit SDoC data. For example, current bugs
include:
  * URI prefixes specific to SDoC data don’t work yet - you need to use
full URIs if you want to query those entities (see the sketch after this
list). Relations and Q items are defined by Wikidata’s URI prefixes, so
they work correctly.
  * Autocomplete for SDoC items doesn’t work - without prefixes it would
be of limited use anyway, but additional work will be required after we
inject SDoC URI prefixes into the WCQS GUI.
  * If you find any additional bugs or issues, please report them via
Phabricator with the tag wikidata-query-service.
* We do plan to move the service to production, but we don’t have a
timeline on that yet. We want to emphasize that while we do expect a SPARQL
endpoint to be part of a medium to long-term solution, it will only be part
of that solution. Even once the service is production-ready, it will still
have limitations in terms of timeouts, expensive queries, and federation.
Some use cases will need to be migrated, over time, to better solutions -
once those solutions exist.
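To make the prefix limitation above concrete, here is a rough sketch of a
query that lists the direct statements on a single MediaInfo entity,
written out as a full URI (reusing the example M37200540 URI from this
announcement); once SDoC-specific prefixes are wired in, the full URI
could be shortened accordingly:

  # Sketch: list the statements on one MediaInfo entity, addressed by
  # its full URI, since no SDoC-specific prefix is defined on WCQS yet.
  SELECT ?property ?value WHERE {
    <http://commons.wikimedia.org/entity/M37200540> ?property ?value .
  }
  LIMIT 100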
Have fun!
Guillaume
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Keegan Peterzell (he/him)
Community Relations Specialist
Wikimedia Foundation
"Try it" issue worked around.
Include the line
  |project=sdc
when calling the SPARQL template to have the query go to the WCQS beta
service.
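For example, a call to the template might look roughly like this (a
sketch only; the |query= parameter name is assumed to follow the Wikidata
template's convention, and the query itself is illustrative):

  {{SPARQL|project=sdc
  |query=SELECT ?file WHERE { ?file wdt:P180 wd:Q146 . } LIMIT 10
  }}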
The template
https://commons.wikimedia.org/wiki/Template:SPARQL
will need to be updated to point to the final endpoint when this is
available.
The "Try it" links at
https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/queries/exa…
should now all work.
-- James.
On 22/07/2020 19:02, Guillaume Lederrey wrote:
> [...]
As a followup, I'll be posting about this beta launch on various places
across the wikis over the next few days. Please keep an eye out for
discussions around the tool and opportunities to write up documentation on
Commons.
--
Keegan Peterzell (he/him)
Community Relations Specialist
Wikimedia Foundation
Hello,
Today the Wikimedia Foundation would like to introduce a new community
blog. It's called "Diff" (diff.wikimedia.org) and is a blog by – and
for – the Wikimedia volunteer community, to connect and share
learnings, stories, and ideas from across our movement. We'd like to
encourage you to learn more about Diff and how it can help you share
with and learn from your fellow Wikimedians.
Everyone is invited to contribute!
https://diff.wikimedia.org/2020/07/14/welcome-to-diff-a-community-blog-for-…
The name “Diff” refers to the wiki interface that displays the
difference between one version of a Wikipedia page and another. It
also reflects the “difference” our communities and movement make in
the world every day.
For some background, Diff builds on lessons and experiences from the
Wikimedia Blog, the Wikimedia Foundation News, and Wikimedia Space;
previous posts from these channels are archived on Diff. The channel
is primarily intended for community-authored posts, in which
volunteers can share their stories, learnings, and ideas with each
other.
Diff offers a simple and accessible editorial process, moderated by
Foundation communications staff and open to volunteers, to encourage
participation from all — especially emerging and under-represented
communities. Additionally, content on Diff can be written in and
translated into many languages to reach a wide audience. Diff also has a
code of conduct, and comments can be flagged and moderated.
Still curious to learn more?
https://diff.wikimedia.org/2020/07/14/welcome-to-diff-a-community-blog-for-…
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation