I'm happy to announce a new mirror for datasets other than the XML dumps.
This mirror comes to us courtesy of the Center for Research Computing,
University of Notre Dame, and covers everything "other" [1] which includes
such goodies as Wikidata entity dumps, pageview counts, titles of all files
on each wiki (daily), titles of all articles of each wiki (daily), and the
so-called "adds-changes" dumps, among other things. You can access it at
http://wikimedia.crc.nd.edu/other/ so please do!
Ariel
[1] https://dumps.wikimedia.org/other/
No dumps were transferred at the beginning of the month until I complained on
the wmflabs mailing list. Transfers then started, but only for dumps that had
already finished. Other dumps, such as the still-processing enwiki, were
never transferred.
No dumps from the mid-month runs have been transferred.
Strangely, enwiki and wikidata are processing. That wasn't supposed to
happen? This will delay the beginning-of-month dump.
Asking two Greeks for help. Oh, why have I debased myself like this?
Bryan
Hi,
I'm experimenting with the revision history dumps of Wikipedia. I'm confused
about whether Portal:Current Events is archived in the revision history dumps.
Do revisions of this portal exist in the dumps or not? I could not find them.
Looking forward to your response.
Best regards,
Govind
Hello folks,
I know a number of you have subscribed to the Dumps Rewrite project (
https://phabricator.wikimedia.org/tag/dumps-rewrite/) but I bet none of you
actually watch it or any of its tasks. So here's a heads up.
I'm getting started on work on the job scheduler/workflow manager piece;
this would accept lists of dump tasks (in the current setup, "dump stubs
for el wikipedia"), call a callback to turn each of them into small jobs
that can be completed in less than an hour, submit and monitor these jobs
with retries, dependencies etc, call a callback to recombine the outputs of
the jobs, and notify some caller on success of the whole operation.
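To make that flow concrete, here is a minimal Python sketch of the loop
described above. It is an illustration under stated assumptions, not a
proposal for the actual implementation: every name in it (Job,
run_with_retries, run_task, and the split, combine and notify callbacks) is
hypothetical, jobs run one after another in-process, and dependency handling
is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    """One small unit of work (hypothetical type, for illustration only)."""
    name: str
    run: Callable[[], str]  # does the work and returns its output


def run_with_retries(job: Job, max_attempts: int = 3) -> str:
    """Run a job, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return job.run()
        except Exception as exc:  # a real manager would be more selective
            last_error = exc
    raise RuntimeError(
        f"job {job.name!r} failed after {max_attempts} attempts"
    ) from last_error


def run_task(
    task: str,
    split: Callable[[str], List[Job]],
    combine: Callable[[List[str]], str],
    notify: Callable[[str, str], None],
    max_attempts: int = 3,
) -> str:
    """Split a dump task into jobs, run them, recombine, then notify."""
    jobs = split(task)  # callback: task -> small jobs
    outputs = [run_with_retries(job, max_attempts) for job in jobs]
    result = combine(outputs)  # callback: job outputs -> one result
    notify(task, result)  # only reached if every job succeeded
    return result
```

A caller would pass a task such as "dump stubs for el wikipedia" together
with its own split and combine callbacks; notify fires only if every job
eventually succeeds, since a job that exhausts its retries raises instead.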
First up is evaluating existing packages and choosing one to use as a
foundation. Please contribute! See the following tasks:
https://phabricator.wikimedia.org/T143205: Draft usage scenarios for
job/workflow manager
https://phabricator.wikimedia.org/T143206: List requirements needed for
task/job/workflow manager
https://phabricator.wikimedia.org/T143207: Evaluate software packages for
job/task/workflow management
Also, can someone please forward this on to analytics-l and research-l?
I'm not on those lists but they will no doubt have a lot of useful
expertise here.
Thanks!
Ariel