Hello,
I am writing because I am analyzing the Wikidata JSON dumps available
on the Internet Archive, and I have found that there are no dumps
available after Feb 8th, 2019 (see
https://archive.org/details/wikimediadownloads?and%5B%5D=%22Wikidata%20enti…).
I know the latest dumps are available at
https://dumps.wikimedia.org/wikidatawiki/entities/, but unfortunately
they only cover the last few months.
I also noticed gaps even within the years for which JSON dumps are
available. For example, there are no JSON dumps between the end of
Feb 2017 and Aug 21st, 2017, or between Aug 21st, 2017 and Nov 16th, 2017.
Another strange finding is that while the Internet Archive has some dump
entries between March 19th, 2018 and Nov 26th, 2018
(e.g., https://archive.org/details/wikibase-wikidatawiki-20181104),
none of them contains a JSON dump. That's another gap of more than 8 months.
Does anyone on this list know where these missing Wikidata dumps might
be found? If anyone has pointers to a server where they can be
downloaded, I would greatly appreciate it.
Thanks in advance,
Daniel
Hello folks,
I hope everyone is in good health and staying safe in these troubled times.
Speaking of trouble, while making an improvement to the xml/sql dumps,
I introduced a bug, and now I am cleaning up after it.
The short version:
One 7z file will be missing from the Wikidata full page content dumps;
it will be made available in a day or two.
The corresponding bz2 file should become available later today, but it's
possible that I will instead provide a slightly longer file that has a bad
bz2 block at the end and contains pages beyond the page range specified in
the filename. This would mean MANUAL PROCESSING IF YOU USE these full page
content dumps. If this happens, I'll send an email update.
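If the longer bz2 file does get published, one way to do the manual processing is sketched below in Python: decompress chunk by chunk, keep everything decoded before the first corrupt or incomplete data, and read the page ID range from the filename so pages past its upper bound can be dropped. This assumes the usual `...xml-p<first>p<last>.bz2` naming of the split page content files; the function names, the example filename and the chunk size are hypothetical. For brevity the whole file is held in memory, which is impractical for multi-gigabyte dumps, and cutting the XML at the last complete page is left out.

```python
import bz2
import re


def salvage_bz2(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress one or more concatenated bz2 streams, stopping quietly at
    the first corrupt or truncated part instead of raising."""
    pieces = []
    decompressor = bz2.BZ2Decompressor()
    pos = 0
    leftover = b""
    while pos < len(data) or leftover:
        if leftover:
            # Bytes following the end of the previous stream come first.
            chunk, leftover = leftover, b""
        else:
            chunk = data[pos:pos + chunk_size]
            pos += chunk_size
        try:
            pieces.append(decompressor.decompress(chunk))
        except OSError:
            # Corrupt data (e.g. a bad block): keep what earlier chunks
            # produced and stop; output from this chunk is discarded.
            break
        if decompressor.eof:
            # One stream finished; anything after it must start a new one.
            leftover = decompressor.unused_data
            decompressor = bz2.BZ2Decompressor()
    # An incomplete final block never produces output, so it is dropped.
    return b"".join(pieces)


# Assumed naming: "...pages-meta-history27.xml-p<first>p<last>.bz2".
_PAGE_RANGE = re.compile(r"-p(\d+)p(\d+)\.")


def page_range(filename: str) -> tuple[int, int]:
    """Return the (first, last) page ID encoded in a dump filename."""
    match = _PAGE_RANGE.search(filename)
    if match is None:
        raise ValueError(f"no page range in {filename!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical filename; pages with IDs above `last` would be dropped.
    first, last = page_range(
        "wikidatawiki-20201120-pages-meta-history27.xml-p56990520p57107003.bz2"
    )
    damaged = bz2.compress(b"<page>...</page>\n") + b"not a bz2 block"
    print(first, last, salvage_bz2(damaged))
```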
The long version:
See https://phabricator.wikimedia.org/T268333
IN ALL CASES the xml/sql dumps run for the 20th of the month (i.e. today)
should start late tonight UTC time, if not earlier.
My apologies for the inconvenience!
Ariel
Please do not send me any more notifications, as I want to unsubscribe from the Wikimedia service. Thank you.
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20201001 full revision history content run.
We are currently dumping 922 projects in total.
---------------------
Stats for grwikimedia on date 20201001
Total size of page content dump files for articles, current content only:
5,990
Total size of page content dump files for all pages, current content only:
5,990
Total size of page content dump files for all pages, all revisions:
8,786
---------------------
Stats for enwiki on date 20201001
Total size of page content dump files for articles, current content only:
79,784,363,895
Total size of page content dump files for all pages, current content only:
176,888,031,105
Total size of page content dump files for all pages, all revisions:
21,518,142,611,381
---------------------
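The figures above are raw integers with no unit stated. Assuming they are byte counts (an assumption; the email does not say so), a small Python helper can turn them into readable sizes in decimal (kB, MB, ...) or binary (KiB, MiB, ...) units. The function names are hypothetical.

```python
def parse_size(text: str) -> int:
    """Convert a figure such as "21,518,142,611,381" to an int."""
    return int(text.replace(",", ""))


def human_readable(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count with decimal (kB, MB, ...) or binary
    (KiB, MiB, ...) units and two decimal places."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB", "PiB"] if binary
             else ["B", "kB", "MB", "GB", "TB", "PB"])
    value = float(num_bytes)
    index = 0
    # Divide by the base until the value fits the current unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    if index == 0:
        return f"{num_bytes} B"
    return f"{value:.2f} {units[index]}"


if __name__ == "__main__":
    # Figures copied from the enwiki section above; treating them as bytes
    # is an assumption.
    enwiki = {
        "articles, current content only": "79,784,363,895",
        "all pages, current content only": "176,888,031,105",
        "all pages, all revisions": "21,518,142,611,381",
    }
    for label, figure in enwiki.items():
        size = parse_size(figure)
        print(f"{label}: {human_readable(size)} "
              f"({human_readable(size, binary=True)})")
```

Under the bytes assumption, the enwiki all-revisions figure comes out at about 21.52 TB, or 19.57 TiB.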
Sincerely,
Your friendly Wikimedia Dump Info Collector