Hello,
I apologize if I missed something, but why is the current JSON dump size ~25GB while a week ago it was ~58GB? (see https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)
Regards, Vladimir
Follow-up: according to my processing script, this dump contains only 30280591 entries, while the main page still advertises 57M+ data items. Isn't this a bug in the dump process?
Regards, Vladimir
On Mon, 24 Jun 2019 at 19:37, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
> Hello,
> I apologize if I missed something, but why is the current JSON dump size ~25GB while a week ago it was ~58GB? (see https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)
> Regards, Vladimir
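For context, the entry count mentioned above can be reproduced with a minimal sketch like the one below. This is an assumption about the counting approach, not Vladimir's actual script; it relies on the documented layout of the `wikidata-*-all.json.gz` dumps, which are a single JSON array with one entity object per line:

```python
import gzip
import json

def count_entities(path):
    """Count entity records in a Wikidata JSON dump.

    The dump is a JSON array: '[' on the first line, ']' on the last,
    and one entity object per line in between (each ending with a comma).
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line in ("[", "]", ""):
                continue  # array delimiters, not entities
            json.loads(line)  # sanity check: each record must parse on its own
            count += 1
    return count
```

Streaming line by line keeps memory flat even for a multi-gigabyte dump; a count far below the advertised item total, as reported here, would point to a truncated run rather than a parsing problem.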
Hi!
> Follow-up: according to my processing script, this dump contains only 30280591 entries, while the main page still advertises 57M+ data items. Isn't this a bug in the dump process?
There was a problem with the dump script (since fixed), so the dump may indeed be broken. CCing Ariel to take a look. It probably needs to be re-run, or we can just wait for the next one.
Which script, please, and which dump? (The conversation was not forwarded, so I don't have the context.)
On Wed, Jun 26, 2019 at 3:39 AM Stas Malyshev <smalyshev@wikimedia.org> wrote:
> Hi!
>> Follow-up: according to my processing script, this dump contains only 30280591 entries, while the main page still advertises 57M+ data items. Isn't this a bug in the dump process?
> There was a problem with the dump script (since fixed), so the dump may indeed be broken. CCing Ariel to take a look. It probably needs to be re-run, or we can just wait for the next one.
> -- Stas Malyshev <smalyshev@wikimedia.org>
Hi!
> Which script, please, and which dump? (The conversation was not forwarded, so I don't have the context.)
Sorry, the original complaint was:
> I apologize if I missed something, but why is the current JSON dump size
> ~25GB while a week ago it was ~58GB? (see https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)
But looking at it now, I see that wikidata-20190617-all.json.gz is comparable to last week's, so it looks like it's fine now?
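The "comparable to last week's" comparison above can be automated. Below is a hypothetical sanity check of the kind that would have flagged this run, not part of any actual Wikimedia monitoring; the 50% threshold is an illustrative assumption:

```python
def dump_size_regressed(prev_bytes, curr_bytes, max_drop=0.5):
    """Return True if the new dump shrank by more than max_drop
    (as a fraction) relative to the previous week's run."""
    if prev_bytes <= 0:
        return False  # no baseline to compare against
    return (prev_bytes - curr_bytes) / prev_bytes > max_drop

# The 20190624 run: ~25GB against ~58GB the week before.
dump_size_regressed(58 * 2**30, 25 * 2**30)  # -> True (a ~57% drop)
```

Successive full dumps grow slowly, so a week-over-week drop of more than half is a strong signal of truncation rather than genuine shrinkage.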
I think the issue is with the 0624 json dumps, which do seem a lot smaller than previous weeks' runs.
On Wed, Jun 26, 2019 at 8:22 AM Stas Malyshev <smalyshev@wikimedia.org> wrote:
> Hi!
>> Which script, please, and which dump? (The conversation was not forwarded, so I don't have the context.)
> Sorry, the original complaint was:
>> I apologize if I missed something, but why is the current JSON dump size
>> ~25GB while a week ago it was ~58GB? (see https://dumps.wikimedia.org/wikidatawiki/entities/20190617/)
> But looking at it now, I see that wikidata-20190617-all.json.gz is comparable to last week's, so it looks like it's fine now?
> -- Stas Malyshev <smalyshev@wikimedia.org>
Hi!
On 6/25/19 11:17 PM, Ariel Glenn WMF wrote:
> I think the issue is with the 0624 json dumps, which do seem a lot smaller than previous weeks' runs.
Ah, true, I didn't realize that. I think this may be because of that dumpJson.php issue, which is now fixed. Maybe rerun the dump?
Let's take this to the filed task, https://phabricator.wikimedia.org/T226601
On Wed, Jun 26, 2019 at 9:22 AM Stas Malyshev <smalyshev@wikimedia.org> wrote:
> Hi!
> On 6/25/19 11:17 PM, Ariel Glenn WMF wrote:
>> I think the issue is with the 0624 json dumps, which do seem a lot smaller than previous weeks' runs.
> Ah, true, I didn't realize that. I think this may be because of that dumpJson.php issue, which is now fixed. Maybe rerun the dump?
> -- Stas Malyshev <smalyshev@wikimedia.org>