Different keys can still be found in the current XML dump, wikidatawiki-20141009-pages-articles.xml.bz2.
This bug/feature is also present in the current dump with history.
page_id  wd_id   keys
111      Q15     ['aliases', 'claims', 'descriptions', 'id', 'labels', 'sitelinks', 'type']
137      Q24     ['aliases', 'claims', 'description', 'entity', 'label', 'links']
31500    Q28119  ['aliases', 'description', 'entity', 'label', 'links']
225144   ?       ['entity', 'redirect']
3916689  P6      ['aliases', 'claims', 'datatype', 'descriptions', 'id', 'labels', 'type']
3916937  P10     ['aliases', 'claims', 'datatype', 'description', 'entity', 'label']
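For anyone who wants to reproduce a listing like the one above, here is a minimal sketch of one way to collect the top-level JSON keys per page. It is not the script used for the table. The function names key_signatures and scan_dump are made up for illustration, and the sketch assumes the usual MediaWiki export layout (page > title / id / revision > text) and that entity pages carry their JSON in the text element. With a history dump it only looks at the last revision seen for each page.

```python
import bz2
import json
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def key_signatures(stream):
    """Yield (page_id, title, sorted top-level JSON keys) for each page.

    keys is None when the page text is not a JSON object (e.g. wikitext).
    """
    page_id = title = text = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "id" and page_id is None:
            # The first <id> inside a <page> is the page id; revision and
            # contributor ids come later and are ignored.
            page_id = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            try:
                data = json.loads(text or "")
                keys = sorted(data) if isinstance(data, dict) else None
            except ValueError:
                keys = None
            yield page_id, title, keys
            page_id = title = text = None
            elem.clear()  # release the finished page's children


def scan_dump(path):
    """Count how often each key signature occurs in a .xml.bz2 dump."""
    counts = Counter()
    with bz2.open(path, "rb") as stream:
        for _page_id, _title, keys in key_signatures(stream):
            if keys is not None:
                counts[tuple(keys)] += 1
    return counts
```

Calling scan_dump("wikidatawiki-20141009-pages-articles.xml.bz2") would then give a tally of the distinct key sets, which makes it easy to see how many pages are affected.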
Lukas
On Thu, 09.10.2014 19:32, Lydia Pintscher wrote:
On Thu, Oct 9, 2014 at 3:19 PM, Magnus Manske magnusmanske@googlemail.com wrote:
I managed to do the task at hand by switching to JSON dumps (because that's the new, officially supported, long-term-stable Wikidata dump format, right? Right???), so no hurry there.
Maybe the XML dump process was run in the middle of the switch to the new format, or got a stale cache for some items?
It looks like the switch happened in the middle of a dump creation so this one is half old and half new format mixed. The ones after that should be all new format. And yay for switching to JSON!
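Until the dumps are consistent, a consumer of this dump may want to tell the two serializations apart. The following is only a rough sketch based on the key sets reported in this thread: 'id' and 'type' appear to mark the new format, 'entity' the old one, and 'redirect' a redirect entry. The function name classify_serialization is hypothetical, and the rule is a heuristic, not an official definition of either format.

```python
def classify_serialization(keys):
    """Guess the serialization of a page from its top-level JSON keys.

    Heuristic derived from the key sets reported in this thread only.
    """
    keys = set(keys)
    if "redirect" in keys:
        return "redirect"
    if {"id", "type"} <= keys:
        return "new"
    if "entity" in keys:
        return "old"
    return "unknown"


# Key sets as reported for the 20141009 dump.
reported = {
    "Q15": ["aliases", "claims", "descriptions", "id", "labels", "sitelinks", "type"],
    "Q24": ["aliases", "claims", "description", "entity", "label", "links"],
    "P10": ["aliases", "claims", "datatype", "description", "entity", "label"],
}

for wd_id, keys in reported.items():
    print(wd_id, classify_serialization(keys))
```

For the three examples this prints "new" for Q15 and "old" for Q24 and P10.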
Cheers Lydia
Hoi, Is this dump going to be cleaned up? Will the next dump be good? Why did this go wrong? Thanks, GerardM
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 22.10.2014 07:29, Gerard Meijssen wrote:
Hoi, Is this dump going to be cleaned up? Will the next dump be good? Why did this go wrong?
Frankly, we have no idea why this is going wrong. I cannot reproduce the problem locally, and it seems to work fine with Special:Export.
Dump generation is a bit strange and wonderful, and few people actually know in detail how it works on the live cluster. I vaguely remember that at one point, only new revisions were dumped, and the result "stitched" into old dumps. That would explain the issue - and it would be something we cannot fix on the Wikibase side. I'm trying to get hold of someone who can confirm/fix this.
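To illustrate why stitching would produce this kind of mix, here is a toy model. It is purely hypothetical and not based on the actual dump scripts: the function stitch, its arguments, and the sample data are all invented. It only shows that if text for already-dumped revisions is copied verbatim from the previous dump, those pages keep whatever serialization was in use when they were first dumped.

```python
def stitch(previous_dump, current_revisions, serialize):
    """Toy model of a 'stitched' dump.

    previous_dump:     {rev_id: text} as stored in the older dump
    current_revisions: {page_id: rev_id} latest revision per page
    serialize:         callable producing fresh text for a page
    """
    result = {}
    for page_id, rev_id in current_revisions.items():
        if rev_id in previous_dump:
            # Unchanged since the last dump: text is reused as-is,
            # including its old serialization.
            result[page_id] = previous_dump[rev_id]
        else:
            # New revision: serialized with the current code.
            result[page_id] = serialize(page_id)
    return result


old_text = '{"entity": "q24", "label": {}}'
previous = {100: old_text}
current = {137: 100, 111: 200}  # page 137 unchanged, page 111 edited

mixed = stitch(previous, current, lambda page_id: '{"id": "Q15", "type": "item"}')
for page_id in sorted(mixed):
    print(page_id, mixed[page_id])
```

In this model page 137 comes out in the old format and page 111 in the new one, which matches the symptom, though it says nothing about whether the live cluster really works this way.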
I have filed https://bugzilla.wikimedia.org/show_bug.cgi?id=72348 so this gets tracked. I'll also bring it up in our next call with the foundation.
-- daniel
Hi Lukas!
That really shouldn't happen...
Can you tell me on which item that happens? Also, please double-check the namespace and content model of the respective entry in the dump.
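One possible way to check this for a single entry, as a sketch: the function page_metadata is a made-up name, and it assumes the export schema in use puts ns under page and model/format under revision, which should be verified against the header of the actual dump.

```python
import xml.etree.ElementTree as ET


def page_metadata(stream, wanted_page_id):
    """Return title, ns, id, model and format for one page, or None."""
    info = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        if name in ("title", "ns", "model", "format"):
            info[name] = elem.text
        elif name == "id" and "id" not in info:
            # First <id> within a page is the page id.
            info["id"] = elem.text
        elif name == "page":
            if info.get("id") == str(wanted_page_id):
                return info
            info = {}
            elem.clear()
    return None
```

Opening the dump with bz2.open(path, "rb") and calling page_metadata(stream, 137) would give the namespace and content model recorded for Q24. For an item one would expect namespace 0 and a Wikibase item content model, but that expectation is an assumption as well.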
-- daniel
On 21.10.2014 17:02, Lukas Benedix wrote:
Different keys can still be found in the current XML dump, wikidatawiki-20141009-pages-articles.xml.bz2.
This bug/feature is also present in the current dump with history.
On 22.10.2014 11:06, Daniel Kinzler wrote:
Hi Lukas!
That really shouldn't happen...
Can you tell me on which item that happens? Also, please double-check the namespace and content model of the respective entry in the dump.
Never mind, I found it in the dump. Can't reproduce, though. Strange.
Filed https://bugzilla.wikimedia.org/show_bug.cgi?id=72348
-- daniel