Hi,
I noticed that there is a considerable delay between the weekly Wikidata JSON dump appearing online and the file appearing on the Labs servers [1]. For example, the 20160502 dump is online right now, but on Labs there is only an empty directory for this date.
Looking back, the file modification dates on Labs make it seem as if the files were there earlier than they actually were, but they were not available at this time last week either. As things are now, it is faster to download the dump yourself than to wait for the file to show up in the central location, but it is probably not intended that each tool keeps its own copy. For a weekly dump, half a day of delay is significant.
Any ideas (including whom to ask)?
Cheers,
Markus
[1] Under /public/dumps/public/wikidatawiki/entities/
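A tool that prefers the central Labs copy but does not want to wait for it could check the dated directory first and fall back to the online copy otherwise. A minimal Python sketch, assuming the file name pattern `wikidata-YYYYMMDD-all.json.gz` and the download base URL `https://dumps.wikimedia.org/wikidatawiki/entities/` (neither is stated in this thread); `locate_dump` and `dump_file_name` are hypothetical helpers:

```python
from pathlib import Path

# Labs location from footnote [1]. The download URL and the file name
# pattern are assumptions, not taken from the thread.
LABS_BASE = Path("/public/dumps/public/wikidatawiki/entities")
DOWNLOAD_BASE = "https://dumps.wikimedia.org/wikidatawiki/entities"


def dump_file_name(date: str) -> str:
    """Assumed name of the gzipped JSON dump for a date given as YYYYMMDD."""
    return f"wikidata-{date}-all.json.gz"


def locate_dump(date: str, labs_base: Path = LABS_BASE) -> tuple[str, str]:
    """Return ("labs", path) if a non-empty central copy exists on Labs,
    otherwise ("download", url) pointing at the online copy."""
    candidate = labs_base / date / dump_file_name(date)
    if candidate.is_file() and candidate.stat().st_size > 0:
        return ("labs", str(candidate))
    # Only an empty dated directory (or nothing at all): fall back to downloading.
    return ("download", f"{DOWNLOAD_BASE}/{date}/{dump_file_name(date)}")
```

With only the empty `20160509` directory present on Labs, `locate_dump("20160509")` returns the download URL; once the synced file shows up, it returns the Labs path instead.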
Pushing this up again. The 9 May dump is not available on Labs yet; there is just the empty directory
/public/dumps/public/wikidatawiki/entities/20160509/
I really wonder why it might be taking so long.
Markus
On 02.05.2016 21:36, Markus Kroetzsch wrote:
> [...]
This time, producing the dump itself took quite long (until after 8pm UTC for the gzip version; the bzip2 version did not finish until Tuesday).
I presume that is because one of the shards picked a slow database slave, which slows that shard down significantly. We should get new database slaves soon, so I expect this problem to disappear then.
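Because the dump is produced in shards, the whole dump is only finished once its slowest shard is finished, so a single slow database slave delays everything. A small Python illustration, assuming the shards start together and run in parallel; the durations are made-up numbers, not measurements:

```python
def dump_wall_time(shard_hours: list[float]) -> float:
    """Wall-clock time of a dump whose shards start together and run in
    parallel: the dump is complete only when the slowest shard is done."""
    if not shard_hours:
        raise ValueError("need at least one shard")
    return max(shard_hours)


# Hypothetical durations: three normal shards, then one extra shard
# that reads from a slow database slave.
normal = [6.0, 6.5, 6.2]
with_slow_shard = normal + [14.0]
print(dump_wall_time(normal))           # 6.5
print(dump_wall_time(with_slow_shard))  # 14.0
```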
Cheers,
Marius
On 10.05.2016 12:05, Markus Krötzsch wrote:
> [...]
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
On 11.05.2016 08:28, Marius Hoch wrote:
> [...]
That's alright, I was not actually worried about slow dump generation. What I noticed is that the dumps are available online many hours before they appear on Labs. I would like to use the central dump on Labs instead of downloading my own copy each time, but right now this delays my dump processing even further. I was wondering who provides the central entity dumps on Labs.
Cheers,
Markus
On Wed, May 11, 2016 at 9:17 AM Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
> [...]
Adam, Marius: Does either of you know why there is this delay, and whether there is anything we can do about it?
Cheers,
Lydia
As far as I remember, the sync to Labs only happens once a day. Depending on when dump creation finishes, this means the dump can show up on Labs up to about a day later. If the dumps not being up to date is an issue, I'd rather suggest just doing two dumps a week; dump creation is cheap.
Cheers,
Marius
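With a single daily sync, the wait between a dump finishing and it becoming available on Labs depends on when the dump finishes relative to the sync run, and can approach 24 hours. A minimal Python sketch; the sync time of 18:00 UTC is purely hypothetical (the thread does not say when the sync runs), and the time the copy itself takes is ignored:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: the daily Labs sync is assumed to start at 18:00 UTC.
SYNC_HOUR_UTC = 18


def next_sync(finished: datetime, sync_hour: int = SYNC_HOUR_UTC) -> datetime:
    """First daily sync run at or after the moment the dump finished."""
    candidate = finished.replace(hour=sync_hour, minute=0, second=0, microsecond=0)
    if candidate < finished:
        # Today's sync already ran before the dump was ready: wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def labs_delay(finished: datetime, sync_hour: int = SYNC_HOUR_UTC) -> timedelta:
    """How long a finished dump waits before the sync can pick it up."""
    return next_sync(finished, sync_hour) - finished


# Hypothetical finish time "after 8pm UTC" on Monday, 9 May 2016:
done = datetime(2016, 5, 9, 20, 15, tzinfo=timezone.utc)
print(labs_delay(done))  # 21:45:00
```

Under the same assumption, a dump finishing at 17:30 UTC would wait only 30 minutes, which is why the observed delay varies so much from week to week.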
On 18.05.2016 10:56, Lydia Pintscher wrote:
> [...]
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V., Tempelhofer Ufer 23-24, 10963 Berlin, www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.