Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
Another question: why is this step not parallelized?
Best regards, Andreas Meier
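For illustration only, here is a minimal sketch (not part of the dumps codebase) of the kind of sanity check that would flag an abstracts file that comes out suspiciously small, like the 159 KB case above; the directory layout and the size threshold are assumptions:

#!/usr/bin/env python
"""Flag abstract dump files that look truncated. Hypothetical helper,
not from the Wikimedia dumps scripts; threshold is an assumption."""
import os
import sys

MIN_EXPECTED_BYTES = 10 * 1024 * 1024  # assumed floor for a large wiki


def check_abstracts(dump_dir):
    """Return (name, size) pairs for abstract files below the threshold."""
    suspicious = []
    for name in os.listdir(dump_dir):
        if "abstract" not in name:
            continue
        path = os.path.join(dump_dir, name)
        size = os.path.getsize(path)
        if size < MIN_EXPECTED_BYTES:
            suspicious.append((name, size))
    return suspicious


if __name__ == "__main__":
    for name, size in check_abstracts(sys.argv[1]):
        print("possible truncation: %s (%d bytes)" % (name, size))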
On 28-07-2013, Sun, at 22:48 +0200, Andreas Meier wrote:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
I'll check into that.
> Another question: why is this step not parallelized?
Cause I was too lazy to do so. It doesn't suck up that much time compared to the other steps.
On 28-07-2013, Sun, at 22:48 +0200, Andreas Meier wrote:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
I have parallelized this step now. The latest es wp run includes this change and the files look OK.
Ariel
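As a rough sketch of the kind of parallelization described above, the abstracts step could be split into page-ID ranges, with one worker process per range. This is an assumption-laden illustration, not the actual change made on the dump hosts; the dumpBackup.php command line, chunk size, and worker count below are placeholders:

#!/usr/bin/env python
"""Run abstract extraction over page-ID ranges in parallel.
Hypothetical sketch; the command and constants are placeholders."""
import subprocess
from multiprocessing import Pool

NUM_WORKERS = 4            # assumed worker count
PAGES_PER_CHUNK = 500000   # assumed chunk size
MAX_PAGE_ID = 2000000      # would come from the wiki's page table


def run_chunk(bounds):
    """Run the (placeholder) extraction command for one page-ID range."""
    start, end = bounds
    out_file = "abstract-%d-%d.xml" % (start, end)
    cmd = ["php", "dumpBackup.php",              # placeholder invocation
           "--start=%d" % start, "--end=%d" % end]
    with open(out_file, "w") as out:
        subprocess.check_call(cmd, stdout=out)
    return out_file


if __name__ == "__main__":
    chunks = [(s, min(s + PAGES_PER_CHUNK, MAX_PAGE_ID))
              for s in range(0, MAX_PAGE_ID, PAGES_PER_CHUNK)]
    pool = Pool(NUM_WORKERS)
    for produced in pool.map(run_chunk, chunks):
        print("wrote %s" % produced)

The per-range output files would then be concatenated (or recombined by the existing dump machinery) into the final abstracts file.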
Andreas Meier, 28/07/2013 22:48:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
> Another question: why is this step not parallelized?
Sorry, I can't answer your questions, but I have one for you: since you are interested in these abstracts, could you please add a line about them and their use case(s) to https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download? Thank you! Apart from what their name suggests, these files have always been quite mysterious.
Nemo