Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
Another question: why is this step not parallelized?
Best regards, Andreas Meier
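For illustration only, here is a minimal sketch (not part of the dumps codebase) of the kind of sanity check that would flag an abstracts file that comes out suspiciously small, like the 159 KB case above; the directory layout and the size threshold are assumptions:

#!/usr/bin/env python
"""Flag abstract dump files that look truncated. Hypothetical helper,
not from the Wikimedia dumps scripts; threshold is an assumption."""
import os
import sys

MIN_EXPECTED_BYTES = 10 * 1024 * 1024  # assumed floor for a large wiki


def check_abstracts(dump_dir):
    """Return (name, size) pairs for abstract files below the threshold."""
    suspicious = []
    for name in os.listdir(dump_dir):
        if "abstract" not in name:
            continue
        path = os.path.join(dump_dir, name)
        size = os.path.getsize(path)
        if size < MIN_EXPECTED_BYTES:
            suspicious.append((name, size))
    return suspicious


if __name__ == "__main__":
    for name, size in check_abstracts(sys.argv[1]):
        print("possible truncation: %s (%d bytes)" % (name, size))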
On 28-07-2013, Sun, at 22:48 +0200, Andreas Meier wrote:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
I'll check into that.
> Another question: why is this step not parallelized?
Cause I was too lazy to do so. It doesn't suck up that much time compared to the other steps.
On 28-07-2013, Sun, at 22:48 +0200, Andreas Meier wrote:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
I have parallelized this step now. The latest es wp run includes this change and the files look OK.
Ariel
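As a rough sketch of the kind of parallelization described above, the abstracts step could be split into page-ID ranges, with one worker process per range. This is an assumption-laden illustration, not the actual change made on the dump hosts; the dumpBackup.php command line, chunk size, and worker count below are placeholders:

#!/usr/bin/env python
"""Run abstract extraction over page-ID ranges in parallel.
Hypothetical sketch; the command and constants are placeholders."""
import subprocess
from multiprocessing import Pool

NUM_WORKERS = 4            # assumed worker count
PAGES_PER_CHUNK = 500000   # assumed chunk size
MAX_PAGE_ID = 2000000      # would come from the wiki's page table


def run_chunk(bounds):
    """Run the (placeholder) extraction command for one page-ID range."""
    start, end = bounds
    out_file = "abstract-%d-%d.xml" % (start, end)
    cmd = ["php", "dumpBackup.php",              # placeholder invocation
           "--start=%d" % start, "--end=%d" % end]
    with open(out_file, "w") as out:
        subprocess.check_call(cmd, stdout=out)
    return out_file


if __name__ == "__main__":
    chunks = [(s, min(s + PAGES_PER_CHUNK, MAX_PAGE_ID))
              for s in range(0, MAX_PAGE_ID, PAGES_PER_CHUNK)]
    pool = Pool(NUM_WORKERS)
    for produced in pool.map(run_chunk, chunks):
        print("wrote %s" % produced)

The per-range output files would then be concatenated (or recombined by the existing dump machinery) into the final abstracts file.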
Andreas Meier, 28/07/2013 22:48:
> Hello, there is a problem with the extracted page abstracts for Yahoo on the big wikis that were moved to the new infrastructure. During generation everything seems to be fine, but the run ended with only a 159 KB file.
> Another question: why is this step not parallelized?
Sorry, I can't answer your questions, but I have one for you: since you are interested in these abstracts, could you please add a line about them and their use case(s) to https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download? Thank you! Apart from what their name suggests, these files have always been quite mysterious.
Nemo