For folks who have not been following the saga on http://wikitech.wikimedia.org/view/Dataset1 we were able to get the raid array back in service last night on the XML data dumps server, and we are now busily copying data off of it to another host. There's about 11T of dumps to copy over; once that's done we will start serving these dumps read-only to the public again. Because the state of the server hardware is still uncertain, we don't want to do anything that might put the data at risk until that copy has been made.
The replacement server is on order and we are watching that closely.
We have also been working on deploying a server to run one round of dumps in the interim.
Thanks for your patience (which is a way of saying, I know you are all out of patience, as am I, but hang on just a little longer).
Ariel
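The messages don't say how the 11T copy is being made or verified; purely as an illustration, here is a minimal Python sketch of checking that a copied tree of dump files matches the original. The source and destination paths are hypothetical.

import hashlib
import os

def sha1_of(path, bufsize=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def compare_trees(src_root, dst_root):
    """Yield (relative path, problem) for files missing or differing between two trees."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                yield rel, "missing"
            elif os.path.getsize(src) != os.path.getsize(dst):
                yield rel, "size mismatch"
            elif sha1_of(src) != sha1_of(dst):
                yield rel, "checksum mismatch"

# Hypothetical mount points; substitute the real source and backup locations.
for rel, problem in compare_trees("/data/xmldatadumps", "/mnt/backup/xmldatadumps"):
    print(problem, rel)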
Thanks, Ariel and team. Some not-too-bad news at last.
D
On Tue, Dec 14, 2010 at 9:12 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
Thanks.
Double good news: http://lists.wikimedia.org/pipermail/foundation-l/2010-December/063088.html
2010/12/14 Ariel T. Glenn ariel@wikimedia.org
We now have a copy of the dumps on a backup host. Although we are still resolving hardware issues on the XML dumps server, we think it is safe enough to serve the existing dumps read-only. DNS was updated to that effect already; people should see the dumps within the hour.
Ariel
Good work.
2010/12/15 Ariel T. Glenn ariel@wikimedia.org
Yeah, great work Ariel. Thanks a lot for the effort.
Best, F.
--- On Wed, 15/12/10, Ariel T. Glenn ariel@wikimedia.org wrote:
Google donated storage space for backups of the XML dumps. Accordingly, a copy of the latest complete dump for each project is being copied over (public files only). We expect to run similar copies once every two weeks, keeping the four latest copies as well as one permanent copy at each six-month interval. That can be adjusted as we see how things go.
Ariel
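The message doesn't spell out how the rotation would be implemented; the following is a minimal Python sketch of one plausible reading of the retention rule (the four most recent copies, plus one permanent copy per half-year window), with made-up example dates.

from datetime import datetime

def copies_to_keep(copy_dates, keep_latest=4):
    """Return the subset of backup copy dates to keep: the N most recent,
    plus one 'permanent' copy from each six-month window (Jan-Jun, Jul-Dec)."""
    dates = sorted(copy_dates)
    keep = set(dates[-keep_latest:])        # the four most recent copies
    permanent = {}
    for d in dates:
        window = (d.year, 0 if d.month <= 6 else 1)
        permanent.setdefault(window, d)     # earliest copy seen in each half-year
    keep.update(permanent.values())
    return keep

# Example with biweekly copies; anything not returned would be a pruning candidate.
copies = [datetime(2010, 12, 20), datetime(2011, 1, 3), datetime(2011, 1, 17),
          datetime(2011, 1, 31), datetime(2011, 2, 14), datetime(2011, 2, 28)]
print(sorted(copies_to_keep(copies)))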
Ariel T. Glenn wrote:
Are they readable from somewhere? Apparently, in order to read them you need to sign up for a list and wait for an invitation, which is available only to US developers.
I sent mail immediately after my initial mail to these lists, to find out whether we can make them readable to the public and whether there would be a fee, etc. As soon as I have more information, I will pass it on. At the least this gives WMF one more copy. Of course it would be best if it gave everyone one more copy.
Ariel
On Mon, 20-12-2010, at 17:41 +0100, Platonides wrote:
The new host Dataset2 is now up and running and serving XML dumps. Those of you paying attention to DNS entries should see the change within the hour. We are not generating new dumps yet but expect to do so soon.
Ariel
Hi,
That is great news; thank you for all the hard work you have done on this, and most of all, Season's Greetings, Merry Christmas, and Happy New Year! :)
best regards, Jamie
----- Original Message ----- From: "Ariel T. Glenn" ariel@wikimedia.org Date: Friday, December 24, 2010 10:42 am Subject: Re: [Xmldatadumps-l] [Wikitech-l] dataset1, xml dumps To: Wikimedia developers wikitech-l@lists.wikimedia.org Cc: xmldatadumps-l@lists.wikimedia.org
So "soon" took longer than I would have liked. However, we are up and running with the new code. I have started a few processes going and over the next few days I will ramp it up to the usual number. In particular I want to start a separate job for the larger wikis so that the smaller jobs don't get trapped behind them.
Guess I'd better go update the various pages on wikitech now.
Ariel
On Fri, 24-12-2010, at 20:42 +0200, Ariel T. Glenn wrote:
On 10/01/11 22:13, Ariel T. Glenn wrote:
So "soon" took longer than I would have liked. However, we are up and running with the new code. I have started a few processes going and over the next few days I will ramp it up to the usual number. In particular I want to start a separate job for the larger wikis so that the smaller jobs don't get trapped behind them.
Guess I'd better go update the various pages on wikitech now.
Ariel
Thanks Ariel, that's good to hear.
Would it be possible to take this a step further and start up a single job just for enwiki?
enwiki is unique among the dumps: it is the only one that regularly fails more often than it succeeds. Even partial dumps are better than none, and enwiki also takes longer than any other dump before it (typically) fails, so retrying it more aggressively than the others -- and independently of them, so that it does not hold the other wikis up -- would seem appropriate.
Thus, under this proposal, there would be three jobs running:
* enwiki
* other large wikis
* all small wikis
-- Neil
On Tue, 11-01-2011, at 10:16 +0000, Neil Harris wrote:
Ah yes, sorry that wasn't clear from the earlier message. I have already pulled enwiki out of the main list, and it will run as a bunch of smaller parallel jobs on its own host.
Ariel
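The actual dump scheduler configuration isn't shown on the list; as a rough Python sketch of the split described above (enwiki as parallel jobs on its own host, other large wikis in one queue, all remaining wikis in another), with hypothetical wiki lists:

# Hypothetical wiki lists; the real configuration files differ.
LARGE_WIKIS = {"dewiki", "frwiki", "jawiki", "ruwiki", "eswiki", "itwiki"}

def assign_pool(wiki):
    """Route each wiki to one of three independent queues so that small wikis
    never wait behind the big ones, and enwiki (run as several parallel chunks
    on its own host) never blocks either queue."""
    if wiki == "enwiki":
        return "enwiki-parallel"
    if wiki in LARGE_WIKIS:
        return "large"
    return "small"

def build_queues(all_wikis):
    queues = {"enwiki-parallel": [], "large": [], "small": []}
    for wiki in all_wikis:
        queues[assign_pool(wiki)].append(wiki)
    return queues

if __name__ == "__main__":
    wikis = ["enwiki", "dewiki", "elwiki", "frwiki", "nahwiki", "simplewiki"]
    for pool, members in build_queues(wikis).items():
        print(pool, members)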
You may be noticing a "recombine" step for several files on the recent dumps which simply seems to list the same file again. That's a bug, not a feature; fortunately it doesn't affect the files themselves. I have fixed the configuration file so that it should no longer claim to run these steps, as they are for the parallel-run function, which is not needed on the smaller wikis.
I'm thinking about whether or not to clean up the index.html and md5sums on these to remove the bogus lines. Doing the index files would be a bit tedious.
Ariel
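If the md5sums cleanup were done with a small script, it might look something like the Python sketch below, which just drops repeated filename entries from an md5sums listing; the file name and layout here are assumptions, not the actual tooling.

def dedupe_md5sums(lines):
    """Drop repeated entries from an md5sums listing, keeping the first checksum
    recorded for each filename (the bogus 'recombine' entries simply list the
    same file a second time)."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            kept.append(line)      # leave anything unexpected alone
            continue
        _checksum, filename = parts
        if filename in seen:
            continue               # duplicate entry from the recombine step
        seen.add(filename)
        kept.append(line)
    return kept

# Hypothetical path; each per-wiki dump directory has its own md5sums file.
with open("md5sums.txt") as f:
    print("\n".join(dedupe_md5sums(f.read().splitlines())))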
Hi,
I guess they have some extra server space since they were told to erase the Wi-Fi data their Street View cars were sniffing :D
cheers, Jamie
----- Original Message ----- From: "Ariel T. Glenn" ariel@wikimedia.org Date: Monday, December 20, 2010 3:22 am Subject: Re: [Xmldatadumps-l] [Wikitech-l] dataset1, xml dumps To: Wikimedia developers wikitech-l@lists.wikimedia.org Cc: xmldatadumps-l@lists.wikimedia.org