After the last batch of problems were fixed, enwiki was getting dumped almost weekly. But there was no dump in December, and since it seems to be held up dumping the full history with an ETA of Jan 22nd, it seems like it will be at least a few more weeks.
So I wanted to check on what kind of schedule to expect. Is this a temporary issue, so we can go back to frequent dumps again once this is done? Or were the frequent dumps a temporary thing, and this is the norm?
Would it be difficult to run the current version & full history concurrently on a separate schedule, to allow the current versions to be dumped more frequently?
Thanks -Steve
Within the last year we've tried to schedule monthly releases at the very least. Most consumers are perfectly happy with that data set, as the time required to copy and process it is long enough that anything more recent is unnecessary.
Now with several fixes and the recompression of the text external storage, we had gotten down to two-week cycles for most if not all wikis.
That short release cycle was certainly attractive but not a hard requirement.
Since we've crept just over the month cycle, I'm certainly tempted to run another non-full-history snapshot in order to catch up, assuming it's not too resource intensive.
--tomasz
On Mon, 04 Jan 2010 18:11:33 -0800, Tomasz Finc wrote:
Since we've crept just over the month cycle, I'm certainly tempted to run another non-full-history snapshot in order to catch up, assuming it's not too resource intensive.
--tomasz
OK, thanks. It would definitely be useful for me if we can keep two-week cycles most of the time, and not slip to more than a month. I was concerned that getting more frequent dumps depended on something else failing, which would be a shame. Anyway, I'll keep checking for the next dump.
Thanks -Steve
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I have an enwiki snapshot running on a separate instance. Will post as soon as it finishes.
--tomasz
On Sat, Jan 16, 2010 at 11:56 AM, Tomasz Finc tfinc@wikimedia.org wrote:
I have an enwiki snapshot running on a separate instance. Will post as soon as it finishes.
--tomasz
Just a note, "Due to high database server lag, changes newer than 7,994 seconds may not appear in this list.", and it's apparently getting worse. No idea if it's connected to the dump or not.
-Peachey
On 1/15/10 5:56 PM, Tomasz Finc wrote:
I have an enwiki snapshot running on a separate instance. Will post as soon as it finishes.
--tomasz
New snapshot ready.
http://download.wikipedia.org/enwiki/20100116
--tomasz
On Mon, Jan 25, 2010 at 6:23 PM, Tomasz Finc tfinc@wikimedia.org wrote:
New snapshot ready.
And the history dump, which had run for a month and a half and looked like it was going to actually complete for the first time in years, is now broken. Thanks a lot.
On Fri, Jan 29, 2010 at 2:33 AM, Anthony wikimail@inbox.org wrote:
On Mon, Jan 25, 2010 at 6:23 PM, Tomasz Finc tfinc@wikimedia.org wrote:
New snapshot ready.
And the history dump, which had run for a month and a half and looked like it was going to actually complete for the first time in years, is now broken. Thanks a lot.
How are the old revisions backed up, by the way? Just replication to a remote datacenter?
Marco
2010/1/29 Anthony wikimail@inbox.org:
And the history dump, which had run for a month and a half and looked like it was going to actually complete for the first time in years, is now broken. Thanks a lot.
Obviously this wasn't sabotaged on purpose. Please assume good faith and refrain from sarcastic comments such as the above.
Roan Kattouw (Catrope)
Anthony wrote:
On Mon, Jan 25, 2010 at 6:23 PM, Tomasz Finc tfinc@wikimedia.org wrote:
New snapshot ready.
And the history dump, which had run for a month and a half and looked like it was going to actually complete for the first time in years, is now broken. Thanks a lot.
I posted this earlier on xmldatadumps-admin-l@
http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/2010-January/00007...
For those not on the list: we had an unscheduled change within our data center that caused the host running the job to be taken offline. I've pulled out the corrupted pieces of the eu history snapshot and am piecing it back together now. Going through 256GB of compressed data takes a bit of time :D
Will update when I have more info.
--tomasz
On Mon, Jan 4, 2010 at 12:18 PM, Steve Sanbeg ssanbeg@ask.com wrote:
After the last batch of problems were fixed, enwiki was getting dumped almost weekly. But there was no dump in December, and since it seems to be held up dumping the full history with an ETA of Jan 22nd, it seems like it will be at least a few more weeks.
That's 7 days ago. Now the ETA is at Jan 27th. I'd say we're looking at at least 2 months for the bzip2 version of the full history dump to finish. I hope we're willing to let it finish this time, though. My understanding is that once this one finishes the future ones will be faster.