--- On Wed, 25/2/09, Robert Ullmann <rlullmann@gmail.com> wrote:

From: Robert Ullmann <rlullmann@gmail.com>
Subject: Re: [Wikitech-l] Dump processes seem to be dead
To: "Wikimedia developers" <wikitech-l@lists.wikimedia.org>
Date: Wednesday, 25 February 2009, 2:09

> you yourself suggested page id.
> I suggest the history be partitioned into "blocks" by *revision ID*
I've looked at some alternatives for slicing the huge dump files into chunks of a more manageable size. I first thought about dividing the blocks by rev_id, as you suggest, but then realized that this can pose problems for parsers recovering information, since revisions of the same page may end up in different dump files.
Once you have read past the page_id tag, you cannot recover it if the process stops because of an error, unless you save checkpoint information that lets you resume from that point when you restart the process.
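For what it's worth, the checkpointing I have in mind is nothing fancy. A minimal Python sketch (the file name and the saved fields are just placeholders, not anything from the current dump scripts) would be:

import json
import os

CHECKPOINT_FILE = "split.checkpoint"   # hypothetical path

def save_checkpoint(last_page_id, chunk_index):
    # Record the last fully processed page so a crashed run can resume
    # from the next page instead of re-reading the whole dump.
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_page_id": last_page_id, "chunk": chunk_index}, f)

def load_checkpoint():
    # Returns None on a fresh run, otherwise the saved state.
    if not os.path.exists(CHECKPOINT_FILE):
        return None
    with open(CHECKPOINT_FILE) as f:
        return json.load(f)

On restart, the splitter would skip pages with page_id at or below last_page_id and keep appending to the recorded chunk.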
Partitioning by page_id keeps all revisions of the same page in the same block, and it does not disturb algorithms that look up individual revisions.
Yes, the chunks would be slightly bigger, but the difference is small with either 7zip or bzip2, and you gain simplicity in the recovery tools.
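To make the idea concrete, here is a rough Python sketch of cutting a pages-meta-history dump at </page> boundaries, so every revision of a page lands in the same chunk. The chunk size and file names are assumptions, and writing the surrounding <mediawiki>/<siteinfo> wrapper into each chunk is omitted for brevity:

import bz2
import xml.etree.ElementTree as ET

PAGES_PER_CHUNK = 10000   # assumed block size

def split_dump(dump_path, prefix):
    chunk_no = 0
    pages_in_chunk = 0
    out = open("%s-%04d.xml" % (prefix, chunk_no), "wb")
    with bz2.open(dump_path, "rb") as src:
        for _, elem in ET.iterparse(src, events=("end",)):
            # Match </page> regardless of the export namespace.
            if elem.tag == "page" or elem.tag.endswith("}page"):
                out.write(ET.tostring(elem))   # whole page, all its revisions
                elem.clear()                   # free the parsed revisions
                pages_in_chunk += 1
                if pages_in_chunk >= PAGES_PER_CHUNK:
                    out.close()
                    chunk_no += 1
                    pages_in_chunk = 0
                    out = open("%s-%04d.xml" % (prefix, chunk_no), "wb")
    out.close()

Because each chunk only ever ends after a complete <page> element, a recovery tool never has to stitch a page's history back together across files.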
Best,
F.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l