Hey!
May I mention that the scripts generating the dumps and handling the scheduling are written in Python and available on Wikimedia SVN? [1]
If you have improvements to suggest for the task scheduling, I guess patches are welcome :)
In May, following another wikitech-l discussion [2], some small improvements were made to the dump processing, to prioritize the wikis that haven't been successfully dumped in a long time. Previously, failed dump attempts were not taken into account: the dumps were ordered only by "last dump try start time", which led to some inconsistencies.
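To give a rough idea of the change, here is an illustrative Python sketch; the function and field names are made up and are not the actual code in the backup scripts:

# Illustrative sketch only; 'last_attempt' / 'last_success' are made-up
# field names, not the actual status fields used by the dump scripts.
from datetime import datetime

def next_wiki_to_dump(wikis, status):
    """Pick the wiki whose last *successful* dump is oldest.

    status[wiki] is assumed to hold:
      - 'last_attempt': when the last dump run started (success or failure)
      - 'last_success': when the last dump actually completed
    """
    never = datetime.min
    # Old behaviour (roughly): order by status[w]['last_attempt'], so a wiki
    # whose runs kept failing still looked "recently dumped" and was pushed
    # to the back of the queue.
    # New behaviour: failed attempts no longer count as done.
    return min(wikis, key=lambda w: status[w].get('last_success', never))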
If I'm right, you should also keep in mind that the XML dumping process relies on the previous dumps to run faster: in other words, if you have a recent XML dump, the dumper can reuse text records from that existing dump instead of fetching them from external storage, which also requires decompressing and normalizing them. Here, the latest dump available for enwiki is from July, meaning a lot of new text has to be fetched from external storage: this first dump *will* take a long time, but you should expect the next ones to go faster.
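Very roughly, the prefetch idea looks like this; again just a sketch with made-up names, and the compression and normalization details are only illustrative, not the actual format used by external storage:

import zlib

def get_revision_text(rev_id, previous_dump, external_storage):
    # Fast path: reuse the text record from the existing (older) dump.
    if rev_id in previous_dump:
        return previous_dump[rev_id]
    # Slow path: fetch the blob from external storage, then decompress
    # and normalize it before it can be written out again; this is what
    # makes a dump without a recent predecessor so much slower.
    blob = external_storage[rev_id]
    text = zlib.decompress(blob).decode('utf-8')
    return text.replace('\r\n', '\n')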
[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/backup/
[2] http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/38401/...
2008/10/11 Anthony wikimail@inbox.org:
On Fri, Oct 10, 2008 at 7:49 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
I guess the answer, really, is to get more servers doing dumps - I'm sure that will come in time.
No, the answer, really, is to do the dumps more efficiently. Brion says this should come in the next couple months.
Anthony