Erik Zachte wrote:
Sometimes the dump job reports all is well when it is not (Brion knows this)
This part's now fixed; .bz2 failures will not report .7z success on the next run around (but could on the current run while the program's still running).
-- brion vibber (brion @ pobox.com)
If I may make a suggestion/request: one thing I would quite like from the download.wikipedia.org site is an index file somewhere (plain text or XML) that indicates what the latest valid individual dump files are for a given Wikipedia site.
The "latest" directory is not useful for this purpose (e.g. http://download.wikipedia.org/enwiki/latest/ points to files from approximately Aug-17, which looks to be the latest dump where everything reported success; but the later dumps are still useful for most of the files, just not the really large ones). An index file of this type would supersede the http://download.wikipedia.org/enwiki/latest/ directory, and would probably live in http://download.wikipedia.org/enwiki/ .
Given that some dumps sometimes fail, it would be good to make automating the download and processing of dump files easier. That way, dump consumers could run a cron job that, say, once a day fetched the latest index file and downloaded the dump files they wanted, if those had been updated.
It might help here to show a rough mock-up example of the type of file I'm thinking of:
===================================================================
<mediawiki
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://download.wikipedia.org/xml/export-0.1/"
version="0.1" xml:lang="en">
<siteinfo>
<sitename>English Wikipedia</sitename>
</siteinfo>
<dump type="site_stats.sql.gz">
<desc>A few statistics such as the page count.</desc>
<url>http://download.wikipedia.org/enwiki/20060925/enwiki-20060925-site_stats.sql.gz</url>
<size_in_bytes>451</size_in_bytes>
<timestamp>2006-09-24T16:29:01Z</timestamp>
<md5sum>e4defa79c36823c67ed4d937f8f7013c</md5sum>
</dump>
<dump type="pages-articles.xml.bz2">
<desc>Articles, templates, image descriptions, and primary meta-pages.</desc>
<url>http://download.wikipedia.org/enwiki/20060925/enwiki-20060925-pages-articles.xml.bz2</url>
<size_in_bytes>1710328527</size_in_bytes>
<timestamp>2006-09-24T22:12:24Z</timestamp>
<md5sum>2742b1b4b131d9a28887823da91cf2a5</md5sum>
</dump>
<!-- .... snip various dump entries .... -->
<dump type="pages-meta-history.xml.7z">
<desc>All pages with complete edit history (.7z)</desc>
<url>http://download.wikipedia.org/enwiki/20060816/enwiki-20060816-pages-meta-history.xml.7z</url>
<size_in_bytes>5132097632</size_in_bytes>
<timestamp>2006-08-16T12:55:00Z</timestamp>
<md5sum>24160a71229bee02bb813825bf7413db</md5sum>
</dump>
</mediawiki>
===================================================================
... the above file is probably invalid XML and needs to be tweaked and so forth, but hopefully it illustrates the idea (e.g. the pages-articles.xml.bz2 entry is recent, whereas the pages-meta-history.xml.7z entry is a month older, but both represent the latest valid dump for that type of file). Someone who, for example, only wants the "All pages with complete edit history (.7z)" file could download this index once a day; then, when that entry changes, download the dump file, verify the size in bytes matches, verify the MD5 sum matches, and if everything is good, extract the file, perhaps locally verify it is valid XML, and if it is all still good, process the file in an automated way.

Also, after every individual dump file was successfully created, the index file would probably have to be updated (to ensure it was always current). I think the above information is already on the download.wikipedia.org site, but it's scattered over a number of different places; this would basically unify all that information into one useful data format.
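To make the consumer side concrete, here is a rough Python sketch of the daily check described above. It assumes the hypothetical index format from the mock-up (the <dump>, <url>, <size_in_bytes> and <md5sum> element names are my invention, not anything the site serves today), and it leaves out the actual HTTP fetch:

```python
# Sketch of a dump consumer's verification step, assuming the hypothetical
# index file format mocked up above. Element names are illustrative only.
import hashlib
import xml.etree.ElementTree as ET


def parse_index(index_xml):
    """Map each dump type to its url/size/md5 metadata from the index XML."""
    root = ET.fromstring(index_xml)
    dumps = {}
    for dump in root.iter("dump"):
        dumps[dump.get("type")] = {
            "url": dump.findtext("url"),
            "size_in_bytes": int(dump.findtext("size_in_bytes")),
            "md5sum": dump.findtext("md5sum"),
        }
    return dumps


def verify_download(data, entry):
    """Check a downloaded file's byte count and MD5 sum against its index entry."""
    if len(data) != entry["size_in_bytes"]:
        return False
    return hashlib.md5(data).hexdigest() == entry["md5sum"]
```

A cron job would fetch the index, compare each entry's timestamp (or md5sum) against what it saw yesterday, and only download and `verify_download()` the files whose entries changed.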
All the best,
Nick.