1 - There seems to be a problem with the .bz2 dump file Wikipedia
provides. I tried to download it twice, and both times it aborted with an
error after running for about 7 hours, at about 2 GB.
The dump of enwiki I pulled from 2007-09-10 seems to work great here.
$ md5sum enwiki-latest-pages-articles.xml.bz2
47483ed164a363f97595714afb929b0f  enwiki-latest-pages-articles.xml.bz2
$ du -sch enwiki-latest-pages-articles.xml.bz2
2.9G enwiki-latest-pages-articles.xml.bz2
2.9G total
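Before re-downloading, two cheap checks distinguish a truncated transfer from a genuinely bad dump: compare the file's MD5 against the published checksum, and test-decompress the bzip2 stream. A minimal sketch (the filename and hash are the ones quoted above; the helper name is made up here):

```shell
# Sanity-check a downloaded dump before importing.
# verify_dump FILE EXPECTED_MD5 returns 0 only if both checks pass.
verify_dump() {
    dump="$1"
    expected="$2"
    # 1. Compare against the published checksum (catches truncation).
    actual=$(md5sum "$dump" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "$dump: checksum mismatch (got $actual)" >&2
        return 1
    fi
    # 2. bzip2 -t decompresses to nowhere, failing on a corrupt stream.
    bzip2 -t "$dump"
}

# Usage, with the hash quoted in this thread:
# verify_dump enwiki-latest-pages-articles.xml.bz2 47483ed164a363f97595714afb929b0f
```

If the checksum matches but `bzip2 -t` still fails, the published file itself is bad rather than your transfer.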
Does anyone know how I can tell when the next dump will be available?
Just use the latest one available. Do your needs really _require_ that
your copy stay in lockstep with the current Wikipedia public data?
Does Wikipedia provide a new dump every so many days?
By the looks of d.w.o (download.wikimedia.org), it's roughly every 2 weeks.
2 - Is anyone working with mwdumper.jar?
I've been using mwdumper.jar exclusively for unpacking, prior to
importing into my db; it works great. My version is slightly patched to
add table prefixes and some other MySQL tweaks that speed up imports.
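For anyone who hasn't used it: the usual pattern is to have mwdumper convert the XML dump to SQL and pipe that straight into mysql. A sketch of the unpatched invocation (the database name and user below are placeholders, not from the post):

```shell
# Convert the XML dump to SQL INSERTs for a MediaWiki 1.5+ schema and
# load them directly; "wikidb" and "wikiuser" are placeholder names.
java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 \
  | mysql -u wikiuser -p wikidb
```

Piping avoids materializing the multi-gigabyte SQL file on disk; the table-prefix patching mentioned above would alter the table names mwdumper emits in that SQL stream.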
3 - Does the dump include disambiguation lists, categories, and the
different types of lists? I have been trying to download it for two days,
so I am curious to know what I'll find there.
Which dump are you trying to download? What filesystem are you saving
the dump to? What tool are you using to retrieve that dump?
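On the retrieval question: a resumable downloader saves re-fetching gigabytes after a dropped connection. For example, with wget (the URL shown is assumed to be the standard dumps location; substitute the one you are actually using):

```shell
# -c resumes from an existing partial file instead of restarting;
# --tries=20 retries transient network errors before giving up.
wget -c --tries=20 \
  http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
```

Also note that some filesystems (e.g. FAT32) cap files at 4 GB, which can make a large dump fail near the end even when the transfer itself is fine.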