Sorry, I forgot to mention that I have in mind the
English Wikipedia dump.
wiki writes:
> Hello.
>
> I'm a newbie who wants to start playing with the xml dumps. I've found
> instructions here and there on how to import these. I'd like to seek
> guidance though as to how much free disk space one is required to have for
> the MySql import to succeed? i.e. after I have already installed LAMP +
> Mediawiki, and already allocated space for the bzip file and the converted
> import statements file, roughly how much more space is needed?
Hi!
First (because someone else will probably tell you), you shouldn't
cross-post to multiple lists -- at least not without announcing it. (I
saw this post on both wikitech-l and Xmldatadumps-l.)
As to disk space, the text size of the English Wikipedia dump is
roughly 25 GB. I imagine this will come to roughly 33 GB in a MySQL
database (I'm guesstimating a 75% fill factor, i.e. 25 / 0.75).
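
As a rough sketch of that arithmetic: dividing the text size by the
fill factor gives a figure a little over 33 GB. This assumes that "fill
factor" means the fraction of allocated database space that actually
holds text, and it ignores indexes and the other tables. The function
names are just for illustration, and the free-space check uses whatever
directory you run it from.

```python
import shutil


def estimate_db_size_gb(text_size_gb: float, fill_factor: float) -> float:
    """Estimate on-disk size, assuming only `fill_factor` of the space holds text."""
    if not 0 < fill_factor <= 1:
        raise ValueError("fill_factor must be in (0, 1]")
    return text_size_gb / fill_factor


def free_space_gb(path: str = ".") -> float:
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


# 25 GB of text at a guessed 75% fill factor (both figures are rough guesses).
needed_gb = estimate_db_size_gb(25, 0.75)
print(f"estimated database size: {needed_gb:.1f} GB")
print(f"free space here:         {free_space_gb():.1f} GB")
```

Point free_space_gb() at the directory that holds your MySQL data to
compare the two numbers, and leave generous headroom on top.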
However, I think that importing an XML dump is going to be quite
challenging -- especially for English Wikipedia. I've not done one,
but everything I've read indicates the process will probably take well
over 30 hours to complete. You can read more about it at
https://meta.wikimedia.org/wiki/Data_dumps/ImportDump.php and
https://www.mediawiki.org/wiki/Manual:MWDumper . I would also look at
https://www.mediawiki.org/wiki/Manual_talk:MWDumper to get an idea of
other people's experiences.
There is probably going to be a lot of work involved. The official
importer (ImportDump.php) is said to be slow, and the other candidate
(mwdumper) does not seem to be supported. You will also have to import
other tables (for example, categories). Images are an entirely
separate issue.
If you want a more automated process, you can look at wp-mirror:
http://www.nongnu.org/wp-mirror/ . It is under development, but it
aims to produce "one-step" full mirror sites for any wiki (with
images). However, English Wikipedia will take about 2 months to set up
(5 million seconds).
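
For a sense of scale, the 5 million second figure converts to days like
this (a quick sketch; the 30-day month is my own simplification):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


setup_seconds = 5_000_000
days = seconds_to_days(setup_seconds)
months = days / 30  # assumes a 30-day month
print(f"{days:.1f} days, about {months:.1f} months")
# prints "57.9 days, about 1.9 months"
```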
If you just want a copy of English Wikipedia offline (and not a
MediaWiki installation), then you are probably better off with an
offline app. If so, you should try one of the following:
* Kiwix (http://www.kiwix.org) is the official offline app for
Wikipedia. It is complete, stable, well-featured, and fully functional
for any of the major Wikipedias. However, it uses its own ZIM format
(not the XML dumps), and the copy of English Wikipedia it has
available is from last year.
* WikiTaxi (http://www.wikitaxi.org) works with any of the XML dumps.
It only works on a Windows machine (on Linux you can try Wine).
* XOWA (http://sourceforge.net/projects/xowa/) works with any of the
XML dumps. It handles images and allows editing. However, it is
relatively new and in an alpha state. Also, note that I am the XOWA
dev.
Hope this is useful.