On 7/5/07, Christoph Litauer litauer@uni-koblenz.de wrote:
Hi,
For a few weeks now I have been trying to set up a local Wikipedia server. For research purposes we need to import the German Wikipedia with its complete history. I downloaded the compressed XML dump and used mwimport to convert the XML data into an SQL command file. The resulting file is about 500 MB and contains about 50 million lines of SQL commands.
I then piped this file into the mysql command-line client. The MySQL server runs on the same machine, which has 2 GB of memory; I configured MySQL to use 1.5 GB for the InnoDB buffer pool (innodb_buffer_pool_size).
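For reference, the pipeline looks roughly like this (the file and database names are placeholders, the SET statements are just the usual InnoDB bulk-load settings, and the exact mwimport invocation may differ depending on the version):

  # convert the dump: mwimport reads the XML on stdin and writes SQL to stdout
  bzip2 -dc dewiki-pages-meta-history.xml.bz2 | mwimport > dewiki.sql

  # load it with autocommit and the per-row checks switched off for the duration
  ( echo "SET autocommit=0; SET unique_checks=0; SET foreign_key_checks=0;"
    cat dewiki.sql
    echo "COMMIT;" ) | mysql wikidb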
The import has been running for two weeks now and has loaded about 20 million of the 29 million article revisions. This seems extremely slow to me. I must be doing something wrong, but I cannot find the mistake.
Even a simple count query takes between 3 and 7 hours:
mysql> select count(rev_id) from revision;
+---------------+
| count(rev_id) |
+---------------+
|      20923026 |
+---------------+
1 row in set (7 hours 5 min 53.40 sec)
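(The only cheap alternative I know of is the row estimate from SHOW TABLE STATUS, but for InnoDB tables its Rows column is only an approximation, not an exact count:

  mysql> SHOW TABLE STATUS LIKE 'revision';

so it is no real substitute for the exact count above.)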
How can that be? Any ideas how to improve the performance? Thanks a lot in advance!
--
Regards
Christoph
________________________________________________________________________
Christoph Litauer                                litauer@uni-koblenz.de
Uni Koblenz, Computing Center, http://www.uni-koblenz.de/~litauer
Postfach 201602, 56016 Koblenz
Fon: +49 261 287-1311, Fax: -100 1311
PGP-Fingerprint: F39C E314 2650 650D 8092 9514 3A56 FBD8 79E3 27B2
There is a nice tool called mwdumper, which can be found in the SVN repository; it is written in Java and imports pages much faster (a typical invocation is sketched below).

MinuteElectron.
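(For reference, the usual mwdumper invocation is along these lines; the jar path, schema version and connection details are illustrative and depend on your checkout and MediaWiki version:

  java -jar mwdumper.jar --format=sql:1.5 dewiki-pages-meta-history.xml.bz2 | mysql -u root -p wikidb

Because it streams the SQL straight into mysql, there is no intermediate dump file to manage.)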