Hi,
for a few weeks now I have been trying to set up a local Wikipedia server. For research purposes we need to import the German Wikipedia with its complete revision history. I downloaded the compressed XML dump and used mwimport to convert the XML data into a file of SQL commands. The resulting file is about 500 MB and contains roughly 50 million lines of SQL statements.
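Roughly, the conversion step looks like this (the dump filename and the decompression tool are only illustrative, not the exact ones I used):

# decompress the history dump and convert it to SQL with mwimport
bzcat dewiki-pages-meta-history.xml.bz2 | ./mwimport > dewiki.sql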
I then piped this file into the mysql client. The MySQL server runs on the same machine, which has 2 GB of memory, and I configured InnoDB to use 1.5 GB as its buffer pool (innodb_buffer_pool_size).
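The relevant part of my.cnf looks roughly like this (other settings omitted; 1536M is the 1.5 GB mentioned above):

[mysqld]
# give InnoDB 1.5 GB of the machine's 2 GB as buffer pool
innodb_buffer_pool_size = 1536M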
The import has now been running for two weeks and has loaded about 20 million of the 29 million article revisions. That seems extremely slow to me; I must be doing something wrong, but I cannot find the mistake.
Even a simple count query takes between 3 and 7 hours:
mysql> select count(rev_id) from revision;
+---------------+
| count(rev_id) |
+---------------+
|      20923026 |
+---------------+
1 row in set (7 hours 5 min 53.40 sec)
How can that be? Any ideas on how to improve the performance? Thanks a lot in advance!