Minute Electron wrote:
On 7/5/07, Christoph Litauer <litauer@uni-koblenz.de> wrote:
Hi,
For a few weeks now I have been trying to set up a local Wikipedia server. For research reasons we need to import the German Wikipedia with its complete history. I downloaded the compressed XML dump and used mwimport to convert the XML data into a file of SQL commands. The resulting file is about 500 MB and contains about 50 million lines of SQL.
I piped this file into the mysql command-line client. The MySQL server runs on the same machine, which has 2 GB of memory. I configured MySQL to use 1.5 GB as the InnoDB buffer pool (innodb_buffer_pool_size).
The import has been running for two weeks now and has loaded about 20 million of the 29 million article revisions. That seems extremely slow to me. I must be doing something wrong, but I cannot find the mistake.
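For reference, the session settings that are usually suggested for bulk InnoDB loads would be wrapped around the generated SQL roughly like this (just a sketch, untested here; for an import of this size one would probably commit in batches rather than in one huge transaction):

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
-- ... the ~50 million lines of INSERTs produced by mwimport go here ...
COMMIT;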
A simple counting command takes between 3 and 7 hours:
mysql> select count(rev_id) from revision;
+---------------+
| count(rev_id) |
+---------------+
|      20923026 |
+---------------+
1 row in set (7 hours 5 min 53.40 sec)
How can that be? Any ideas how to improve the performance? Thanks a lot in advance!
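For comparison, an approximate row count can be read from the table statistics instead of scanning the index; for InnoDB the Rows value is only an estimate, but something like this should come back quickly and is enough to track import progress:

mysql> show table status like 'revision';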
--
Regards
Christoph
There is a nice tool called mwdumper, which can be found in the SVN repository. It is written in Java and imports pages much faster.
Thanks, but I already looked at mwdumper: "Future versions of mwdumper will include support for creating a database and configuring a MediaWiki installation directly, but currently it just produces raw SQL which can be piped to MySQL." I have already produced raw SQL (using mwimport), so it is not the XML-to-SQL conversion that is the bottleneck. I think mwdumper only improves that step, not the import of the data into the database.
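For completeness, the mwdumper usage amounts to piping its SQL output straight into mysql, roughly like this (database name, user and dump file name are placeholders; the --format value has to match the MediaWiki schema version):

java -jar mwdumper.jar --format=sql:1.5 dewiki-pages-meta-history.xml.bz2 | mysql -u wikiuser -p wikidb

So it covers the same conversion step that mwimport already handles here.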