Minute Electron wrote:
On 7/5/07, Christoph Litauer <litauer@uni-koblenz.de> wrote:
Hi,
For a few weeks now I have been trying to set up a local Wikipedia server. For research reasons we need to import the German Wikipedia with its complete history. I downloaded the compressed XML dump and used mwimport to convert the XML data into a file of SQL commands. The resulting file is about 500 MB and contains about 50 million lines of SQL.
I piped this file into the mysql command-line client. The MySQL server runs on the same machine, which has 2 GB of memory. I configured MySQL to use 1.5 GB as the InnoDB buffer pool (innodb_buffer_pool_size).
The import has been running for two weeks now and has loaded about 20 million of the 29 million article revisions. That seems extremely slow to me. I must be doing something wrong, but I cannot find the mistake.
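For reference, the session settings that are usually suggested for bulk InnoDB loads would be wrapped around the generated SQL roughly like this (just a sketch, untested here; for an import of this size one would probably commit in batches rather than in one huge transaction):

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
-- ... the ~50 million lines of INSERTs produced by mwimport go here ...
COMMIT;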
A simple counting command takes between 3 and 7 hours:
mysql> select count(rev_id) from revision;
+---------------+
| count(rev_id) |
+---------------+
|      20923026 |
+---------------+
1 row in set (7 hours 5 min 53.40 sec)
How can that be? Any ideas how to improve the performance? Thanks a lot in advance!
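For comparison, an approximate row count can be read from the table statistics instead of scanning the index; for InnoDB the Rows value is only an estimate, but something like this should come back quickly and is enough to track import progress:

mysql> show table status like 'revision';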
--
Regards
Christoph
There is a nice tool called mwdumper, which can be found in the SVN repository. It is written in Java and imports pages much faster.
Thanks, but I already looked at mwdumper: "Future versions of mwdumper will include support for creating a database and configuring a MediaWiki installation directly, but currently it just produces raw SQL which can be piped to MySQL." I have already produced raw SQL (using mwimport), so it is not the XML-to-SQL conversion that is the bottleneck. I think mwdumper only improves that step, not the import of the data into the database.
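For completeness, the mwdumper usage amounts to piping its SQL output straight into mysql, roughly like this (database name, user and dump file name are placeholders; the --format value has to match the MediaWiki schema version):

java -jar mwdumper.jar --format=sql:1.5 dewiki-pages-meta-history.xml.bz2 | mysql -u wikiuser -p wikidb

So it covers the same conversion step that mwimport already handles here.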