I completely agree with Tim's answer. Adequate hard disk resources (in terms of speed) are critical.
MySQL INSERTs are not what slows the whole process down. I usually do the final inserts into MySQL as a separate step, and in any case they take no more than 5-10% of the total processing time.
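As a rough sketch of what such a separate load step can look like (file and table names here are just examples, not necessarily what my scripts use): the parser writes tab-separated intermediate files, and they get bulk-loaded afterwards:

    -- hypothetical file and table names; shown only to illustrate a separate bulk load
    LOAD DATA LOCAL INFILE '/tmp/revisions.tsv'
        INTO TABLE revision
        FIELDS TERMINATED BY '\t';

A single bulk load like this is usually much cheaper than issuing one INSERT per revision while parsing.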
I believe that the Perl parser is somewhat faster than mwdumper (and definitely faster than the research version of my new WikiXRay parser; I still have to check them against the next standard version of my parser).
However, you should take into account that, depending on the language you're processing, certain revisions can be *very large*, and inevitably any parser, no matter how optimized or multithreaded it is, will spend considerable time processing them (I mean, considering the aggregate number of tasks the parser has to deal with).
It usually takes me a week or so to load the whole dump of the English version back into MySQL, and that's on a big server with two 2 GHz Opterons (dual-core each), plenty of fast memory, and a RAID 6 array of 8 fast SATA-II disks.
MySQL configuration will be critical later on, when you try to "play with" your data. I recommend www.mysqlperformanceblog.com for that; you'll find very useful hints there.
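As a starting point only, the settings that tend to matter most for this kind of bulk load are along these lines in my.cnf; the values are purely illustrative and have to be tuned to your RAM and workload:

    # illustrative values only -- adjust to your hardware
    [mysqld]
    innodb_buffer_pool_size        = 4G    # most of the available RAM if the tables are InnoDB
    innodb_log_file_size           = 256M  # larger redo logs help sustained insert rates
    innodb_flush_log_at_trx_commit = 2     # relax durability while importing
    key_buffer_size                = 512M  # relevant if the tables are MyISAM instead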
Good luck.
Felipe.
Christoph Litauer litauer@uni-koblenz.de wrote: Tim Starling wrote:
Brion Vibber wrote:
Christoph Litauer wrote:
Thanks, but I already figured mwdumper out: "Future versions of mwdumper will include support for creating a database and configuring a MediaWiki installation directly, but currently it just produces raw SQL which can be piped to MySQL."
Yes, you have to run tables.sql into your database as well. Painful, I know. ;)
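For reference, that step usually amounts to something like the following; the database name, user and path are only examples, and the schema file normally ships with MediaWiki under maintenance/:

    # create the MediaWiki schema before piping the dump in (example names)
    mysql -u wikiuser -p wikidb < maintenance/tables.sql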
I already produced raw SQL (using mwimport), so the XML-to-SQL conversion isn't the bottleneck. I think mwdumper only improves that step, not the data import into the database.
I don't know anything about this "mwimport" tool, but mwdumper uses batch inserts and the README includes a number of tips about speeding up the SQL end. You might want to check if you're doing this already.
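A typical mwdumper pipeline looks roughly like this; file names and credentials are only examples, and the exact options depend on the mwdumper version, so check its README:

    # convert the XML dump to batched SQL inserts and pipe it straight into MySQL (illustrative)
    java -jar mwdumper.jar --format=sql:1.5 pages_full.xml.bz2 | mysql -u wikiuser -p wikidb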
He probably means this:
http://meta.wikimedia.org/wiki/Data_dumps/mwimport
It claims to be faster than mwdumper due to lower CPU usage during XML parsing. I suspect you could get the same speedup by putting "bfr" in the pipeline, since I very much doubt you'd max out the CPU while piping into MySQL, if the whole thing was properly multithreaded.
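That would mean inserting bfr between the converter and the MySQL client, something like the sketch below; bfr is run with its defaults here, and the names are again just examples:

    # same pipeline, with a buffer decoupling XML parsing from the MySQL client (illustrative)
    java -jar mwdumper.jar --format=sql:1.5 pages_full.xml.bz2 | bfr | mysql -u wikiuser -p wikidb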
The problem in Christoph Litauer's case is most probably insufficient memory and disk resources, possibly coupled with a poorly tuned MySQL server. Fixing that is probably a better topic for a MySQL support query than for a wikitech-l mailing list thread.
I totally agree! I had hoped to get statements like "same for me" or "things run about 20 times faster here" -- but I didn't ask for that, it's true. I couldn't find any hints on how fast the imports "normally" run, and hence whether it's worth spending time optimizing my MySQL server. It seems it is, so I will take a look at that. Thank you all for the answers and hints.