Dear all,
I recently tried to import one of the latest SQL dumps of English
Wikipedia titles into MySQL:
* 2016-04-11 20:07:38 done: Base per-page data (id, title, old restrictions, etc.)
  o enwiki-20160407-page.sql.gz (1.3 GB)
    <https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-page.sql.gz>
I think there is a performance issue here that could be largely avoided.
Even though I disabled foreign key checks, increased the maximum packet
size, and applied some other optimizations, the import took around two
days on a laptop with 8 GB of RAM.
I think I cannot disable autocommit because of the limited RAM, but even
without autocommit, modifying some columns afterwards took ages.
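For reference, the kind of settings I mean are along these lines
(illustrative values, not necessarily exactly what I used):

  -- in the mysql session, before sourcing the dump
  SET foreign_key_checks = 0;  -- skip foreign key validation during the load
  SET unique_checks = 0;       -- skip secondary uniqueness checks during the load
  SET autocommit = 0;          -- only if there is enough memory; COMMIT at the end
  -- and in my.cnf (server and client):
  -- max_allowed_packet = 1G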
I did a test: I edited the SQL dump file and removed the index
definitions from the CREATE TABLE statement. This was not easy, because
very few text editors can open such a large file.
I ran the import again, without indices, key constraints, etc., and it
took less than two hours.
Creating the indices / key constraints after importing the tables took
around 10-15 minutes per index.
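To give an idea, the statements I mean look roughly like this (the index
names come from the page table definition in the dump and may differ
between dump versions):

  -- run once, after the plain data load has finished
  ALTER TABLE page
    ADD UNIQUE INDEX name_title (page_namespace, page_title),
    ADD INDEX page_random (page_random),
    ADD INDEX page_len (page_len);

Combining them into a single ALTER TABLE should also let MySQL build
several indexes from one scan of the table, although I have not measured
how much that helps.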
I think the problem with the original dump is that all the indices are
updated on every insertion, which seems very inefficient.
So I would like to ask whether the dump files could be created in a more
optimized way, skipping the index creation at first and leaving it for
the end of the script.
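Concretely, I am imagining a dump file where the CREATE TABLE statement
only carries the primary key and the secondary indexes are added at the
very end, something like this (columns abridged, just a sketch):

  CREATE TABLE `page` (
    `page_id` int unsigned NOT NULL AUTO_INCREMENT,
    `page_namespace` int NOT NULL,
    `page_title` varbinary(255) NOT NULL,
    -- ... the remaining columns as today ...
    PRIMARY KEY (`page_id`)
  );

  -- ... all the INSERT INTO `page` statements ...

  -- and only at the very end of the file, the secondary indexes,
  -- i.e. ALTER TABLE ... ADD INDEX statements like the ones above.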
Best regards,
Rafael