Dear all,
I recently tried to import one of the latest SQL dumps of English Wikipedia titles into MySQL:
* 2016-04-11 20:07:38 done Base per-page data (id, title, old restrictions, etc). o enwiki-20160407-page.sql.gz https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-page.sql.gz 1.3 GB
I think there is a performance issue here that could be largely avoided. Although I disabled foreign-key checks, enabled a large packet size, and applied some other optimizations, the import took around two days on a laptop with 8 GB of RAM. I don't think I can disable autocommit due to the lack of RAM, and even with auto_commit off, modifying some columns took ages.
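For reference, these are the kinds of session settings I mean. This is only a sketch: the exact values and the filename are illustrative, and max_allowed_packet must also be raised on the server side.

```sql
-- Sketch of the import session; settings are illustrative.
SET foreign_key_checks = 0;   -- skip foreign-key validation during the bulk load
SET unique_checks = 0;        -- skip unique-index checks during the bulk load
SET autocommit = 0;           -- group the inserts into larger transactions

SOURCE enwiki-20160407-page.sql;

COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
```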
I did a test. I edited the SQL dump file and removed the index definitions from the CREATE TABLE statement. This was not easy, because very few text editors can open such a large file. I ran the import again, without indices, key constraints, etc., and it took less than two hours. Creating the indices and key constraints after importing the table took around 10-15 minutes per index. I think the problem with the original dump is that all the indices are maintained on every insertion, which seems very inefficient. So I would like to ask whether the dump files could be generated in a more optimized way, skipping the index creation at first and leaving it for the end of the script.
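To illustrate the second step, this is roughly what recreating the indexes after the fast import looks like. The index names below follow the MediaWiki `page` table schema as I understand it and may differ between dump versions; combining the additions into a single ALTER TABLE lets MySQL rebuild the table only once.

```sql
-- Recreate the secondary indexes after the bulk load; building them
-- once over the full table is much cheaper than maintaining them
-- during every INSERT.
ALTER TABLE page
  ADD UNIQUE INDEX name_title (page_namespace, page_title),
  ADD INDEX page_random (page_random),
  ADD INDEX page_len (page_len);
```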
Best regards,
Rafael