You can run "SET unique_checks=0;", which speeds up index handling; if that doesn't help enough, then you may indeed want to look at creating the indices afterwards. Be aware, however, that this can be very costly in time at the end.
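For example, something along these lines keeps the settings in the same session that streams the dump in (just a sketch; the "enwiki" database name and the paths are made up, adjust to your setup):

    # One session: turn the checks off, then feed the dump to the same mysql process.
    ( echo 'SET unique_checks=0; SET foreign_key_checks=0;'; \
      gzcat enwiki-20160407-page.sql.gz ) | mysql -u root -p enwiki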
You could filter out the indices on the fly (tossing all the statements with KEY in them in the CREATE stanza). Alternatively you could pull the drop/create stanzas into a separate file, run those, drop the indices yourself, and then gzcat | tail the pages dump from the line after the CREATE stanza.
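Roughly, the second approach could look like this (a sketch only; the line numbers below are invented, so check them against your copy of the dump first):

    # Find where the INSERT statements start; suppose it reports line 61.
    gzcat enwiki-20160407-page.sql.gz | grep -n -m 1 '^INSERT INTO'
    # Everything before that is the DROP/CREATE preamble; save it, edit out
    # the KEY lines (or run it as-is and drop the indices by hand), load it.
    gzcat enwiki-20160407-page.sql.gz | head -n 60 > page-create.sql
    mysql -u root -p enwiki < page-create.sql
    # Then stream only the INSERTs.
    gzcat enwiki-20160407-page.sql.gz | tail -n +61 | mysql -u root -p enwiki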
If you would like, you could add a narrative about your experience importing these files, to our docs about importing dumps, here: https://meta.wikimedia.org/wiki/Data_dumps/Import_examples
Ariel Glenn
On Mon, Apr 18, 2016 at 10:27 AM, Phyambre <phy.reg@gmail.com> wrote:
Dear all,
I recently tried to import one of the latest SQL dumps of English Wikipedia titles into MySQL:
2016-04-11 20:07:38 done Base per-page data (id, title, old restrictions, etc.)
enwiki-20160407-page.sql.gz (1.3 GB)
https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-page.sql.gz
I think there is a performance issue here that could be largely improved. Even though I disabled foreign key checks, enabled a large packet size, and made some other optimizations, the import took around 2 days on a laptop with 8 GB of RAM. I don't think I can disable autocommit due to the lack of RAM, but even with auto_commit off, modifying some columns took ages.
I did a test: I edited the SQL dump file and removed the index creation from the CREATE TABLE statement. This was not easy, because very few text editors can open such a large file. I then ran the import again, without indices, key constraints, etc., and it took less than two hours. Creating the indices / key constraints after importing the data took around 10-15 minutes per index. I think the original dump has a problem: all the indices are being updated after every insertion, which seems very inefficient. So I would like to ask whether the dump files could be created in a more optimized way, skipping the index creation at first and leaving it for the end of the script.
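Concretely, the after-the-fact index creation was along these lines (a rough sketch; the exact KEY definitions should be copied from the CREATE TABLE stanza in the dump rather than trusted from memory):

    # Recreate the page table's secondary indexes once the data is loaded.
    mysql -u root -p enwiki <<'EOF'
    ALTER TABLE page
      ADD UNIQUE KEY name_title (page_namespace, page_title),
      ADD KEY page_random (page_random),
      ADD KEY page_len (page_len);
    EOF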
Best regards,
Rafael
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l