You can "SET unique_checks=0;", which speeds up index handling; if that
doesn't help enough, then you may indeed want to look at creating the
indices afterwards. Be aware, however, that this can be very costly in
time at the end.
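For example, the session setup might look like the following sketch (run from the mysql client; unique_checks and foreign_key_checks are standard session variables, and the filename is the one from the thread):

```sql
-- Sketch: disable the expensive per-row checks for this session,
-- source the dump, then turn the checks back on.
SET unique_checks=0;
SET foreign_key_checks=0;
SOURCE enwiki-20160407-page.sql;
SET foreign_key_checks=1;
SET unique_checks=1;
```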
You could filter out the indices on the fly (tossing all the lines
with KEY in them from the CREATE stanza). Alternatively you could pull the
drop/create stanzas into a separate file, run those, drop the indices
yourself, and then gzcat | tail the pages dump from the line after the
CREATE stanza.
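A minimal sketch of the on-the-fly filtering, assuming the dump follows the usual mysqldump layout (two-space-indented KEY / UNIQUE KEY lines inside the CREATE stanza); note that dropping those lines leaves a dangling comma on the last kept column line, which the awk below also fixes:

```shell
# Sketch, not tested against a full dump: stream the dump, skip the
# secondary-index definitions, and strip the now-dangling comma on the
# line just before the closing ") ENGINE=..." of the CREATE stanza.
zcat enwiki-20160407-page.sql.gz | awk '
  /^  (UNIQUE )?KEY / { next }             # drop secondary index lines
  {
    if (have) {
      if ($0 ~ /^\)/) sub(/,$/, "", prev)  # fix trailing comma before ")"
      print prev
    }
    prev = $0; have = 1
  }
  END { if (have) print prev }
' > page-noindex.sql
```

PRIMARY KEY is kept, since the pattern only matches lines beginning with KEY or UNIQUE KEY after the indentation.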
If you would like, you could add a narrative about your experience
importing these files, to our docs about importing dumps, here:
https://meta.wikimedia.org/wiki/Data_dumps/Import_examples
Ariel Glenn
On Mon, Apr 18, 2016 at 10:27 AM, Phyambre <phy.reg(a)gmail.com> wrote:
Dear all,
I recently tried to import one of the latest SQL dumps of English
Wikipedia titles in MySQL:
- 2016-04-11 20:07:38 done Base per-page data (id, title, old
restrictions, etc).
- enwiki-20160407-page.sql.gz
<https://dumps.wikimedia.org/enwiki/20160407/enwiki-20160407-page.sql.gz>
1.3 GB
I think there is a performance issue here that could be largely improved.
Although I disabled the foreign key checks, enabled a large packet size,
and did some other optimizations, it took around 2 days on a laptop with
8 GB of RAM.
I don't think I can disable autocommit due to the lack of RAM, but even
without autocommit, I tried to modify some columns and it took ages.
I did a test: I edited the SQL dump file and removed the index creation
from the CREATE TABLE statement. This was not easy, because very few
text editors can open such a large file.
I ran the import again, without indices, key constraints, etc., and it
took less than two hours.
Creating the indices / key constraints after importing the tables took
around 10-15 minutes per index.
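For reference, rebuilding the page table's secondary indices afterwards might look like the sketch below (index names and columns are assumed from MediaWiki's tables.sql of that era; verify them against the CREATE stanza in your own dump):

```sql
-- Sketch: one ALTER builds all secondary indices together, rather than
-- one table scan per index. Definitions assumed from MediaWiki's schema.
ALTER TABLE page
  ADD UNIQUE INDEX name_title (page_namespace, page_title),
  ADD INDEX page_random (page_random),
  ADD INDEX page_len (page_len);
```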
I think there is a problem with the original dump: all the indices are
being rebuilt after every insertion, which seems very inefficient.
So I would like to ask whether the dump files could be created in a more
optimized way, skipping the index creation at first and leaving it for
the end of the script.
Best regards,
Rafael
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l