Hi all,
I experience extremely slow import of pagelinks.sql (MYISAM). It's only about 13 000 000 rows of data so it shouldn't take many minutes. Now it takes hours.
I noticed that time for inserts increase as the table grows. Peeking with SHOW PROGRESSLIST sometimes says:
+------+------------+-------- | time | State | Info | 38 | ** DEAD ** | INSERT INTO 'pagelinks' ... +------+------------+--------------------------------
I thought it'd be enough to shut off keys during import etc like so:
SET AUTOCOMMIT=0; SET UNIQUE_CHECKS=0; SET FOREIGN_KEY_CHECKS=0; ALTER TABLE `pagelinks` DISABLE KEYS;
...and after import activate them again like so:
SET UNIQUE_CHECKS=1; SET FOREIGN_KEY_CHECKS=1; ALTER TABLE `pagelinks` ENABLE KEYS; COMMIT;
Since it took so long I aborted the import of tables (tried first to import in alphabethical order) and imported all the other tables first, and after that I import pagelinks.
I have a dual core with 2GB ram, xampplite on windows.
What am I doing wrong? Does import order really matter that much? (I'da already tried all tricks if it only didn't take soooo loooooong to try all 'em tricks when its slow like this).
Any ideas about how to reduce time /significantly/ from hours down to minutes? I use this data for testing ("consuming" it by processing it) and thus I need to often "reinstall" the data, in several databases.
// Rolf Lampa
Rolf Lampa [RIL] skrev:
Hi all,
only about 13 000 000 rows of data so it shouldn't take many minutes. Now it takes hours. I noticed that time for inserts increase as the table grows. Peeking with SHOW PROGRESSLIST sometimes says:
+------+------------+-------- | time | State | Info | 38 | ** DEAD ** | INSERT INTO 'pagelinks' ... +------+------------+--------------------------------
When I had posted I realized that at only 230 000 rows it now takes up to 76 seconds(unit?) per row, and increasing...
No no, here's something seriously wrong.
// Rolf Lampa
On Mon, Jun 22, 2009 at 3:27 AM, Rolf Lampa [RIL] rolf.lampa@rilnet.comwrote:
Hi all,
I experience extremely slow import of pagelinks.sql (MYISAM). It's only about 13 000 000 rows of data so it shouldn't take many minutes. Now it takes hours.
I noticed that time for inserts increase as the table grows. Peeking with SHOW PROGRESSLIST sometimes says:
+------+------------+-------- | time | State | Info | 38 | ** DEAD ** | INSERT INTO 'pagelinks' ... +------+------------+--------------------------------
I thought it'd be enough to shut off keys during import etc like so:
SET AUTOCOMMIT=0; SET UNIQUE_CHECKS=0; SET FOREIGN_KEY_CHECKS=0; ALTER TABLE `pagelinks` DISABLE KEYS;
...and after import activate them again like so:
SET UNIQUE_CHECKS=1; SET FOREIGN_KEY_CHECKS=1; ALTER TABLE `pagelinks` ENABLE KEYS; COMMIT;
Since it took so long I aborted the import of tables (tried first to import in alphabethical order) and imported all the other tables first, and after that I import pagelinks.
I have a dual core with 2GB ram, xampplite on windows.
What am I doing wrong? Does import order really matter that much? (I'da already tried all tricks if it only didn't take soooo loooooong to try all 'em tricks when its slow like this).
Any ideas about how to reduce time /significantly/ from hours down to minutes? I use this data for testing ("consuming" it by processing it) and thus I need to often "reinstall" the data, in several databases.
// Rolf Lampa
Kind of sidestepping your question, but I think you can just rebuild the pagelinks table with refreshLinks.php. I can't say for sure whether that would actually be faster, though.
Benjamin Lees wrote:
Kind of sidestepping your question, but I think you can just rebuild the pagelinks table with refreshLinks.php. I can't say for sure whether that would actually be faster, though.
That would require *parsing* all the pages. It'd be much slower.
Platonides skrev:
Benjamin Lees wrote:
Kind of sidestepping your question, but I think you can just rebuild the pagelinks table with refreshLinks.php. I can't say for sure whether that would actually be faster, though.
That would require *parsing* all the pages. It'd be much slower.
Yes, it'd take weeks actually. :)
But now I have found a solution. On this page a guy suggest some temporal settings suitable for import, and then things started to happen (less than an hour): http://www.pilhokim.com/index.php?title=EChronicle/Wikipedia_raw_text_conver...
// Rolf Lampa
mediawiki-l@lists.wikimedia.org