Hi all,
I am trying to import enwiki using mwdumper on a W2K machine with 512 MB of physical RAM plus 700 MB of virtual memory.
I have installed Java 1.5 and am running the latest version of mwdumper.
I started the program with this command:
    java -jar mwdumper.jar --format=sql:1.5 enwiki.xml | mysql -f -u<admin> -p<password> -Denwiki
The first 1 million pages were inserted rather fast, although the speed dropped almost linearly.
It started inserting at approx. 440 pages/sec.
Now, after more than 2,400,000 pages inserted, the speed is down to only 27 pages/sec.
The problem seems to be excessive memory usage, or faulty/unnecessary caching.
The Windows Task Manager shows steadily increasing memory usage for mysqld-nt.exe, now at 142,000 K, while java.exe stays constant at 61,880 K.
I am using MySQL 4.1.4 and MyISAM tables. (I had to use MySQL 4.1.4 for other system reasons. Also, I had to use MyISAM because MediaWiki 1.9.3 was not creating InnoDB tables correctly on MySQL 4.1.4.)
Maybe someone on this list could advise me on how to optimize MySQL for this task.
Any help is highly appreciated!
Thanks
Alex
Here is the output of the mysqladmin variables command:
+---------------------------------+------------------------------------+
| Variable_name                   | Value                              |
+---------------------------------+------------------------------------+
| back_log                        | 50                                 |
| basedir                         | C:\MySQL\414\                      |
| binlog_cache_size               | 32768                              |
| bulk_insert_buffer_size         | 8388608                            |
| character_set_client            | utf8                               |
| character_set_connection        | utf8                               |
| character_set_database          | utf8                               |
| character_set_results           | utf8                               |
| character_set_server            | utf8                               |
| character_set_system            | utf8                               |
| character_sets_dir              | C:\MySQL\414\share\charsets/       |
| collation_connection            | utf8_general_ci                    |
| collation_database              | utf8_general_ci                    |
| collation_server                | utf8_general_ci                    |
| concurrent_insert               | ON                                 |
| connect_timeout                 | 5                                  |
| datadir                         | C:\MySQL\414\Data\                 |
| date_format                     | %Y-%m-%d                           |
| datetime_format                 | %Y-%m-%d %H:%i:%s                  |
| default_week_format             | 0                                  |
| delay_key_write                 | ON                                 |
| delayed_insert_limit            | 100                                |
| delayed_insert_timeout          | 300                                |
| delayed_queue_size              | 1000                               |
| expire_logs_days                | 0                                  |
| flush                           | OFF                                |
| flush_time                      | 1800                               |
| ft_boolean_syntax               | + -><()~*:""&|                     |
| ft_max_word_len                 | 84                                 |
| ft_min_word_len                 | 4                                  |
| ft_query_expansion_limit        | 20                                 |
| ft_stopword_file                | (built-in)                         |
| group_concat_max_len            | 1024                               |
| have_archive                    | NO                                 |
| have_bdb                        | NO                                 |
| have_blackhole_engine           | NO                                 |
| have_compress                   | YES                                |
| have_crypt                      | NO                                 |
| have_csv                        | NO                                 |
| have_example_engine             | NO                                 |
| have_geometry                   | YES                                |
| have_innodb                     | DISABLED                           |
| have_isam                       | NO                                 |
| have_ndbcluster                 | NO                                 |
| have_openssl                    | NO                                 |
| have_query_cache                | YES                                |
| have_raid                       | NO                                 |
| have_rtree_keys                 | YES                                |
| have_symlink                    | YES                                |
| init_connect                    |                                    |
| init_file                       |                                    |
| init_slave                      |                                    |
| innodb_additional_mem_pool_size | 2097152                            |
| innodb_autoextend_increment     | 8                                  |
| innodb_buffer_pool_awe_mem_mb   | 0                                  |
| innodb_buffer_pool_size         | 8388608                            |
| innodb_data_file_path           |                                    |
| innodb_data_home_dir            |                                    |
| innodb_fast_shutdown            | ON                                 |
| innodb_file_io_threads          | 4                                  |
| innodb_file_per_table           | OFF                                |
| innodb_flush_log_at_trx_commit  | 1                                  |
| innodb_flush_method             |                                    |
| innodb_force_recovery           | 0                                  |
| innodb_lock_wait_timeout        | 50                                 |
| innodb_locks_unsafe_for_binlog  | OFF                                |
| innodb_log_arch_dir             |                                    |
| innodb_log_archive              | OFF                                |
| innodb_log_buffer_size          | 1048576                            |
| innodb_log_file_size            | 39845888                           |
| innodb_log_files_in_group       | 2                                  |
| innodb_log_group_home_dir       |                                    |
| innodb_max_dirty_pages_pct      | 90                                 |
| innodb_max_purge_lag            | 0                                  |
| innodb_mirrored_log_groups      | 1                                  |
| innodb_open_files               | 300                                |
| innodb_table_locks              | ON                                 |
| innodb_thread_concurrency       | 8                                  |
| interactive_timeout             | 28800                              |
| join_buffer_size                | 131072                             |
| key_buffer_size                 | 134217728                          |
| key_cache_age_threshold         | 300                                |
| key_cache_block_size            | 1024                               |
| key_cache_division_limit        | 100                                |
| language                        | C:\MySQL\414\share\english\        |
| large_files_support             | ON                                 |
| license                         | GPL                                |
| local_infile                    | ON                                 |
| log                             | OFF                                |
| log_bin                         | OFF                                |
| log_error                       | .\powerstation.err                 |
| log_slave_updates               | OFF                                |
| log_slow_queries                | OFF                                |
| log_update                      | OFF                                |
| log_warnings                    | 1                                  |
| long_query_time                 | 10                                 |
| low_priority_updates            | OFF                                |
| lower_case_file_system          | OFF                                |
| lower_case_table_names          | 1                                  |
| max_allowed_packet              | 33553408                           |
| max_binlog_cache_size           | 4294967295                         |
| max_binlog_size                 | 1073741824                         |
| max_connect_errors              | 10                                 |
| max_connections                 | 20                                 |
| max_delayed_threads             | 20                                 |
| max_error_count                 | 64                                 |
| max_heap_table_size             | 16777216                           |
| max_insert_delayed_threads      | 20                                 |
| max_join_size                   | 4294967295                         |
| max_length_for_sort_data        | 1024                               |
| max_relay_log_size              | 0                                  |
| max_seeks_for_key               | 4294967295                         |
| max_sort_length                 | 1024                               |
| max_tmp_tables                  | 32                                 |
| max_user_connections            | 0                                  |
| max_write_lock_count            | 4294967295                         |
| myisam_data_pointer_size        | 4                                  |
| myisam_max_extra_sort_file_size | 107374182400                       |
| myisam_max_sort_file_size       | 107374182400                       |
| myisam_recover_options          | OFF                                |
| myisam_repair_threads           | 1                                  |
| myisam_sort_buffer_size         | 53477376                           |
| named_pipe                      | OFF                                |
| net_buffer_length               | 16384                              |
| net_read_timeout                | 30                                 |
| net_retry_count                 | 10                                 |
| net_write_timeout               | 60                                 |
| new                             | OFF                                |
| old_passwords                   | OFF                                |
| open_files_limit                | 542                                |
| pid_file                        | C:\MySQL\414\Data\powerstation.pid |
| port                            | 3306                               |
| preload_buffer_size             | 32768                              |
| protocol_version                | 10                                 |
| query_alloc_block_size          | 8192                               |
| query_cache_limit               | 1048576                            |
| query_cache_min_res_unit        | 4096                               |
| query_cache_size                | 262144                             |
| query_cache_type                | ON                                 |
| query_cache_wlock_invalidate    | OFF                                |
| query_prealloc_size             | 8192                               |
| range_alloc_block_size          | 2048                               |
| read_buffer_size                | 61440                              |
| read_only                       | OFF                                |
| read_rnd_buffer_size            | 258048                             |
| relay_log_purge                 | ON                                 |
| relay_log_space_limit           | 0                                  |
| rpl_recovery_rank               | 0                                  |
| secure_auth                     | OFF                                |
| shared_memory                   | OFF                                |
| shared_memory_base_name         | MYSQL                              |
| server_id                       | 0                                  |
| skip_external_locking           | ON                                 |
| skip_networking                 | OFF                                |
| skip_show_database              | OFF                                |
| slave_net_timeout               | 3600                               |
| slave_transaction_retries       | 0                                  |
| slow_launch_time                | 2                                  |
| sort_buffer_size                | 262136                             |
| sql_mode                        |                                    |
| storage_engine                  | MyISAM                             |
| sql_notes                       | ON                                 |
| sql_warnings                    | ON                                 |
| sync_binlog                     | 0                                  |
| sync_replication                | 0                                  |
| sync_replication_slave_id       | 0                                  |
| sync_replication_timeout        | 0                                  |
| sync_frm                        | ON                                 |
| system_time_zone                | Westeuropäische Sommerzeit         |
| table_cache                     | 256                                |
| table_type                      | MyISAM                             |
| thread_cache_size               | 8                                  |
| thread_stack                    | 196608                             |
| time_format                     | %H:%i:%s                           |
| time_zone                       | SYSTEM                             |
| tmp_table_size                  | 27262976                           |
| tmpdir                          |                                    |
| transaction_alloc_block_size    | 8192                               |
| transaction_prealloc_size       | 4096                               |
| tx_isolation                    | REPEATABLE-READ                    |
| version                         | 4.1.14-nt                          |
| version_comment                 | Official MySQL binary              |
| version_compile_machine         | ia32                               |
| version_compile_os              | Win32                              |
| wait_timeout                    | 28800                              |
+---------------------------------+------------------------------------+
On Wed, 2007-08-01 at 20:38 +0200, MESHine Team wrote:
> Maybe someone on this list could advise me on how to optimize MySQL for this task.
The problem is most likely that the indexes are being updated on every insert.
Disable the keys on the page, text, and revision tables before you import and enable them again when you're done, and you should see a significant speedup.
My en.wiki import takes about 45 minutes total, for 5.3 million rows.
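
A minimal sketch of that advice, assuming a POSIX shell (for example Cygwin on Windows) and MediaWiki tables without a table prefix. The function names keys_sql and import_stream are made up for illustration; the mwdumper and mysql invocations are the ones from the original post. Note that on MyISAM, DISABLE KEYS only suspends updates to non-unique indexes, so primary and unique keys are still maintained during the load, and the ENABLE KEYS step at the end can itself take a while because it rebuilds those indexes.

```shell
#!/bin/sh
# Sketch only: wraps the mwdumper SQL stream with DISABLE/ENABLE KEYS
# statements. Table names are assumed to be un-prefixed.

TABLES="page text revision"

# Print one ALTER TABLE statement per table.
# $1 is either DISABLE or ENABLE.
keys_sql() {
  for t in $TABLES; do
    printf 'ALTER TABLE `%s` %s KEYS;\n' "$t" "$1"
  done
}

# Emit the full SQL stream: disable keys, dump contents, re-enable keys.
# $1 is the path to the XML dump.
import_stream() {
  keys_sql DISABLE
  java -jar mwdumper.jar --format=sql:1.5 "$1"
  keys_sql ENABLE
}

# Usage (placeholders as in the original command):
#   import_stream enwiki.xml | mysql -f -u<admin> -p<password> -Denwiki
```

Running keys_sql DISABLE on its own prints the three statements, which can be checked before piping anything into mysql.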
wikitech-l@lists.wikimedia.org