[Mediawiki-l] Import problems (max table size?) + Recommended DB tools

Rolf Lampa rolf.lampa at rilnet.com
Sat Mar 17 23:22:32 UTC 2007


Hi all,

Import problem:

I tried to import the latest English dump (in several ways), converted 
to proper SQL, but there seems to be a limit or setting somewhere that 
halts the import when the text table reaches 4 GB (1,995,267 rows) in 
MyISAM (the page and revision tables imported without problems, though). 
What have I missed? Is it a setting somewhere in MySQL, or the table type?
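
The only candidate I've found myself so far is MyISAM's max data file 
size, which (as far as I understand) is decided by the data pointer size 
chosen when the table is created, from MAX_ROWS and AVG_ROW_LENGTH. If 
that's the culprit, I'd guess something roughly like this would show it 
and raise the limit - but please correct me if I'm on the wrong track:

  -- Max_data_length around 4294967295 would mean the table hit the 4 GB ceiling
  SHOW TABLE STATUS LIKE 'text';

  -- Rebuild the table with a bigger pointer size so it can grow past 4 GB
  -- (the figures are just rough guesses for the enwiki text table; MySQL
  -- only uses them to pick the pointer size)
  ALTER TABLE text MAX_ROWS = 200000000 AVG_ROW_LENGTH = 2048;

Or is the right knob the server-wide myisam_data_pointer_size setting 
instead? The ALTER TABLE presumably rewrites the whole 4 GB table, so 
I'd rather not experiment blindly.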
--

Recommended DB tools?:

I also wonder if anyone has come across any really well-working DB tools 
that shuffle data between databases efficiently. I need the following 
(unfortunately MySQL master/slave replication won't do (all) the tricks 
for me):

1. Both local and remote replication (via TCP/IP and/or HTTP/PHP 
"tunneling" for more restrictive hosts).
2. "Restore" a database from an SQL dump AND directly from another 
database, see #1 (i.e. data transfer and data synchronization; one way 
suffices).
3. Data Compression on remote connections.

In short, the tool should be able to do "all the tricks" regarding 
Backup/Restore, Transfer and Synchronization - with good speed over a 
regular ~20/1 Mbit broadband internet connection. Perhaps one needs to 
pay for such a tool, but I wouldn't pay much over $100.

What I have tried:
1. MySQL Administrator. I've played around with MySQL Administrator 
(Backup/Restore), but it takes a very long time to shuffle big tables. 
It took something like 5 hours to upload the enwiki_page table. Unless 
I have missed some settings, that's way too slow.

2. DumpTimer.
http://www.dumptimer.com/index.php
A lightweight and smart tool; it opens stable connections to almost any 
host out there (gzipped data unpacked by a server-side PHP script) and 
uploads at a fairly good pace, but only until somewhere around 200 Gb 
is reached, when it starts to slow down significantly. It is almost 
stalled at around 300 Gb. A dead end for big tables.

3. Navicat.
http://www.navicat.com/
Looks really, really (yes, really) good at first; it offers all kinds 
of combinations (Backup/Restore, Transfer, Synchronization, etc.), but 
your joy turns into desperation when you realize that it first 
"processes" the data - say 4.5 million rows - all the while its memory 
usage keeps increasing, and increasing, and increasing... Once past 
2 GB it finally decides to start shuffling the data as well, but when 
the counters start to spin it's time for "out of memory". Sigh. (It 
seems to be a good choice, though, if you don't have huge tables.)

So, anyone, what is the "ultimate tool" out there? Some tool that deals 
with both size and speed. I mean, I can't be the only one who needs a 
tool that does "all the stuff" you need to do with a bunch of remote 
databases...

Oh, and it must be possible to schedule all of these features too, of course.

TIA,

// Rolf Lampa


