Hi,
This may be a bit obvious, but I don’t have much experience
in this area. The SQL dumps provided at
http://download.wikimedia.org do
not specify a “DEFAULT CHARSET” for their respective
tables. When installing MediaWiki, the binary charset seems to be the
recommended choice. I would like to know how to import one of these dumps
into a table that uses the binary charset.
Right now I import from the command line, e.g.:
mysql wikidb < enwiki-20090306-pagelinks.sql
This results in the corresponding table being dropped and then
recreated. The problem is that the newly created table does not
have its “DEFAULT CHARSET” set to binary, because the SQL dumps do not
specify one.
I first tried modifying my.cnf to make binary the “DEFAULT CHARSET”
for new tables. I made the following changes to
my.cnf:
[client]
default-character-set=binary
[mysqld]
default-character-set=binary
default-collation=binary
character-set-server=binary
collation-server=binary
init-connect='SET NAMES binary'
I restarted the server, but the new table still gets
created as UTF-8, not binary.
I then tried editing the SQL file, i.e. replacing the line
) TYPE=InnoDB;
with
) TYPE=InnoDB DEFAULT CHARSET=binary;
This works, in the sense that the new table now gets created as binary.
However, I think I am making mistakes in editing the file. These files
are rather large, so I wrote code in Perl, and again in Java, to do the
editing. Both manage the above substitution, but I am not
entirely confident about their UTF-8 handling. The problem appears when
I try to import the modified files: I get a “Duplicate entry” error.
For example, for the enwiki-20090306-pagelinks.sql file I get
the error:
ERROR 1062 (23000) at line 1359: Duplicate entry
'1198132-2-Gangleri/tests/links/�' for key 1
I should add that when I import this file as UTF-8, the same
“Duplicate entry” error occurs much earlier in the input file.
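
For reference, here is a minimal sketch of the substitution described
above, done strictly on raw bytes so that no decoding or re-encoding can
alter the row data. It is written in Python rather than the Perl and Java
I actually used, the function name add_binary_charset is hypothetical,
and it assumes the table-options line appears on a line of its own
exactly as ") TYPE=InnoDB;".

```python
OLD = b") TYPE=InnoDB;"
NEW = b") TYPE=InnoDB DEFAULT CHARSET=binary;"


def add_binary_charset(src, dst):
    """Copy src to dst as raw bytes, rewriting only the table-options line.

    src and dst are binary file objects. Returns the number of lines changed.
    """
    changed = 0
    for line in src:
        # Compare without the line ending so that \n and \r\n files both work.
        body = line.rstrip(b"\r\n")
        if body == OLD:
            # Keep the original line ending, swap only the statement text.
            line = NEW + line[len(body):]
            changed += 1
        # Every other line is written back byte for byte, never decoded.
        dst.write(line)
    return changed
```

To use it, the dump would be opened with mode "rb" and the output with
mode "wb"; opening either file in text mode would bring the encoding
question back in.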
So, what is the correct way to import these SQL dumps so that they
end up in a table with the binary charset? If my description above is not
clear, please let me know and I will try to explain again.
Thanks a lot,
O. O.
P.S. I am running MediaWiki/MySQL under Ubuntu. I hope UTF-8 is handled
correctly by the Bash command line, but I don’t know how to check that.
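
One possible way to check this, sketched in Python: report which
encoding the current locale and the terminal output stream claim to use.
The helper names is_utf8 and report are hypothetical, and this only
shows what Python sees in the shell it is started from. It says nothing
about how the mysql client or server interpret the data.

```python
import codecs
import locale
import sys


def is_utf8(name):
    """Return True if an encoding name such as 'UTF-8' or 'utf8' means UTF-8."""
    if not name:
        return False
    try:
        # codecs.lookup normalises aliases to a canonical codec name.
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        return False


def report():
    """Map each source to (encoding name, whether it is UTF-8)."""
    names = {
        "locale": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }
    return {key: (value, is_utf8(value)) for key, value in names.items()}
```

Calling print(report()) from the same Bash session used for the import
would show both values.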