Platonides wrote:
O. O. wrote:
I then attempted to edit the SQL file, i.e. replace the line
) TYPE=InnoDB;
With
) TYPE=InnoDB DEFAULT CHARSET=binary;
This works, in the sense that the new table now gets created as binary. However, I think I am making mistakes in editing the file. These files are rather large, so I wrote code in Perl, and again in Java, to do the editing. Both manage to do the above substitution, but I am not entirely confident about their UTF-8 handling.
You can also use sed to edit it:
$ sed -i "n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;s/InnoDB/InnoDB DEFAULT CHARSET=binary/" enwiki-20090306-pagelinks.sql
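If the worry is that an editing script might re-encode the data, one option is to never decode it at all. The following is only a minimal sketch in Python (not the Perl or Java code mentioned above), assuming the closing line of the CREATE TABLE statement is exactly `) TYPE=InnoDB;` on a line of its own. The names `patch_dump`, `OLD` and `NEW` are made up for illustration.

```python
# Hypothetical helper: copy a dump file byte for byte, rewriting only the
# first line that is exactly ") TYPE=InnoDB;". Nothing is ever decoded,
# so UTF-8 (or invalid) byte sequences in the data pass through untouched.

OLD = b") TYPE=InnoDB;"
NEW = b") TYPE=InnoDB DEFAULT CHARSET=binary;"


def patch_dump(src_path, dst_path, old=OLD, new=NEW):
    """Stream src_path to dst_path in binary mode.

    Returns True if a line was rewritten, False if `old` was not found.
    """
    replaced = False
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # binary mode: lines are split on b"\n" only
            body = line.rstrip(b"\r\n")
            if not replaced and body == old:
                # Keep the original line ending, whatever it was.
                line = new + line[len(body):]
                replaced = True
            dst.write(line)
    return replaced
```

It could be called as `patch_dump("enwiki-20090306-pagelinks.sql", "pagelinks-binary.sql")`, where the output file name is just an example. Writing to a second file rather than editing in place keeps the original dump available for comparison.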
Thanks for your reply, Platonides. I am trying your suggestion right now. It will take a few hours to crash, if it does. (I hope sed handles UTF-8 correctly.) I will try yesterday's pagelinks.sql later.
So, assuming I make the change you suggested above, i.e. explicitly set the “DEFAULT CHARSET” to binary, would there be any problems importing using
$ mysql wikidb < enwiki-20090306-pagelinks.sql
I am using Linux (Ubuntu). My question is whether the shell that does the redirection (the `<`) could modify any characters before mysql gets them. Right now I think the shell supports UTF-8, but I hope it is not messing things up.
Thanks a lot. O.O.
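On the redirection question: with `<`, the shell opens the file and hands it to the program as its standard input; the shell does not read or re-encode the data itself. One way to convince yourself is to compare a checksum of the file with a checksum of what a child process actually receives on stdin. The sketch below is only an illustration in Python, with a small Python child standing in for mysql; the function names are made up.

```python
import hashlib
import subprocess
import sys

# Child program: hash everything arriving on stdin, in 1 MiB chunks.
CHILD = """
import sys, hashlib
h = hashlib.sha256()
for chunk in iter(lambda: sys.stdin.buffer.read(1 << 20), b""):
    h.update(chunk)
print(h.hexdigest())
"""


def sha256_of_file(path):
    """Checksum of the file as stored on disk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_seen_via_stdin(path):
    """Checksum of the bytes a child process receives when the file is
    attached to its stdin, which is what `prog < file` does in a shell."""
    with open(path, "rb") as f:
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            stdin=f, capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()
```

If `sha256_of_file(path)` and `sha256_seen_via_stdin(path)` agree for the dump, the bytes reached the child unchanged. This only covers the hand-off to the program, not how the program then interprets those bytes.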