On Fri, May 22, 2009 at 6:09 PM, O. O. olson_ot@yahoo.com wrote:
Thanks for your reply, Platonides. I am trying your suggestion right now. It would take a few hours to crash, if it does. (I hope sed handles UTF-8 correctly.) I will try yesterday's pagelinks.sql later.
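sed treats UTF-8 as a stream of bytes. The pattern is pure ASCII, and in UTF-8 a byte in the ASCII range only ever represents an ASCII code point (every byte of a multi-byte sequence has its high bit set). So the pattern can't match inside a non-ASCII character, and sed will just pass those bytes through untouched.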
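If you want to convince yourself of the UTF-8 property, here is a minimal Python sketch. The function name and the sample string are made up for illustration; it only checks that no multi-byte character encodes to a byte below 0x80.

```python
# Sketch: every byte of a multi-byte UTF-8 sequence has its high bit set,
# so an ASCII-only pattern cannot match inside a non-ASCII character.

def multibyte_chars_have_no_ascii_bytes(text: str) -> bool:
    """Return True if no multi-byte character encodes to a byte < 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True

# Hypothetical sample mixing ASCII with 2-byte and 3-byte characters.
sample = "Zürich, 東京, Ελλάδα"
print(multibyte_chars_have_no_ascii_bytes(sample))
```

ASCII characters in the sample (the commas and spaces) still encode to single ASCII bytes, which is exactly why an ASCII pattern can match them and nothing else.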
(That sed pattern is pretty horrifying and fragile, though. I'd recommend something more like: sed -i 's/^) TYPE=InnoDB;$/) TYPE=InnoDB DEFAULT CHARSET=binary;/' )
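For what it's worth, the same anchored substitution can be sketched in Python on raw bytes, which makes it easy to check that only the exact `) TYPE=InnoDB;` line changes. This is an illustration, not a replacement for the sed command; the function name and the tiny sample dump are invented here and are not the real pagelinks schema.

```python
import re

# The anchored pattern from the sed command, written as a bytes regex.
# re.MULTILINE makes ^ and $ match at line boundaries, much as sed
# applies its pattern to one line at a time.
PATTERN = re.compile(rb"^\) TYPE=InnoDB;$", re.MULTILINE)
REPLACEMENT = b") TYPE=InnoDB DEFAULT CHARSET=binary;"

def add_binary_charset(dump: bytes) -> bytes:
    """Rewrite only lines that consist exactly of ') TYPE=InnoDB;'."""
    return PATTERN.sub(REPLACEMENT, dump)

# Hypothetical miniature dump, encoded as UTF-8 bytes.
sample_dump = (
    "CREATE TABLE `pagelinks` (\n"
    "  `pl_title` varchar(255) NOT NULL\n"
    ") TYPE=InnoDB;\n"
    "INSERT INTO `pagelinks` VALUES (1,0,'Zürich');\n"
).encode("utf-8")

converted = add_binary_charset(sample_dump)
print(converted.decode("utf-8"))
```

Because the pattern is anchored at both ends, row data that merely contains similar text is left alone, and the non-ASCII bytes in the INSERT line pass through unchanged.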
>$ mysql wikidb < enwiki-20090306-pagelinks.sql
I am using Linux (Ubuntu). My question is whether the shell that does the pipe could modify the characters before mysql gets them. Right now I think the shell supports UTF-8, but I hope it is not messing things up.
The shell only hands the mysql command a file descriptor. mysql reads the file directly itself; the shell never touches any of the input.
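A rough way to see this for yourself is the Python sketch below, which does the equivalent of `child < file`: the parent opens the file and passes the open descriptor as the child's stdin, and never reads the contents itself. The child here is just a small Python process standing in for mysql, and the helper name and sample payload are invented for the example.

```python
import os
import subprocess
import sys
import tempfile

def run_with_redirected_stdin(path: str) -> bytes:
    """Start a child with `path` as its stdin and return what it echoes."""
    # Stand-in for mysql: the child copies its stdin to stdout byte for byte.
    child_code = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
    with open(path, "rb") as dump:
        # The parent passes the open file as stdin; it does not read it.
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            stdin=dump,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout

# Hypothetical payload containing non-ASCII UTF-8 bytes.
payload = "INSERT INTO t VALUES ('Zürich');\n".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

echoed = run_with_redirected_stdin(tmp_path)
os.remove(tmp_path)
print(echoed == payload)
```

The bytes the child receives are the bytes on disk; whatever character set the parent's terminal uses never enters the picture.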