On Fri, May 22, 2009 at 6:09 PM, O. O. <olson_ot(a)yahoo.com> wrote:
Thanks for your reply Platonides. I am trying your
suggestion right now.
It would take a few hours to crash – if it does. (I hope sed handles
UTF-8 correctly.) I would try yesterday's pagelinks.sql later.
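sed treats UTF-8 as a stream of bytes. The pattern is pure ASCII, and
in UTF-8 a byte in the ASCII range only ever appears when it encodes an
ASCII code point (every byte of a multibyte sequence has its high bit
set). So the pattern can't match inside a non-ASCII character, and sed
passes those bytes through unchanged.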
That sed pattern is pretty horrifying and fragile, though. I'd
recommend something more like:

  sed -i 's/^) TYPE=InnoDB;$/) TYPE=InnoDB DEFAULT CHARSET=binary;/'
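If you want to convince yourself before waiting hours on the real dump, here is a rough sketch that runs the same substitution over a tiny made-up sample. The sample contents and the variable names are my own invention, not taken from the actual pagelinks dump; I also left out -i and forced LC_ALL=C so that sed works on plain bytes and writes to stdout.

```shell
# Hypothetical two-line sample: a row containing non-ASCII UTF-8 text
# ("Zürich", where ü is the bytes 0xC3 0xBC) and a table-closing line.
sample=$(mktemp)
printf 'INSERT INTO t VALUES (1,"Z\303\274rich");\n) TYPE=InnoDB;\n' > "$sample"

# LC_ALL=C makes sed treat the input strictly as a stream of bytes.
# Without -i the result goes to stdout and the file is left alone.
result=$(LC_ALL=C sed 's/^) TYPE=InnoDB;$/) TYPE=InnoDB DEFAULT CHARSET=binary;/' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Only the closing line should change; the row with the multibyte character should come out byte for byte as it went in.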
$ mysql wikidb < enwiki-20090306-pagelinks.sql
I am using Linux (Ubuntu). My question is whether the shell that does
the pipe could modify the characters before mysql gets them. Right now
I think the shell supports UTF-8, but I hope it is not messing things
up.
The shell is only handing the mysql command a file descriptor. mysql
reads the file itself, directly; the shell never touches any of the
input.
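A quick way to check this for yourself, as a sketch only: dump the bytes of a small file once through a `<` redirect and once by passing the file name, and compare. The temp file, its contents, and the variable names below are made up for illustration, and od stands in for mysql as "some program reading its stdin".

```shell
# Hypothetical test file holding a few multibyte UTF-8 characters.
dump=$(mktemp)
printf 'caf\303\251 \343\203\206\343\202\271\343\203\210\n' > "$dump"

# With "<" the shell opens the file and attaches it to od's stdin;
# od then reads the raw bytes itself.
via_redirect=$(od -An -tx1 < "$dump")

# Here od opens the file on its own, with no redirect involved.
direct=$(od -An -tx1 "$dump")

if [ "$via_redirect" = "$direct" ]; then
    status="identical"
else
    status="different"
fi
echo "$status"
rm -f "$dump"
```

The two hex dumps should match regardless of the shell's locale settings, since the locale only affects how the shell interprets text it parses itself, not data flowing through a redirected descriptor.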