Re: [Wikitech-l] Historical use of latin1 fields in MySQL

2 May 2017


On 03/05/17 03:10, Mark Clements (HappyDog) wrote:
> ...
> Can anyone confirm that MediaWiki used to behave in this manner, and
> if so why?

MySQL 4.0 didn't really have character sets; it only had
collations. Text was stored as 8-bit clean binary, and was only
interpreted as a character sequence when compared to other text fields
for collation purposes. There was no UTF-8 collation, so we stored
UTF-8 text in text fields with the default (latin1) collation.

> ...
> If it was due to MySQL bugs, does anyone know in what version these
> were fixed?

IIRC it was fixed in MySQL 4.1 with the introduction of proper
character sets.

To migrate such a database, you need to do an ALTER TABLE to switch
the relevant fields from latin1 to the "binary" character set. If you
ALTER TABLE directly to utf8, you'll end up with "mojibake", since the
text will be incorrectly interpreted as latin1 and converted to
Unicode. This is unrecoverable; you have to restore from a backup if
this happens.
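
A minimal Python sketch of the failure mode described above, with no
database involved. It assumes the stored bytes are valid UTF-8 and uses
Python's "latin-1" codec as a stand-in for MySQL's latin1 (which is
closer to cp1252); the sample word is arbitrary.

```python
# Bytes as they sit in a latin1-labelled column: really UTF-8.
original = "café"
stored_bytes = original.encode("utf-8")

# Direct ALTER to utf8: the server reads the bytes as latin1 text and
# converts each latin1 character to its own UTF-8 sequence.
mojibake = stored_bytes.decode("latin-1")
converted_bytes = mojibake.encode("utf-8")

# Going through binary: the bytes are left untouched and are only
# relabelled, so reading them as UTF-8 gives the original text back.
via_binary = stored_bytes.decode("utf-8")

print(mojibake)                                   # cafÃ©
print(len(stored_bytes), len(converted_bytes))    # 5 7
print(via_binary == original)                     # True
```

The stored data grows from 5 to 7 bytes in the direct case because the
two bytes of "é" are each treated as a separate character.
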
I think it is possible to then do an ALTER TABLE to switch from binary
to utf8, but it's been a while since I tested that.
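
As a rough sketch of that two-step route, the helper below only builds
the SQL text and does not connect to a server. The function name, the
table and column names, and the column types are all hypothetical, and
the result should be tried on a copy of the data first. Note that
MODIFY restates the whole column definition, so attributes such as
NOT NULL or defaults would need to be repeated.

```python
def two_step_migration(table, column, binary_type="BLOB", text_type="TEXT"):
    """Build the two ALTER TABLE statements (all names are placeholders)."""
    target = f"`{table}` MODIFY `{column}`"
    return [
        # Step 1: latin1 -> binary type, so the bytes are kept as they are.
        f"ALTER TABLE {target} {binary_type}",
        # Step 2: binary -> utf8, relabelling the unchanged bytes as UTF-8.
        f"ALTER TABLE {target} {text_type} CHARACTER SET utf8",
    ]


statements = two_step_migration(
    "my_table", "my_column", "VARBINARY(255)", "VARCHAR(255)"
)
for stmt in statements:
    print(stmt + ";")
```
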
-- Tim Starling
