Thanks Brion,
I remember hearing some of this in earlier threads on the list, but at
the time I didn't pay enough attention. This helps.
Jim
On Feb 27, 2008, at 12:50 PM, Brion Vibber wrote:
Jim Hu wrote:
On Feb 22, 2008, at 4:29 PM, Platonides wrote:
Your intermediate db is showing it as latin1 instead of treating it as
utf8. As long as it isn't mangled you can work with them, or perhaps
conveniently translate the 'separator'.
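
To illustrate the point above, here is a minimal Python sketch of what happens when UTF-8 bytes are displayed as latin1, and why the data is still usable as long as nothing has altered the bytes. The sample title and the function names are hypothetical, and the arrow is assumed to be U+2192; neither MediaWiki nor MySQL is involved here.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as if they were latin1.

    This mimics a client that displays raw UTF-8 bytes using the wrong charset.
    """
    return text.encode("utf-8").decode("latin-1")


def recover_utf8(mangled):
    """Undo the misreading: get the original bytes back, then decode as UTF-8."""
    return mangled.encode("latin-1").decode("utf-8")


title = "A \u2192 B"  # hypothetical page title using an arrow as the separator
shown = misread_as_latin1(title)

# The arrow is three bytes in UTF-8, so it shows up as three latin1 characters.
print(len(title), len(shown))
print(recover_utf8(shown) == title)
```

The round trip only works while the bytes are untouched. Once something converts or truncates the misread text, the original title may not be recoverable.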
My brain always fogs over with encoding issues. I see that page_title
in the MW schema I'm using is latin1_bin, but it handles the arrow.
The page_title in my intermediate db is utf8_bin.
The default MediaWiki table schema on MySQL is to totally ignore the
defined character set encoding and just pass UTF-8 data in raw, which is
compatible with how MySQL 4.0 works.
When using MySQL 4.1 or higher, the ideal thing *in theory* is to set up
your MediaWiki database in the binary schema (for best compatibility --
complete UTF-8 data range availability)... or you might try the UTF-8
schema (might be more convenient for sharing with other databases, but
will fail on Unicode characters outside the Basic Multilingual Plane and
may fail with some not-quite-valid-Unicode constructs that happen every
once in a while).
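
As a rough way to see which titles would be affected by the Basic Multilingual Plane limit, the Python sketch below flags characters above U+FFFF. It assumes the limit comes from MySQL's utf8 charset storing at most three bytes per character, and the helper names are hypothetical; this is not a MediaWiki script.

```python
def chars_outside_bmp(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_utf8(text):
    """True if every character encodes to at most three bytes of UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


arrow_title = "A \u2192 B"         # U+2192 is inside the BMP
clef_title = "Clef \U0001D11E"     # U+1D11E is outside the BMP

for sample in (arrow_title, clef_title):
    print(repr(sample), fits_three_byte_utf8(sample), len(chars_outside_bmp(sample)))
```

Titles for which `fits_three_byte_utf8` is false are the ones likely to need the binary schema.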
In some older versions of MediaWiki, the UTF-8 and binary schemas would
end up breaking when you applied updates; they should work correctly
from here out, though... hopefully. :)
Unfortunately we don't yet have a helper script for switching your
database from the classic charset-agnostic mode (which often does
"interesting" things to code which does know about the charsets) to a
cleanly-marked UTF-8 or binary schema.
I keep meaning to get around to this, so maybe we'll get it in in a 1.12
point release or something... but don't hold your breath. ;)
-- brion vibber (brion @ wikimedia.org)
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054