On 24/07/07, Adam Meyer <meyer7(a)mindspring.com> wrote:
> I have an external program for my wiki that grabs mwt.old_text and
> puts it in a text box. Any time there are special characters (e.g. ®,
> ©, º (registered, copyright, degrees)), a capital A with an accent mark
> shows up before it. I'm sure that this is just the way that it is
> stored in the database, or a Unicode issue, but how does MediaWiki
> pull it without this?
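Some quick background: under the defaults, MediaWiki will stick two
fingers up at the collation of the database and insert UTF-8 data into
a latin1 table, performing cleanup, normalisation and whatever other
Unicode goodies are required at a later stage. This is for historical
reasons: MySQL 4.0.x has no UTF-8 support, and the support in later
versions is poor. Brion can probably give you some more waffle about
this.
That is where your stray A comes from. A character such as ® is stored
as the two UTF-8 bytes C2 AE; a client that reads those bytes as
Latin-1 shows them as two characters, "Â®". MediaWiki gets the bytes
back exactly as it stored them and treats them as UTF-8, so your
program needs to do the same rather than interpreting (or letting the
MySQL client library convert) them as Latin-1.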
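A minimal Python sketch of the effect, assuming your program receives the raw column bytes and interprets them as Latin-1 (the sample string and variable names are just for illustration). Characters such as ®, © and º each become two bytes in UTF-8, and read back as Latin-1 those turn into "Â" plus the original symbol:

```python
# Text as MediaWiki would store it: UTF-8 bytes, even in a latin1 column.
original = "Registered \u00ae Copyright \u00a9 Degrees \u00ba"
stored_bytes = original.encode("utf-8")

# Assumption: the client reads the column as Latin-1, mapping each byte
# to one character, so every two-byte UTF-8 sequence becomes two characters.
garbled = stored_bytes.decode("latin-1")
print(garbled)  # Registered Â® Copyright Â© Degrees Âº

# Decoding the same bytes as UTF-8 gives back the original text.
print(stored_bytes.decode("utf-8"))

# If all you have is the already-garbled string, undoing the wrong step works too.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The real fix is to decode as UTF-8 in the first place; the `encode("latin-1").decode("utf-8")` round trip only repairs text that has already been mangled this way.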
I'd advise not accessing the table directly, since the old_text column
may not contain plain text at all; depending upon your configuration, it could
contain a pointer to an external store, or a pointer to another row,
or some compressed object, or compressed text. If at all possible, try
to use the MediaWiki Revision class to get at this text.
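To show why direct access gets messy, here is a rough Python sketch of what a reader of a single text row would have to handle. It rests on assumptions not stated above: that each row carries an `old_flags` column with comma-separated values such as `gzip`, `utf-8`, `object` and `external`; that `gzip` means PHP's `gzdeflate()` (raw DEFLATE, no header); and that rows without a UTF-8 flag may be in a configured legacy encoding. The function name `decode_old_text` is hypothetical. External pointers and serialized objects are deliberately not resolved, since that is exactly what the Revision class does for you.

```python
import zlib
from typing import Optional


def decode_old_text(old_text: bytes, old_flags: str,
                    legacy_encoding: Optional[str] = None) -> str:
    """Hypothetical best-effort decoder for one row, under the assumed flag semantics."""
    flags = {f.strip() for f in old_flags.split(",") if f.strip()}

    if "external" in flags:
        # old_text is only a pointer into an external store; nothing to decode here.
        raise ValueError("external store pointer: use the Revision class")

    data = old_text
    if "gzip" in flags:
        # Assumption: compressed with PHP gzdeflate(), i.e. raw DEFLATE,
        # so a negative wbits tells zlib there is no zlib/gzip header.
        data = zlib.decompress(data, -zlib.MAX_WBITS)

    if "object" in flags:
        # A serialized PHP object (possibly pointing at another row).
        raise ValueError("serialized object: use the Revision class")

    if "utf-8" in flags or "utf8" in flags or legacy_encoding is None:
        # The bytes are UTF-8, whatever the column collation says.
        return data.decode("utf-8")

    # Assumption: old rows without a UTF-8 flag use the wiki's legacy encoding.
    return data.decode(legacy_encoding)
```

For example, `decode_old_text(row_bytes, "gzip,utf-8")` inflates and decodes a compressed row, while a row flagged `external` raises instead of returning a pointer string that looks like article text.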
Rob Church