[Mediawiki-l] Special Characters from database.

Rob Church robchur at gmail.com
Tue Jul 24 13:39:20 UTC 2007


On 24/07/07, Adam Meyer <meyer7 at mindspring.com> wrote:
> I have an external program for my wiki that grabs mwt.old_text and
> puts it in a text box. Any time there are special characters (e.g. (r),
> (c), º: registered, copyright, degrees), a capital A with an accent
> mark shows up before each one. I'm sure that this is just the way it is
> stored in the database, or a Unicode issue, but how does MediaWiki
> pull it without this?

Some quick background: MediaWiki will, under the defaults, stick two
fingers up at the collation of the database and insert UTF-8 data into
a latin1 table, performing cleanup, normalisation and whatever other
Unicode goodies are required at a later stage. A client that reads
those bytes back as latin1 sees each two-byte UTF-8 sequence as two
separate characters, which is where the stray accented capital A comes
from. This is for historical reasons, due to MySQL 4.0.x's lack of
UTF-8 support, and the poor UTF-8 support in later versions. Brion can
probably give you some more waffle about this.
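
To make the symptom concrete, here is a minimal sketch in Python (not
MediaWiki code) of UTF-8 bytes being misread as latin1, and of undoing
one such misreading. The sample string and the repair() helper are
hypothetical names for illustration. It assumes the client decoded the
bytes as ISO-8859-1; MySQL's latin1 is closer to Windows-1252, so some
other characters may need "cp1252" in place of "latin-1".

```python
# Hypothetical sample text using the symbols from the question.
original = "®©º"

# What ends up in the table: UTF-8 bytes sitting in a latin1 column.
stored_bytes = original.encode("utf-8")

# What an external program sees if it treats those bytes as latin1.
garbled = stored_bytes.decode("latin-1")


def repair(text: str) -> str:
    """Undo a single 'UTF-8 read as latin1' round trip."""
    return text.encode("latin-1").decode("utf-8")


print(garbled)          # each symbol is preceded by a capital A with circumflex
print(repair(garbled))  # the original symbols
```

Where possible, it is cleaner to have the external program read the
column as raw bytes and decode them as UTF-8 once, rather than repair
the text after the fact.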

I'd advise not accessing the table directly, since the old_text column
may not contain text; depending upon your configuration, it could
contain a pointer to an external store, or a pointer to another row,
or some compressed object, or compressed text. If at all possible, try
to use the MediaWiki Revision class to get at this text.
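
To illustrate why a direct read is fragile, here is a rough Python
sketch of the checks an external reader would need before trusting
old_text. It is not MediaWiki's own code. It assumes the neighbouring
old_flags column holds a comma-separated list with values such as
"gzip", "object" and "external", and that "gzip" means raw deflate
data; those details come from the text table schema as I remember it,
not from anything above, so verify them against your installation.
decode_old_text() is a hypothetical helper name.

```python
import zlib


def decode_old_text(old_text: bytes, old_flags: str) -> str:
    """Return the revision text, or raise if the row is not plain text."""
    flags = {flag for flag in old_flags.split(",") if flag}

    # Assumed flag names: these rows hold a pointer or a serialized
    # object rather than text, so only MediaWiki itself can resolve them.
    if "external" in flags or "object" in flags:
        raise ValueError("old_text is not inline text; use MediaWiki to load it")

    # Assumed: 'gzip' marks raw deflate data (no zlib or gzip header),
    # which zlib handles when given a negative window size.
    if "gzip" in flags:
        old_text = zlib.decompress(old_text, -15)

    # Assumed: the stored bytes are UTF-8, whatever the column collation.
    return old_text.decode("utf-8")


print(decode_old_text(b"plain text", "utf-8"))
```

Even with checks like these, the Revision class remains the safer
route, since it follows whatever storage configuration the wiki
actually uses.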


Rob Church

