On 24/07/07, Adam Meyer meyer7@mindspring.com wrote:
I have an external program for my wiki that grabs mwt.old_text and puts it in a text box. Any time there are special characters (e.g. (r) (c) º (registered, copyright, degrees)), a capital A with an accent mark shows up before them. I'm sure that this is just the way that it is stored in the database, or a Unicode issue, but how does MediaWiki pull it without this?
Some quick background: MediaWiki will, under the defaults, stick two fingers up at the collation of the database and insert UTF-8 data into a latin1 table, performing cleanup and normalisations and whatever other Unicode goodies are required at a later stage. This is for historical reasons: MySQL 4.0.x lacked UTF-8 support, and the UTF-8 support in later versions was poor. Brion can probably give you some more waffle about this.
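To make the symptom concrete, here is a minimal Python sketch of what probably happens when an external program reads those UTF-8 bytes as if they were Latin-1. It is an illustration only, not how MediaWiki itself handles the data. The function names are made up for this example, and it assumes the program decodes the column as plain ISO-8859-1.

```python
# Illustration only: reproduce the stray "Â" and undo it.
# Assumption: the bytes in the column are UTF-8, but the client
# decodes them as ISO-8859-1 (Latin-1). Function names are hypothetical.

def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded as Latin-1 by the client."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Reverse the misreading: recover the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "25º ® ©"
garbled = misread_as_latin1(original)

print(garbled)                      # each special character gains a leading "Â"
print(repair(garbled) == original)  # the round trip restores the text
```

Each of these characters is two bytes in UTF-8, and the first byte (0xC2) is "Â" in Latin-1, which is where the extra letter comes from. If your program can fetch the column as raw bytes and decode them as UTF-8 itself, the repair step should not be needed.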
I'd advise not accessing the table directly, since the old_text column may not contain text; depending upon your configuration, it could contain a pointer to an external store, or a pointer to another row, or some compressed object, or compressed text. If at all possible, try to use the MediaWiki Revision class to get at this text.
Rob Church