I have an external program for my wiki that grabs mwt.old_text and puts it in a text box. Any time there are special characters (e.g. ® © º: registered, copyright, degrees), a capital A with an accent mark shows up before them. I'm sure that this is just the way that it is stored in the database, or a Unicode issue, but how does MediaWiki pull it without this?
You can see the problem here: (may be broken in IE) http://www.risdpedia.net/interface/easy_edit/index.php?title=Rust- Oleum_Specialty_High_Heat
Thanks, -Adam
On 24/07/07, Adam Meyer meyer7@mindspring.com wrote:
I have an external program for my wiki that grabs mwt.old_text and puts it in a text box. Any time there are special characters (e.g. ® © º: registered, copyright, degrees), a capital A with an accent mark shows up before them. I'm sure that this is just the way that it is stored in the database, or a Unicode issue, but how does MediaWiki pull it without this?
Some quick background: under the defaults, MediaWiki will stick two fingers up at the collation of the database and insert UTF-8 data into a latin1 table, performing cleanup, normalisation, and whatever other Unicode goodies are required at a later stage. This is for historical reasons: MySQL 4.0.x lacked UTF-8 support, and the support in later versions was poor. Brion can probably give you some more waffle about this.
I'd advise not accessing the table directly, since the old_text column may not contain text; depending upon your configuration, it could contain a pointer to an external store, or a pointer to another row, or some compressed object, or compressed text. If at all possible, try to use the MediaWiki Revision class to get at this text.
Rob Church
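One way to follow the advice above from an external program, without touching the text table at all, is to ask MediaWiki itself for the page source through index.php with action=raw. This is a different route from the Revision class Rob mentions, not a substitute he proposed. The sketch below is in Python; the wiki address wiki.example.org, the title Main_Page, and both function names are hypothetical, and it assumes the wiki serves its raw text as UTF-8.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def raw_wikitext_url(index_php_url: str, title: str) -> str:
    """Build the URL that asks MediaWiki for a page's raw wikitext."""
    query = urlencode({"title": title, "action": "raw"})
    return f"{index_php_url}?{query}"


def fetch_wikitext(index_php_url: str, title: str) -> str:
    """Download the raw wikitext; assumes the response body is UTF-8."""
    with urlopen(raw_wikitext_url(index_php_url, title)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical wiki address and page title; only the URL is printed here.
    print(raw_wikitext_url("http://wiki.example.org/index.php", "Main_Page"))
```

Because MediaWiki resolves external storage, compression, and row pointers before answering, the caller only ever sees the finished text.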
Adam Meyer wrote:
I have an external program for my wiki that grabs mwt.old_text and puts it in a text box. Any time there are special characters (e.g. ® © º: registered, copyright, degrees), a capital A with an accent mark shows up before them. I'm sure that this is just the way that it is stored in the database, or a Unicode issue, but how does MediaWiki pull it without this?
You can see the problem here: (may be broken in IE) http://www.risdpedia.net/interface/easy_edit/index.php?title=Rust- Oleum_Specialty_High_Heat
Thanks, -Adam
This is because they're in UTF-8 but are being displayed as ISO-8859-1. If you convert from UTF-8 to ISO-8859-1 you'll get the right symbols. It seems you hit some special cases where the second byte matches the Latin-1 character. Had it been another character, like á, you would find Ã¡; that Â and Ã are both As is just luck.
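The effect described above can be reproduced in a few lines. This is a minimal sketch in Python, not the code of the external program; the function names misread_as_latin1 and repair are made up for illustration. It encodes each character as UTF-8, then decodes the bytes as ISO-8859-1, which is what a page or text box declared as Latin-1 effectively does.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(mojibake: str) -> str:
    """Undo the misreading: recover the bytes, then decode them as UTF-8."""
    return mojibake.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    for ch in "®©ºá":
        garbled = misread_as_latin1(ch)
        # ® © º each gain a leading Â; á turns into Ã¡ instead.
        print(ch, "->", garbled, "->", repair(garbled))
```

Characters from U+0080 to U+00BF are encoded in UTF-8 as the byte C2 followed by the same byte they have in Latin-1, which is why the symbol itself survives with a stray Â in front. An alternative to converting the text is to leave it as UTF-8 and declare the page that shows it as UTF-8.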
mediawiki-l@lists.wikimedia.org