Reid Priedhorsky wrote:
In our ongoing research here at UMN, we've discovered some reverts that introduce apparent character set problems; what seems to happen is that some Unicode characters are replaced by a character I don't recognize followed by a hexadecimal number. For example:
http://en.wikipedia.org/w/index.php?title=Dog&diff=58851026&oldid=58...
What I see is a sequence of five characters that I don't have glyphs for, which show up as five boxes containing the numbers "010337 01033F 01033D 010333 010343". The revert replaces them with the sequence "?df37?df3f?df3d?df33?df43", where ? is not the question mark but a black diamond with a white question mark in it (a zero byte?).
That would appear to be a bug in whatever bot tool was used to make the reversion last year.
The Gothic characters are outside Unicode's BMP (Basic Multilingual Plane), the first 16-bit subset of Unicode and the most widely supported one. The tool appears to have had trouble either decoding or re-encoding them.
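One observation consistent with that: the hex values in the garbled text match the low halves of the UTF-16 surrogate pairs for those five code points. A minimal Python sketch of the arithmetic follows. The helper name to_surrogates is made up for illustration, and this is only a guess at which step the tool mangled, not a confirmed diagnosis.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = cp - 0x10000
    # The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# The five Gothic code points quoted in the report.
GOTHIC = [0x10337, 0x1033F, 0x1033D, 0x10333, 0x10343]

for cp in GOTHIC:
    high, low = to_surrogates(cp)
    print(f"U+{cp:06X} -> {high:04x} {low:04x}")
```

Each low surrogate comes out as one of df37, df3f, df3d, df33, df43, in the same order as the garbled text, which would fit a tool that split the characters into surrogates and then lost or mis-encoded the high half.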
-- brion vibber (brion @ wikimedia.org)