-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Reid Priedhorsky wrote:
In our ongoing research here at UMN, we've discovered some reverts that
introduce apparent character set problems; what seems to happen is that
some Unicode characters are replaced by a character I don't recognize
followed by a hexadecimal number. For example:
http://en.wikipedia.org/w/index.php?title=Dog&diff=58851026&oldid=5…
What I see is that a sequence of five characters that I don't have
glyphs for, which show up as five boxes with the numbers "010337 01033F
01033D 010333 010343" in them, is replaced with the sequence
"?df37?df3f?df3d?df33?df43", where ? is not the question mark but a
black diamond with a white question mark in it (a zero byte?).
That would appear to be a bug in whatever bot tool was used to make the
reversion last year.
The Gothic characters are outside of Unicode's BMP (Basic Multilingual
Plane), the first 65,536 code points (those that fit in 16 bits), which
is the most widely supported part of Unicode. The tool appears to have
had trouble either decoding or re-encoding them.
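
As a rough illustration of why the garbage looks the way it does (this is
only a sketch, not code from the tool in question, and the helper name
utf16_surrogates is made up for the example): in UTF-16, a character
outside the BMP is stored as a pair of 16-bit "surrogate" values. Working
that out for the five code points quoted above gives low surrogates that
match the hex numbers in the broken text, which suggests the tool may have
handled the two halves of each pair separately.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000        # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# The five code points from the original report.
GOTHIC = [0x10337, 0x1033F, 0x1033D, 0x10333, 0x10343]

pairs = [utf16_surrogates(cp) for cp in GOTHIC]
low_hex = [format(low, "x") for _, low in pairs]

for cp, (high, low) in zip(GOTHIC, pairs):
    print("U+%06X -> %04x %04x" % (cp, high, low))
```

Each character comes out as d800 followed by df37, df3f, df3d, df33 and
df43 respectively. One plausible reading, and it is only a guess, is that
the "?" in the report is the lone d800 half being shown as a replacement
character, with the second half printed as literal hex.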
- -- brion vibber (brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGVH+iwRnhpk1wk44RAm5YAKC7M8Gaq1+kGr/MMH6WIHSpACFhBQCePteV
5Jn1fTB5WQMXnNTHNOmVFl0=
=BVgy
-----END PGP SIGNATURE-----