Thanks Brion,
On Jun 20, 2008, at 1:15 PM, Brion Vibber wrote:
Jim Hu wrote:
I'm thinking this may be a PHP bug rather than a MW problem - but I'm
wondering how to get around it. I generate MW XML for importing pages
and I use htmlentities to encode things for XML. But I just saw a
problem with the XML parser failing to recognize the &plusmn; entity.
&plusmn; has no inherent meaning in XML; it would have to be declared
in the DTD, either via the doctype or directly in the document's
internal subset.
Instead of htmlentities(), use htmlspecialchars(), which is safe for
XML because it only emits the XML-predefined entity references &amp;,
&lt;, &gt;, and &quot;.
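
A minimal sketch of the difference, for anyone finding this thread
later. It assumes PHP 7 or newer with the SimpleXML extension enabled;
xml_escape_text() and parses_as_xml() are hypothetical helper names
used only for illustration and are not part of PHP or MediaWiki.

```php
<?php
// Hypothetical helper (not part of PHP or MediaWiki): escape text for XML.
function xml_escape_text(string $text): string
{
    // ENT_XML1 keeps the output to XML's predefined entities;
    // ENT_QUOTES additionally escapes single quotes.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Hypothetical helper: does $escaped parse when wrapped in a bare element?
function parses_as_xml(string $escaped): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $ok = simplexml_load_string('<text>' . $escaped . '</text>') !== false;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// "\u{00B1}" is the plus-minus sign, written as an escape so the example
// does not depend on how this file itself is encoded.
$raw = "5 \u{00B1} 2 < 10 & \"quoted\"";

$viaEntities = htmlentities($raw, ENT_QUOTES, 'UTF-8');
$viaSpecial  = xml_escape_text($raw);

echo $viaEntities, "\n"; // 5 &plusmn; 2 &lt; 10 &amp; &quot;quoted&quot;
echo $viaSpecial, "\n";  // plus-minus stays a literal UTF-8 character

// &plusmn; is not declared anywhere, so the first string is rejected by
// the XML parser; the second uses only predefined entities and parses.
var_dump(parses_as_xml($viaEntities));
var_dump(parses_as_xml($viaSpecial));
```

The literal plus-minus character in the second string is only valid if
the XML file really is encoded as UTF-8 (or declares whatever encoding
it actually uses).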
Done! I also did something I should have done before I posted - I put
a &plusmn; in a Sandbox page and exported it to see how MW handles
it... it turns into &amp;plusmn;, which imports and converts back to
the plus-or-minus character. Nice!
Jim
Ensure your text is properly encoded (e.g., UTF-8 unless your XML file
is otherwise marked).
-- brion
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054