I'm thinking this may be a PHP bug rather than a MW problem, but I'm wondering how to get around it. I generate MW XML for importing pages and I use htmlentities() to encode things for XML. But I just saw a problem with the XML parser failing to recognize the &plusmn; entity.
Any suggestions?
Jim
===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054
Jim Hu wrote:
> I'm thinking this may be a PHP bug rather than a MW problem, but I'm wondering how to get around it. I generate MW XML for importing pages and I use htmlentities() to encode things for XML. But I just saw a problem with the XML parser failing to recognize the &plusmn; entity.
&plusmn; has no inherent meaning in XML; it would have to be declared as an entity in a DTD, either one referenced from the doctype or the internal subset inside the doctype itself.
Instead of htmlentities(), use htmlspecialchars(), which is safe for XML because it only produces the XML-predefined entities &amp;, &lt;, &gt;, and &quot;.
Ensure your text is properly encoded (e.g., UTF-8, unless your XML file is marked otherwise).
-- brion
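The difference between the two functions can be sketched roughly as follows. This is only an illustration: the sample string and the `<text>` wrapper element are made up here, and it assumes PHP's SimpleXML extension is available.

```php
<?php
// Illustration only: $text and the <text> wrapper are hypothetical sample values.
$text = "5 ± 2 & x < y"; // UTF-8 source text

// htmlentities() emits HTML named entities such as &plusmn;,
// which an XML parser does not know unless a DTD declares them.
$html = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() escapes only &, <, > and quotes; ± stays as UTF-8.
$xml = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // 5 &plusmn; 2 &amp; x &lt; y
echo $xml, "\n";  // 5 ± 2 &amp; x &lt; y

// Collect parser errors quietly instead of printing warnings.
libxml_use_internal_errors(true);

// The first parse is expected to fail on the undeclared entity; the second to succeed.
$parsedHtml = simplexml_load_string("<?xml version=\"1.0\" encoding=\"UTF-8\"?><text>$html</text>");
$parsedXml  = simplexml_load_string("<?xml version=\"1.0\" encoding=\"UTF-8\"?><text>$xml</text>");

var_dump($parsedHtml !== false);
var_dump($parsedXml !== false);
```

The explicit `encoding="UTF-8"` in the XML declaration matches the encoding passed to htmlspecialchars(), which is the point of the note about marking the file's encoding.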
Thanks Brion,
On Jun 20, 2008, at 1:15 PM, Brion Vibber wrote:
> Instead of htmlentities(), use htmlspecialchars(), which is safe for XML because it only produces the XML-predefined entities &amp;, &lt;, &gt;, and &quot;.
Done! I also did something I should have done before I posted: I put a &plusmn; in a Sandbox page and exported it to see how MW handles it. It turns into &amp;plusmn; in the export, which imports and converts back to the plus-or-minus character. Nice!
Jim
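The round trip described above can be reproduced in a few lines. This is a sketch under assumptions: the `<text>` element is a stand-in rather than the full MediaWiki export schema, and the sample wikitext is invented.

```php
<?php
// Illustration only: $wikitext is a hypothetical sample, <text> a stand-in element.
$wikitext = 'error of &plusmn; 2'; // entity typed literally into a page

// Escaping for XML turns the leading & into &amp;, so the entity name
// travels through the XML file as ordinary text.
$escaped = htmlspecialchars($wikitext, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // error of &amp;plusmn; 2

// Parsing the XML undoes the escaping and restores the original wikitext.
$node = simplexml_load_string("<text>$escaped</text>");
$restored = (string) $node;
echo $restored, "\n"; // error of &plusmn; 2
```

The XML parser only ever sees `&amp;`, which is predefined, so no DTD is needed; the wiki itself turns the restored `&plusmn;` into the character when the page is rendered.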
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l