[Mediawiki-l] xml import parse error on plusmn entity

Jim Hu jimhu at tamu.edu
Mon Jun 23 15:52:56 UTC 2008


Thanks Brion,

On Jun 20, 2008, at 1:15 PM, Brion Vibber wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jim Hu wrote:
>> I'm thinking this may be a php bug rather than a mw problem - but I'm
>> wondering how to get around it.  I generate MW xml for importing pages
>> and I use htmlentities to encode things for xml.  But I just saw a
>> problem with the XML parser failing to recognize the &plusmn; entity.
>
> &plusmn; has no inherent meaning in XML; it would have to be defined via
> the doctype or directly in a processor directive in the document.
>
> Instead of htmlentities(), use htmlspecialchars() which is safe for XML
> by only using the XML-predefined character references &amp;, &lt;, &gt;,
> and &quot;.

Done!  I also did something I should have done before I posted: I put
a &plusmn; in a Sandbox page and exported it to see how MW handles it.
In the exported XML it turns into &amp;plusmn;, which imports and
converts back to the plus-or-minus character.  Nice!
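
A minimal sketch of the difference, written in Python rather than PHP just to show the parser side of it. The helper name `parses` and the sample strings are made up for illustration and are not taken from my import script. It assumes a stock XML parser with no DTD, which knows only the five predefined entities.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical page text containing a literal plus-minus sign and an ampersand.
raw = "5 \u00b1 2 & rising"

# HTML-style named entity: &plusmn; is not declared in plain XML,
# so the parser rejects the document.
html_style = "<text>5 &plusmn; 2 &amp; rising</text>"

# XML-safe escaping: only &, < and > are replaced; the plus-minus sign
# stays as a literal character in the string.
xml_style = "<text>" + escape(raw) + "</text>"

# A numeric character reference is another form that needs no DTD.
numeric_style = "<text>5 &#177; 2</text>"

print(parses(html_style))
print(parses(xml_style))
print(parses(numeric_style))
```

The escaped version round-trips: parsing `xml_style` gives back the original text, plus-minus sign included, which is the same thing htmlspecialchars() buys you on the PHP side as long as the file really is UTF-8.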

Jim

>
>
> Ensure your text is properly encoded (eg, UTF-8 unless your XML file is
> otherwise marked.)
>
> - -- brion
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkhb89wACgkQwRnhpk1wk46knwCg1RlfJYUT18TEaG3djFCQpKDR
> VjkAnR9vMF0r3gWHl3B2cgcrz1RivwTE
> =3qsd
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> MediaWiki-l mailing list
> MediaWiki-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054



