I made a patch [0] for T39665 [1] about 6 months ago. It has been
rotting in Gerrit ever since.
The core bug is related to glibc's iconv implementation and PHP (and
HHVM as well, I think). To work around the iconv bug I wrote a little
helper function that uses mb_convert_encoding() instead when it is
available. In review, PleaseStand pointed out that libmbfl, the library
behind mb_convert_encoding(), differs from iconv both in which character
sets it supports and in how it names them [2].
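
To make the discussion concrete, here is a rough sketch of one way the
fallback could be guarded. It is not the code in the Gerrit change; the
function names mbKnowsEncoding() and convertEncoding() are made up for
illustration, and the policy (use mbstring only when it recognizes both
charset names, otherwise fall back to iconv) is just one assumption about
how to handle the naming differences.

```php
<?php
// Hypothetical helper: true if mbstring recognizes $name, either as a
// canonical encoding name or as one of its aliases (case-insensitive).
function mbKnowsEncoding( $name ) {
	static $known = null;
	if ( $known === null ) {
		$known = array();
		foreach ( mb_list_encodings() as $enc ) {
			$known[strtolower( $enc )] = true;
			// Guard against a false return on older PHP versions.
			$aliases = mb_encoding_aliases( $enc ) ?: array();
			foreach ( $aliases as $alias ) {
				$known[strtolower( $alias )] = true;
			}
		}
	}
	return isset( $known[strtolower( $name )] );
}

// Hypothetical wrapper: prefer mb_convert_encoding() when mbstring is
// loaded and knows both charsets; otherwise use iconv() unchanged.
function convertEncoding( $text, $from, $to ) {
	if ( function_exists( 'mb_convert_encoding' )
		&& mbKnowsEncoding( $from )
		&& mbKnowsEncoding( $to )
	) {
		// Note the argument order: target encoding first, then source.
		return mb_convert_encoding( $text, $to, $from );
	}
	// Names that only iconv understands (for example ones carrying a
	// //TRANSLIT suffix) end up here.
	return iconv( $from, $to, $text );
}
```

With this sketch, convertEncoding( "caf\xE9", 'ISO-8859-1', 'UTF-8' ) takes
the mbstring path when the extension is loaded. The obvious downside is
that any charset mbstring does not know still goes through iconv, so the
original bug would remain reachable for those.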
I was hoping that someone on this list could step in and either
convince me to abandon this patch and pretend I never investigated the
problem or help design a solution that will plaster over these
differences in a reasonable way.
[0]: https://gerrit.wikimedia.org/r/#/c/172101/
[1]: https://phabricator.wikimedia.org/T39665
[2]: https://php.net/manual/en/mbstring.encodings.php
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855