Hi!
Don't have a good solution, but some ideas:
1. There's http://php.net/manual/en/class.uconverter.php, which uses the ICU converter. It recognizes a large number of charsets/encodings (http://site.icu-project.org/charts/charset) and can filter out bad characters, though the way to achieve that is a bit tricky. E.g.:
<?php
class MyConverter extends UConverter {
    function toUCallback( $reason, $source, $codeUnits, &$error ) {
        $error = 0;
        return null;
    }
}
$c = new MyConverter("UTF-8", "utf-8");
var_dump($c->convert("aa\xC3\xC3\xC3\xB8aa"));
(There might be a better way; this is just a quick-and-dirty example.) Con: while HHVM supports it, on PHP it requires 5.5+. It can probably be backported quite easily.
2. recode - http://php.net/manual/en/ref.recode.php. This: var_dump( recode("UTF-8..UTF-16,UTF16..UTF-8", "aa\xC3\xC3\xC3\xB8aa") ); seems to work fine. Not sure how well HHVM supports recode.
3. Patch libmbfl to add more aliases and the missing encodings. That shouldn't be very hard, though I'm not sure what the policy is on patching PHP/HHVM here.
4. Implement ezyang@php.net's suggestion for working around the glibc mess by adopting code from http://code.woboq.org/userspace/glibc/iconv/iconv_prog.c.html. Again, that would require a custom patch for PHP; not sure about HHVM.
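
Since ideas 1 and 2 each depend on an extension that may be missing (UConverter on older PHP, recode possibly on HHVM), here is a rough sketch of how they could be tried in order with feature detection. The names SkippingConverter and stripInvalidUtf8 are made up for illustration, the callback and the recode request string are copied from the examples above, and I haven't verified the exact output on every runtime, so treat it as a starting point only.

```php
<?php
// Hypothetical helper, not an existing API: try UConverter first, then recode.

if (class_exists('UConverter')) {
    // Declared only when the intl UConverter class is available.
    class SkippingConverter extends UConverter {
        // Same callback as in idea 1: clear the error and return null,
        // which is intended to drop the offending bytes.
        function toUCallback($reason, $source, $codeUnits, &$error) {
            $error = 0; // 0 is ICU's U_ZERO_ERROR
            return null;
        }
    }
}

// Returns the cleaned string, or false when neither extension is loaded
// (or when the underlying conversion itself reports failure).
function stripInvalidUtf8($text) {
    if (class_exists('SkippingConverter', false)) {
        $converter = new SkippingConverter('UTF-8', 'UTF-8');
        return $converter->convert($text);
    }
    if (function_exists('recode_string')) {
        // Request string taken verbatim from idea 2.
        return recode_string('UTF-8..UTF-16,UTF16..UTF-8', $text);
    }
    return false;
}

var_dump(stripInvalidUtf8("aa\xC3\xC3\xC3\xB8aa"));
```

Returning false when nothing is available keeps the caller in charge of the fallback (for example, rejecting the input) instead of silently passing bad bytes through.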