Hey, I have a new topic I'd like to discuss. It's about mbstring and
whether we really need to support running without it.
The RFC is at
https://gerrit.wikimedia.org/r/#/c/267309/
Here's a copy:
MediaWiki relies heavily on Unicode handling to support 300+ languages,
yet it does not require the mbstring PHP extension to function. Instead,
we provide PHP-only fallbacks for when native support is not available.
This creates a few problems:
* These fallbacks are extremely slow. The script in P2734
<https://phabricator.wikimedia.org/P2734> demonstrates that the fallbacks
are roughly an order of magnitude slower on PHP 5.6. In extreme cases, they
can be 100+ times slower, per a comment in Fallback.php.
* These fallbacks cover only a few functions. If there's no fallback,
either ad-hoc solutions are used in places, or, like in SwiftFileBackend,
we just say "mbstring is required".
* This also means that extensions can't expect any consistent Unicode
support.
* Won't somebody please think of the children!
Now that we've dramatically increased PHP requirements, we've already cut
off a lot of crappy environments, so this change will likely not affect
too many users.
OS support:
* On Debian-based systems, a simple apt-get install php5 gives you mbstring
by default.
* On RPM-based systems, a separate package is required.
* On Windows, people tend to use *AMP all-in-one packages that have
mbstring.
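For anyone who wants to check their own setup, here is a minimal sketch.
The helper name has_mbstring is made up for illustration. It relies on
`php -m`, which lists the modules loaded by the CLI binary; the web
server's PHP may be configured differently, so treat the result as a hint
rather than proof.

```shell
# Hypothetical helper: succeeds if the PHP CLI reports mbstring as loaded.
# Fails if PHP is not installed or the extension is missing.
has_mbstring() {
    # -q: quiet, -i: case-insensitive, -x: match the whole line only
    php -m 2>/dev/null | grep -qix 'mbstring'
}

if has_mbstring; then
    echo "mbstring is available"
else
    echo "mbstring is missing (or the php CLI is not on PATH)"
fi
```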
Current mbstring usage in core (excluding fallbacks themselves):
    mediawiki/includes$ grep -orEh '\bmb_\w+' . | sort | uniq -c
          7 mb_check_encoding
          6 mb_convert_encoding
         12 mb_strlen
          4 mb_substr
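The command as shown does not itself filter out the fallbacks. One
possible way to do that is sketched below, assuming GNU grep and assuming
the fallbacks all live in a file named Fallback.php (the only fallback file
named in this message). The function name count_mb_calls is made up, and
this is not necessarily how the numbers above were produced.

```shell
# Hypothetical helper: count mb_* identifiers under a directory,
# skipping any file named Fallback.php (assumed home of the fallbacks).
count_mb_calls() {
    # -o: print only the match, -r: recurse, -E: extended regex,
    # -h: omit file names; \b and \w are GNU grep extensions.
    grep -orEh --exclude='Fallback.php' '\bmb_\w+' "${1:-.}" \
        | sort | uniq -c | sort -rn
}
```

Run from mediawiki/includes, `count_mb_calls .` would print one line per
function name, most frequently used first.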
Some time ago, I committed https://gerrit.wikimedia.org/r/#/c/267309/ to
start a discussion, but it went largely unnoticed, so I'd like to start a
formal RFC.
--
Best regards,
Max Semenik ([[User:MaxSem]])