Hey, the way non-Latin characters are displayed in section anchors (the part
of a link after the "#") has always been a serious complaint from our
communities:
https://phabricator.wikimedia.org/T152540
Community Tech has done some work in this area, and it's ready to get more
eyeballs:
https://gerrit.wikimedia.org/r/#/c/362326/
A few words about the implementation plan:
* There is now a concept of primary vs. fallback IDs. Primary IDs are used
for linking; fallback IDs exist so that old links keep working.
* To transition to the new system, a wiki should first keep serving
legacy-encoded section IDs as primary, with the new encoding as a fallback,
then swap the two once all older parser/HTTP caches have been refilled with
new HTML. Legacy encoding should remain enabled as long as there is
noticeable traffic using it; on WMF sites that probably means years.
* By default, MediaWiki will still behave exactly as before. Changing the
defaults to something more modern will be discussed later, once all the
initial issues are resolved.
* Because the output of Sanitizer::escapeId() is used without further
escaping in so many places outside of core, and because ID escaping now
differs subtly depending on the purpose, Sanitizer::escapeId() is
deprecated. It will never output the new encoding, and calls to it should be
replaced with one of escapeIdForHtml(), escapeIdForLink() or
escapeIdForExternalInterwiki() AFTER making sure the result is properly
escaped.
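To make the primary/fallback idea concrete, here is a rough Python sketch. It
is not MediaWiki's code (the real thing is PHP, in Sanitizer). The two
encoders only approximate the legacy scheme (percent-encoding with "%" turned
into ".") and the new one (Unicode kept as-is, spaces to underscores). The
helpers anchor_ids() and render_heading(), the new_is_primary flag and the
extra empty <span> carrying the fallback ID are hypothetical, chosen for
illustration only.

```python
import html
from urllib.parse import quote_plus


def legacy_encode(title: str) -> str:
    # Approximation of the legacy scheme (assumption, not MediaWiki's code):
    # spaces -> underscores, percent-encode the UTF-8 bytes,
    # then turn "%3A" back into ":" and every remaining "%" into ".".
    encoded = quote_plus(title.replace(" ", "_"))
    return encoded.replace("%3A", ":").replace("%", ".")


def html5_encode(title: str) -> str:
    # Approximation of the new scheme: non-Latin characters stay readable,
    # only spaces are replaced with underscores.
    return title.replace(" ", "_")


def anchor_ids(title: str, new_is_primary: bool) -> tuple[str, str]:
    # Hypothetical helper returning (primary, fallback).
    # Phase 1: legacy is primary, new encoding is the fallback.
    # Phase 2 (after caches are refilled): the two are swapped.
    legacy, modern = legacy_encode(title), html5_encode(title)
    return (modern, legacy) if new_is_primary else (legacy, modern)


def render_heading(title: str, new_is_primary: bool) -> str:
    # Hypothetical markup: the fallback ID sits on an empty span so old
    # links still land on the heading. The IDs are only encoded for their
    # purpose; they must still be HTML-escaped before going into attributes.
    primary, fallback = anchor_ids(title, new_is_primary)
    return (
        f'<span id="{html.escape(fallback)}"></span>'
        f'<span class="mw-headline" id="{html.escape(primary)}">'
        f"{html.escape(title)}</span>"
    )


print(anchor_ids("Café", new_is_primary=False))  # ('Caf.C3.A9', 'Café')
```

Flipping new_is_primary is the whole transition in this model: the same two
IDs are emitted either way, only their roles change, which is why old links
keep working in both phases.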
Your help reviewing/testing/discussing this is highly appreciated!
--
Best regards,
Max Semenik ([[User:MaxSem]])