MediaWiki uses PHP's mb_strtoupper.
I believe this uses the normal Unicode uppercase algorithm. However, the result can vary depending on the version of Unicode. We are currently in the process of switching to PHP 7, but for the moment we are still using HHVM's uppercasing code. There's a list of the differences between HHVM and PHP 7.2 uppercasing at https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-con... [All this is probably subject to change]
However, I am at a loss as to why HHVM & PHP < 5.6 [1] wouldn't map that character, since the ɽ -> Ɽ mapping has been present since Unicode 5 (2006). I guess they were using really old Unicode data or something.
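For comparison, here is a minimal sketch of what a current Unicode table says about this character. It uses Python's built-in str.upper() and the unicodedata module purely as a reference point; it is not MediaWiki's code path, and the helper name ucfirst_unicode is made up for illustration.

```python
import unicodedata


def ucfirst_unicode(title: str) -> str:
    """Uppercase only the first character, using Python's bundled Unicode data.

    Hypothetical helper for illustration only: MediaWiki's real title
    normalization goes through PHP/HHVM and may differ for some characters.
    """
    if not title:
        return title
    return title[0].upper() + title[1:]


lower = "\u027d"  # LATIN SMALL LETTER R WITH TAIL
upper = ucfirst_unicode(lower)

# Unicode version bundled with this Python build
print("Unicode data version:", unicodedata.unidata_version)
# Code points before and after uppercasing
print(f"U+{ord(lower):04X} ->", " ".join(f"U+{ord(c):04X}" for c in upper))
```

A tool that needs to match the wiki's actual behaviour would still have to account for the HHVM/PHP differences mentioned above, since the runtime's Unicode data may be older than the one used here.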
See also bug T219279 [2]
-- Brian
[1] https://3v4l.org/GHt3b
[2] https://phabricator.wikimedia.org/T219279
On Sat, Aug 3, 2019 at 7:57 AM Nicolas Vervelle nvervelle@gmail.com wrote:
Hello,
On most wikis, MediaWiki is configured to convert the first letter of a title to uppercase, but apparently it's not converting every Unicode character: for example, on frwiki ɽ https://fr.wikipedia.org/w/index.php?title=%C9%BD&redirect=no is a different article than Ɽ https://fr.wikipedia.org/wiki/%E2%B1%A4, even though the second character is the uppercase version of the first one in Unicode.
So, which characters are actually converted to uppercase by the title normalization?
I need to know this information to stop reporting some false positives in WPCleaner https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:WPCleaner.
Thanks,
Nico
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l