> MediaWiki currently doesn't even try to support UTF-8

I thought the installer gave the option to choose between binary and utf8
83-bytes)? It is ok if we support UTF-8 through binary fields + custom
library collations, but I think the sane approach would be to either move
everything to binary or support the most complete collation, not the
confusing combination of the two. Note I don't need utf8mb4 to be enabled; I
just want MediaWiki to work out of the box on any supported MySQL or
MariaDB version, including the latest two of each one- even if that means
doing some workarounds.
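
To illustrate the duality- a hypothetical sketch, not MediaWiki's actual
schema: the same logical field stores radically different amounts of
content depending on whether its size is defined in bytes or in characters:

```sql
-- Byte-defined: up to 255 bytes, which may be as few as 63 four-byte
-- UTF-8 codepoints (core MediaWiki's varbinary columns behave this way).
CREATE TABLE page_binary (
    page_title VARBINARY(255) NOT NULL
);

-- Character-defined: always 255 codepoints, up to 1020 bytes on disk.
CREATE TABLE page_utf8mb4 (
    page_title VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL
);
```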
> While it's not actually part of "strict mode"
It is not- we can delay the GROUP BY change until MariaDB supports it
properly according to the SQL standard and we no longer support any older
database versions' behaviour. However, strict mode ("don't add corrupt
data") is available on all versions and is the default in the latest ones,
and it should be enabled at least on testing environments.
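
As a sketch of what that means in practice (using the standard
MySQL/MariaDB sql_mode mechanism): with strict mode on, over-long or
out-of-range data is rejected with an error instead of being silently
truncated:

```sql
-- Enable strict mode for the current session.
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,STRICT_ALL_TABLES';

CREATE TABLE t (name VARCHAR(3) NOT NULL);

-- Without strict mode this succeeds with a warning and stores 'too';
-- with strict mode it fails with a "Data too long for column" error.
INSERT INTO t (name) VALUES ('too long');
```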
innodb_large_prefix cannot be toggled anymore because its behaviour is
enabled (hardcoded) automatically on MySQL 8.0.
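
The arithmetic behind the varchar(191) figure that comes up below: under
the old 767-byte index key limit, 767 / 4 = 191 is the longest fully
indexable utf8mb4 column, while the 3072-byte large-prefix limit allows
3072 / 4 = 768 characters. A sketch of the pre-8.0 settings (option names
from the MySQL 5.6/5.7 docs) that MySQL 8.0 now hardcodes:

```sql
-- Prerequisites for index key parts longer than 767 bytes before 8.0:
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_large_prefix = ON;

-- With those in place a fully indexed utf8mb4 varchar(255) fits:
-- 255 characters * 4 bytes = 1020 bytes <= 3072-byte limit.
CREATE TABLE t (
    title VARCHAR(255) CHARACTER SET utf8mb4 NOT NULL,
    KEY (title)
) ROW_FORMAT=DYNAMIC;
```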
On Mon, Apr 30, 2018 at 4:40 PM, Brad Jorsch (Anomie)
<bjorsch(a)wikimedia.org> wrote:

> On Mon, Apr 30, 2018 at 9:05 AM, Jaime Crespo <jcrespo(a)wikimedia.org>
> wrote:
>
> > * Support "real" (4-byte) UTF-8: utf8mb4 in MySQL/MariaDB (default in
> > the latest versions) and start deprecating "fake" (3-byte) UTF-8: utf8
> >
>
> MediaWiki currently doesn't even try to support UTF-8 in MySQL. The core
> MySQL schema specifically uses "varbinary" and "blob" types for almost
> everything.
>
> Ideally we'd change that, but see below.
>
>
> > * Check code works as intended in "strict" mode (default in the latest
> > versions), at least regarding testing
> >
>
> While it's not actually part of "strict mode" (I think), I note that
> MariaDB 10.1.32 (tested on db1114) with ONLY_FULL_GROUP_BY still seems to
> have the issues described in
> https://phabricator.wikimedia.org/T108255#2415773.
>
>
> > Anomie- I think you were thinking on (maybe?) abstracting schema for
> > MediaWiki- fixing the duality of binary (defining sizes in bytes) vs.
> > UTF-8 (defining sizes in characters) would be an interesting problem to
> > solve- the duality is ok, what I mean is being able to store radically
> > different size of contents based on that setting.
> >
>
> That would be an interesting problem to solve, but doing so may be
> difficult. We have a number of fields that are currently defined as
> varbinary(255) and are fully indexed (i.e. not using a prefix).
>
> - Just changing them to varchar(255) using utf8mb4 makes the index
> exceed MySQL's column length limit.
> - Changing them to varchar(191) to keep within the length limit breaks
> content in primarily-ASCII languages that is taking advantage of the
> existing 255-byte limit to store more than 191 codepoints.
> - Using a prefixed index makes ORDER BY on the column filesort.
> - Or the column length limit can be raised if your installation jumps
> through some hoops, which seem to be the default in 5.7.7 but not
> before: innodb_large_prefix
> <https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_large_prefix>
> set to ON, innodb_file_format
> <https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_file_format>
> set to "Barracuda", innodb_file_per_table
> <https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_file_per_table>
> set to ON, and tables created with ROW_FORMAT=DYNAMIC or COMPRESSED. I
> don't know what MariaDB might have as defaults or requirements in which
> versions.
>
> The ideal, I suppose, would be to require those hoops be jumped through in
> order for utf8mb4 mode to be enabled. Then a lot of code in MediaWiki would
> have to vary based on that mode flag to enforce limits on bytes versus
> codepoints.
>
> BTW, for anyone reading this who's interested, the task for that schema
> abstraction idea is https://phabricator.wikimedia.org/T191231.
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
--
Jaime Crespo
<http://wikimedia.org>