It's years since I discovered that mysql's utf8 is broken in this way, but I can still feel the pain. What part of "universal" did they not understand? The mysql docs more or less say that "utf8" is deprecated, certainly not future-proof, and suggest you use utf8mb4. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html)
On Oct 14, 2021, at 1:17 PM, Sergey Dorofeev sergey@fidoman.ru wrote:
Thank you, did not know about it. Real UTF-8 in mysql is utf8mb4, I think it should be used here.
Sergey
Jaime Crespo писал 2021-10-14 18:32:
I agree that LOWER doesn't make much sense in binary collation.
Sadly, a utf8 (3-byte UTF-8) conversion may fail for 4-byte characters, so at the very least it should be utf8mb4 (4-byte UTF-8). I am not so familiar with ListPager to say if there could be other issues arising from that- sending a code review would be easier for better context.
On Thu, Oct 14, 2021 at 5:16 PM Sergey Dorofeev <sergey@fidoman.ru mailto:sergey@fidoman.ru> wrote: Hello,
I have got issue with ListFiles page in mediawiki 1.35.1 Filtering worked not very good, was case-sensitive and not always got text in middle of file name. I looked in DB and saw that img_name column is varbinary, but pagers/ImageListPager.php tries to do case-insensitive select with LOWERing both sides of strings. But LOWER does not work for varbinary So I think that following change will be reasonable:
--- ImageListPager.php.orig 2021-10-14 16:31:52.000000000 +0300 +++ ImageListPager.php 2021-10-14 16:00:10.127694733 +0300 @@ -90,9 +90,10 @@
if ( $nt ) { $dbr = wfGetDB( DB_REPLICA );
$this->mQueryConds[] = 'LOWER(img_name)'
.
$this->mQueryConds[] =
'LOWER(CONVERT(img_name USING utf8))' . $dbr->buildLike( $dbr->anyString(),
strtolower(
$nt->getDBkey() ), $dbr->anyString() );
mb_strtolower(
$nt->getDBkey() ), $dbr->anyString() );
} }
@@ -161,9 +162,9 @@ $nt = Title::newFromText( $this->mSearch ); if ( $nt ) { $dbr = wfGetDB( DB_REPLICA );
$conds[] = 'LOWER(' . $prefix . '_name)'
.
$conds[] = 'LOWER(CONVERT(' . $prefix .
'_name USING utf8))' . $dbr->buildLike( $dbr->anyString(),
strtolower(
$nt->getDBkey() ), $dbr->anyString() );
mb_strtolower(
$nt->getDBkey() ), $dbr->anyString() ); } }
-- Sergey _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org mailto:wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org mailto:wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
-- Jaime Crespo <http://wikimedia.org http://wikimedia.org/>
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org mailto:wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org mailto:wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/