https://bugzilla.wikimedia.org/show_bug.cgi?id=73151
--- Comment #1 from Merlijn van Deen valhallasw@arctus.nl ---
The "query.encode(site.encoding())" call only makes sense if:
1) the values in the database are encoded in site.encoding(), but stored as bytes in a "latin-1" [1] column (i.e. not using the utf8/"utf8mb4" [2] charsets/collations in MySQL), and
2) the communication with MySQL is in latin-1 (there is no SET NAMES utf8, and character_set_client / character_set_results / character_set_connection are not set).
[1] "latin-1" is in quotes because it is actually windows-1252, but MySQL calls it latin1.
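A rough simulation of the two conditions above, with no real database involved. The "latin-1" column is modelled as a plain list of byte strings that actually hold UTF-8 data, and because neither the column nor the connection converts anything, matching is modelled as plain bytes equality (collation details such as case-insensitivity are ignored). SITE_ENCODING stands in for site.encoding(); it and lookup() are hypothetical names for illustration only.

```python
# Hypothetical stand-in for site.encoding(); WMF wikis use UTF-8.
SITE_ENCODING = "utf-8"

# Simulated "latin-1" column: MySQL believes these are latin-1 bytes,
# but the wiki actually wrote UTF-8-encoded titles into it.
column = ["Zürich".encode(SITE_ENCODING), "Москва".encode(SITE_ENCODING)]


def lookup(query_bytes: bytes) -> list[bytes]:
    """Hypothetical lookup: with a latin-1 connection and a latin-1 column,
    MySQL performs no charset conversion, so matching is modelled as plain
    bytes equality (collation rules are ignored here)."""
    return [row for row in column if row == query_bytes]


title = "Zürich"

# Encoding with the site's encoding yields the same bytes that are stored.
matches = lookup(title.encode(SITE_ENCODING))

# Encoding as latin-1 yields b"Z\xfcrich" instead of b"Z\xc3\xbcrich": no match.
no_match = lookup(title.encode("latin-1"))

print(matches)   # [b'Z\xc3\xbcrich']
print(no_match)  # []
```

If either condition fails, for example the connection runs SET NAMES utf8, MySQL converts the incoming query from utf8 to latin1 before comparing, and the pre-encoded bytes no longer line up with what is stored.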
Basically, there are two pieces of relevant information:
1) what charset MySQL thinks the table is in (WMF: latin-1; in many other contexts: utf-8) --> run SET NAMES <XXX> for this charset;
2) what charset the data is actually in (WMF: utf-8; other contexts might use latin-1 or others) --> decode the bytes we get from MySQL using this charset.
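A sketch of keeping those two pieces of information apart as two separate settings. DbCharsets and its fields and methods are hypothetical names, not an existing pywikibot or MySQL driver API. The assumption it rests on is that when the SET NAMES charset matches the column charset, MySQL hands back the stored bytes unchanged, so decoding them is left entirely to the client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbCharsets:
    """Hypothetical config object holding the two separate charset settings."""

    # 1) What MySQL thinks the table is in; used for SET NAMES.
    connection_charset: str
    # 2) What the stored bytes really are; used to encode/decode values.
    data_encoding: str

    def set_names_statement(self) -> str:
        return f"SET NAMES {self.connection_charset}"

    def encode_param(self, value: str) -> bytes:
        return value.encode(self.data_encoding)

    def decode_value(self, raw: bytes) -> str:
        return raw.decode(self.data_encoding)


# WMF: latin-1 columns that actually contain UTF-8 data.
WMF = DbCharsets(connection_charset="latin1", data_encoding="utf-8")
# Hypothetical other setup: utf8mb4 columns holding UTF-8 data.
PLAIN_UTF8 = DbCharsets(connection_charset="utf8mb4", data_encoding="utf-8")

stored = "Zürich".encode("utf-8")  # raw bytes as MySQL would return them

print(WMF.set_names_statement())         # SET NAMES latin1
print(PLAIN_UTF8.set_names_statement())  # SET NAMES utf8mb4
print(WMF.decode_value(stored))          # Zürich
# Trusting MySQL's label instead (cp1252 is Python's name for its latin-1):
print(stored.decode("cp1252"))           # ZÃ¼rich
```

Keeping the settings separate means the WMF case (latin1 / utf-8) and a plain UTF-8 install (utf8mb4 / utf-8) differ only in configuration, not in code.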