https://bugzilla.wikimedia.org/show_bug.cgi?id=73151
Bug ID: 73151
Summary: Site object becomes a string MySQLPageGenerator, throws error
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: pagegenerators
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: gpaumier@wikimedia.org
Web browser: ---
Mobile Platform: ---
MySQLPageGenerator says it can take a 'site' argument as either a Site object or a string (that represents the dbname). If 'site' isn't given, it defaults to the current Site.
In all cases, 'site' ends up being a string: either because a string was passed as the argument, or because the Site object is replaced by its dbname in 'site = site.dbName()'.
However, a few lines later, 'site' is expected to be a Site object again in 'query = query.encode(site.encoding())'.
This throws an AttributeError: 'unicode' object has no attribute 'encoding'.
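The failure described above can be reproduced with a small sketch. FakeSite, buggy_prepare and fixed_prepare are hypothetical names invented for illustration; this is not the pywikibot source, and only the method names dbName() and encoding() are taken from the report. The report shows the Python 2 message ('unicode' object); under Python 3 the same mistake raises AttributeError for a 'str' object.

```python
class FakeSite:
    """Hypothetical stand-in for a Site object (not the real pywikibot class)."""

    def __init__(self, dbname, encoding="utf-8"):
        self._dbname = dbname
        self._encoding = encoding

    def dbName(self):
        return self._dbname

    def encoding(self):
        return self._encoding


def buggy_prepare(query, site):
    """Mimic the reported flow: 'site' is rebound to the dbname string."""
    if not isinstance(site, str):
        site = site.dbName()  # from here on, 'site' is a plain string
    # A string has no .encoding() method, so this raises AttributeError.
    return site, query.encode(site.encoding())


def fixed_prepare(query, site):
    """Keep the Site object and the dbname in separate variables."""
    dbname = site.dbName()
    return dbname, query.encode(site.encoding())


site = FakeSite("enwiki_p")  # illustrative dbname, an assumption

try:
    buggy_prepare("SELECT 1", site)
    buggy_failed = False
except AttributeError:
    buggy_failed = True

dbname, encoded = fixed_prepare("SELECT 1", site)
```

The sketch only handles the case where a Site object is passed in; how the string form of the argument should obtain an encoding is left open here.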
Guillaume Paumier gpaumier@wikimedia.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Site object becomes a       |Site object becomes a
                   |string MySQLPageGenerator,  |string in
                   |throws error                |MySQLPageGenerator, throws
                   |                            |error
--- Comment #1 from Merlijn van Deen valhallasw@arctus.nl ---
The "query.encode(site.encoding())" only makes sense if:

1) the values in the database are encoded in site.encoding, but stored in a "latin-1" [1] column as bytes (i.e. not using utf8/"utf8mb4" [2] charset/collations in mysql)
2) the communication with mysql is in latin-1 (there is no SET NAMES utf8 and character_set_client / character_set_results / character_set_connection are not set)
[1] "latin-1" as it's actually windows-1252, but MySQL calls it latin-1.
Basically, there are two pieces of relevant information:

1) what charset does mysql think the table is in (WMF: latin-1. Many other contexts: utf-8)
   --> run SET NAMES <XXX> for this charset
2) what charset is the data actually in (WMF: utf-8. Other contexts might use latin-1 or others)
   --> decode bytes we get from mysql using this charset.
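The distinction between the two charsets can be illustrated without a database. The sketch below only simulates the bytes that would come back over a latin-1 connection; decode_row and the sample title are hypothetical, and Python's "cp1252" codec is used for what MySQL calls latin-1, as noted in [1].

```python
def decode_row(raw, data_charset):
    """Decode bytes received from the database using the charset the data
    is actually in, not the charset the column is declared as."""
    return raw.decode(data_charset)


title = "Zürich"  # sample value, an assumption for illustration

# The wiki writes UTF-8 bytes into a column that MySQL believes is latin-1.
stored_bytes = title.encode("utf-8")

# Over a latin-1 connection the bytes are handed back unchanged.
received = stored_bytes

# Step 2 of the list above: decode with the real data charset (utf-8).
correct = decode_row(received, "utf-8")

# Decoding with the declared charset instead produces mojibake.
wrong = decode_row(received, "cp1252")
```

Here `correct` round-trips to the original title, while `wrong` shows the familiar two-character garbling of the "ü".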
--- Comment #2 from Gerrit Notification Bot gerritadmin@wikimedia.org ---
Change 173332 had a related patch set uploaded by Merlijn van Deen:
Bug 73151: split use of 'site' and 'dbname'
https://gerrit.wikimedia.org/r/173332
Gerrit Notification Bot gerritadmin@wikimedia.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |PATCH_TO_REVIEW
John Mark Vandenberg jayvdb@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jayvdb@gmail.com
            Version|unspecified                 |core (2.0)