> However, as Nuria mentioned, consistency with MediaWiki may be safest if we expect that client packages will continue to have encoding challenges. Cross-db/charset joins also need to be considered.

My last reply on this thread, I promise. As I have come to learn lately (encoding in Python 2.7 is a world of joy), getting the right encoding in Python while using SQLAlchemy has a lot to do with *how* you connect to the db.

If we use VARBINARY types, we also need to connect either by specifying convert_unicode=True or with connect_args={"charset": "utf8"}.

In the second case (specifying the charset), while our db column types are VARBINARY, the SQLAlchemy column types can be strings and everything is happy. This seems like the easiest solution.
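As a rough illustration of why the charset setting matters (a minimal sketch with a made-up value, not the actual Analytics schema): a VARBINARY column stores raw bytes, and without a configured charset the client sees those bytes as-is and must decode them manually. Passing connect_args={"charset": "utf8"} (or convert_unicode=True) is what makes that decode step happen for you, so the mapped columns can be plain strings.

```python
# -*- coding: utf-8 -*-
# Minimal sketch of the byte/text round trip that VARBINARY columns imply.
# No real database here; the stored value is a hypothetical example.

stored = u"Zürich".encode("utf-8")   # what lands in a VARBINARY column
assert stored == b"Z\xc3\xbcrich"    # raw utf-8 bytes, not text

# Without charset configuration, the client gets these raw bytes back and
# has to decode them by hand:
manual = stored.decode("utf-8")
assert manual == u"Zürich"

# With connect_args={"charset": "utf8"} (or convert_unicode=True), the
# driver/SQLAlchemy performs this decode for us, which is why the column
# types on the Python side can simply be strings.
```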

On Tue, Jun 10, 2014 at 6:14 AM, Sean Pringle <springle@wikimedia.org> wrote:
On Tue, Jun 10, 2014 at 1:12 PM, Ori Livneh <ori@wikimedia.org> wrote:

    ...and back to utf8 as default charset

    The version of MySQLdb that is packaged for Precise does not know about
    utf8mb4. I (inexcusably) tested against the dev branch of MySQLdb.

Bet that was a fun day :)  Somewhat like today for me...

We no longer use the Precise packages. m2-master supports utf8mb4 if anyone wishes to use it.

However, as Nuria mentioned, consistency with MediaWiki may be safest if we expect that client packages will continue to have encoding challenges. Cross-db/charset joins also need to be considered.

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics