As a data consumer, I'd prefer if columns matched between EventLogging and production DBs as closely as possible, so VARBINARY sounds like a win to me.
On Fri, Jun 13, 2014 at 5:39 AM, Nuria Ruiz nuria@wikimedia.org wrote:
However, as Nuria mentioned, consistency with Mediawiki may be safest if we expect that client packages will continue to have encoding challenges. Also cross-db/charset joins to be considered.
My last reply on this thread, I promise. As I have come to learn as of late (encoding in Python 2.7 is a world of joy), getting the right encoding in Python while using SQLAlchemy has a lot to do with "how" you connect to the db.
If we use VARBINARY types, we also need to connect either by specifying convert_unicode=True or with connect_args={"charset": "utf8"}.
In the second case (specifying the charset), while our db types are VARBINARY, the SQLAlchemy column types can be plain strings and everything is happy. This seems like the easiest solution.
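To make the trade-off concrete, here is a minimal sketch (in Python 3 notation, with a hypothetical sample value) of the decode step that connect_args={"charset": "utf8"} asks the MySQL driver to perform on our behalf:

```python
# What sits in a VARBINARY column: raw UTF-8 bytes with no charset attached.
raw = b"Mar\xc3\xada"  # hypothetical stored value, "María" encoded as UTF-8

# With connect_args={"charset": "utf8"}, the driver applies this decode
# before SQLAlchemy sees the value, so mapped String columns receive
# proper unicode text:
text = raw.decode("utf-8")
assert text == "María"

# Without the charset, the client gets the bytes back verbatim, and every
# consumer must remember to decode them itself.
```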
On Tue, Jun 10, 2014 at 6:14 AM, Sean Pringle springle@wikimedia.org wrote:
On Tue, Jun 10, 2014 at 1:12 PM, Ori Livneh ori@wikimedia.org wrote:
...and back to utf8 as the default charset.
The version of MySQLdb that is packaged for Precise does not know about utf8mb4. I (inexcusably) tested against the dev branch of MySQLdb.
Bet that was a fun day :) Somewhat like today for me...
We no longer use the Precise packages. m2-master supports utf8mb4 if anyone wishes to use it.
However, as Nuria mentioned, consistency with Mediawiki may be safest if we expect that client packages will continue to have encoding challenges. Also cross-db/charset joins to be considered.
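For anyone wondering what utf8mb4 buys over MySQL's legacy utf8, a quick illustration (Python 3): legacy utf8 stores at most 3 bytes per character, so anything outside the Basic Multilingual Plane, such as emoji, needs utf8mb4.

```python
# A BMP character fits within MySQL's legacy 3-byte "utf8" charset.
snowman = "\u2603"                      # ☃
assert len(snowman.encode("utf-8")) == 3

# A supplementary-plane character needs 4 bytes, hence utf8mb4.
emoji = "\U0001F600"                    # 😀
assert len(emoji.encode("utf-8")) == 4
```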
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics