>Just to narrow this down a little further from the DB server-side: the eventlogging tables do use utf-8, so the fix probably doesn't require laborious schema changes (if that's what you meant by changing database types).
To follow the structure on MediaWiki, I think the easiest approach is to change the db types from varchar to varbinary wherever utf-8 is being used. Please let us know if you do not think that is appropriate.
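
To make the idea concrete, here is a rough sketch of what "varbinary where utf-8 is being used" would mean on the Python side: text is encoded to UTF-8 bytes before it goes into the column and decoded again when read back. The helper names (to_varbinary, from_varbinary, mojibake) are hypothetical and not part of EventLogging, and the sketch is written for Python 3 rather than python2. The last helper only illustrates the kind of corruption a mismatched connection charset can produce.

```python
# Hypothetical helpers for illustration only; not EventLogging code.

def to_varbinary(value):
    """Encode text as UTF-8 bytes, the form a varbinary column would store."""
    if isinstance(value, bytes):
        return value  # already bytes, pass through unchanged
    return value.encode("utf-8")


def from_varbinary(raw):
    """Decode bytes read back from a varbinary column as UTF-8 text."""
    return raw.decode("utf-8")


def mojibake(value):
    """Show what UTF-8 bytes look like when misread as latin-1."""
    return value.encode("utf-8").decode("latin-1")


original = "Zürich"
stored = to_varbinary(original)

print(stored)                              # the raw bytes that would be stored
print(from_varbinary(stored) == original)  # round trip preserves the text
print(mojibake(original))                  # garbled form from a wrong charset
```

With binary columns the database does no charset conversion of its own, so the application has to encode and decode consistently at every read and write.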





On Mon, Jun 9, 2014 at 8:13 AM, Sean Pringle <springle@wikimedia.org> wrote:
On Fri, Jun 6, 2014 at 6:30 PM, Nuria Ruiz <nuria@wikimedia.org> wrote:

Encoding in python2 is a notorious pain and hard to get right, so fixing this will mean not just "restoring" records from logs but also changing database connection args, bindings and database types.

Just to narrow this down a little further from the DB server-side: the eventlogging tables do use utf-8, so the fix probably doesn't require laborious schema changes (if that's what you meant by changing database types).

BR
Sean

