We have a number of scripts that process the UDP logs for analytics on banners and landing pages.  I don't think we are talking about weeks of effort to fix them, but we don't even have days at this point.  Right now, everything is working as well as we can make it.  I believe the "havoc" that Katie is speaking about is changing things around 13 days before the full-on launch of the fundraiser, especially when next week is Thanksgiving.  With the possibility of little bugs in half a dozen scripts, that could be a massive headache.




On Tue, Nov 13, 2012 at 1:35 PM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
Hi Katie,
Could you please give a bit more detail regarding the "significant amount of havoc with various crucial systems we have in place"?
Thanks!
Diederik


On Tue, Nov 13, 2012 at 3:58 PM, Katie Horn <khorn@wikimedia.org> wrote:
Hi Andrew,

We are almost completely sure that this set of changes would cause a significant amount of havoc with various crucial systems we have in place, and we definitely don't have time to shake bugs out of those systems at this point, as they are all already in heavy use. The only good time to deploy those changes in the foreseeable future would be (approximately) January.

Sorry about the bad news,
-Katie



On Tue, Nov 13, 2012 at 12:10 PM, Andrew Otto <otto@wikimedia.org> wrote:
Hi guys!

So, we've had a Todo on our list for a while now to make a couple of tweaks to the web access log format coming from squid, varnish and nginx.  

1. Append Accept-Language and X-Carrier headers.
This brings the field count from 14 up to 16.  udp-filter has already been modified to handle this.  I've already got a change in for this:  https://gerrit.wikimedia.org/r/#/c/12188/

2.  Change field separator from space to tab.
User-Agent and Content-Type headers (and possibly others) sometimes contain spaces.  Some sources (e.g. varnish) properly URL-encode the fields before they are sent out, but others don't.  Using tab as the field separator in web access logs will avoid many of these issues.
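
To illustrate the separator problem, here is a minimal Python sketch. The field order and sample values are invented for illustration and are not taken from a real log line; only the 16-field count and the space-versus-tab separators come from the description above. split_fields is a hypothetical helper, not part of udp-filter.

```python
# Hypothetical 16-field record. The values and their order are made up;
# the last two entries stand in for Accept-Language and X-Carrier.
FIELDS = [
    "cp1001.example", "12345", "2012-11-13T12:10:00", "0.001", "10.0.0.1",
    "TCP_MISS/200", "1024", "GET", "http://en.wikipedia.org/wiki/Main_Page",
    "NONE/-", "text/html; charset=UTF-8", "-", "-",
    "Mozilla/5.0 (X11; Linux x86_64)",  # User-Agent with unencoded spaces
    "en-US,en;q=0.8", "-",
]

EXPECTED_FIELD_COUNT = 16


def split_fields(line, sep):
    """Split one log line on the given separator, dropping the trailing newline."""
    return line.rstrip("\n").split(sep)


# The same record written with each separator.
space_line = " ".join(FIELDS) + "\n"
tab_line = "\t".join(FIELDS) + "\n"

space_fields = split_fields(space_line, " ")
tab_fields = split_fields(tab_line, "\t")

# Spaces inside Content-Type and User-Agent inflate the count when splitting
# on spaces, so positional field lookups shift; tabs keep the record intact.
print("space-separated:", len(space_fields), "fields")
print("tab-separated:  ", len(tab_fields), "fields")
print("tab split matches expected count:", len(tab_fields) == EXPECTED_FIELD_COUNT)
```

Any consumer that indexes fields by position would need the same one-line change of separator, which is where the small per-script bugs could creep in.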

We have wanted to do this for a while, but haven't because we were worried about breaking Erik Zachte's wikistats scripts.  Stefan Petrea is now working with Diederik on wikistats (and other things), and has dealt with this issue.  So!  We are ready!  We'd like to make this change before we start real consumption of the web access logs into the Kraken cluster, which hopefully will be relatively soon.  

Would these changes cause Fundraising any foreseeable problems?  Can we go ahead and work with ops to push this through?

Thanks!
-Andrew Otto





_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics





--

Peter Gehres

Fundraiser Production Manager
Wikimedia Foundation