I've decided to send two independent streams of event log data from the bits servers, at least for the time being.  One will go directly to the analytics cluster via their frontend server, which has a public IP, and from there it may be multicast at the analytics team's discretion.  The other will continue as-is (direct to vanadium, or via oxygen to vanadium if in esams).

I'm going to configure the analytics stream with the log format defined by Andrew below.  Ori, this allows the log format to vanadium to remain unchanged if you'd like.  Let me know if you'd like the same format as analytics, or would rather stick with what's already in place.

On Mon, Nov 5, 2012 at 1:52 PM, Andrew Otto <otto@wikimedia.org> wrote:
(Moving this thread to Analytics list.)



I just finished a discussion with David and Diederik about the log format for this thing.  Here's what we've got right now:

 * Request path
 * Query params
 * HTTP host (aka request hostname)
 * Timestamp
 * Client IP  (aka remote address/host)
 * X-Forwarded-For
 * Referer
 * Accept-Language
 * Cookie
 * X-WAP-Profile
 * User-Agent
 * Server-Hostname
 * Sequence Number

The corresponding varnishncsa log format string is:

'%U %q  %{Host}i        %t      %h      %{X-Forwarded-For}i     %{Referer}i     %{Accept-Language}i     %{Cookie}i      %{X-WAP-Profile}i       %{User-agent}i  %l      %n'

(Note the literal tabs in that string.  varnishncsa doesn't translate "\t", afaict.)
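
Since the tabs have to be literal, one way to avoid hand-editing them is to generate the string from a script.  The following is only a sketch: the names FORMAT_FIELDS, LOG_FORMAT and shell_quoted_format are made up for illustration, and it assumes the single space between %U and %q is intentional while every other separator is a tab.

```python
import shlex

# Directives in the same order as the field list above.
# Assumption: "%U %q" keeps its single space; all other separators are tabs.
FORMAT_FIELDS = [
    "%U %q",                 # request path and query params
    "%{Host}i",              # HTTP host
    "%t",                    # timestamp
    "%h",                    # client IP
    "%{X-Forwarded-For}i",
    "%{Referer}i",
    "%{Accept-Language}i",
    "%{Cookie}i",
    "%{X-WAP-Profile}i",
    "%{User-agent}i",
    "%l",                    # server hostname, per the field list above
    "%n",                    # sequence number, per the field list above
]

# Python turns "\t" into a real tab character here, so the resulting
# string contains literal tabs rather than backslash-t sequences.
LOG_FORMAT = "\t".join(FORMAT_FIELDS)


def shell_quoted_format():
    """Return the format string quoted for pasting into a shell command."""
    return shlex.quote(LOG_FORMAT)


if __name__ == "__main__":
    print(shell_quoted_format())
```

The printed value could then be passed as the format argument (varnishncsa -F) without retyping the tabs by hand.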

I've tested this on my log1 labs instance via curl + varnish + varnishncsa.

This curl command:

  curl  --cookie 'uid=deadbeef; pageload_id=3;' -H "x-wap-profile: http://nds1.nds.nokia.com/uaprof/N6230ir200.xml" -H "X-Forwarded-For: 192.168.0.123" -H "Referer: http://www.google.com" -H "Accept-Language: en-US"  "http://localhost:6081/event/e3?lol=dongs&foo(bar/baz)=*&this=<that/>"


Results in this log line:

  /event/e3 ?lol=dongs&foo(bar/baz)=*&this=<that/>      localhost:6081  2012-11-05T21:24:15     127.0.0.1       192.168.0.123   http://www.google.com   en-US   uid=deadbeef;%20pageload_id=3;  http://nds1.nds.nokia.com/uaprof/N6230ir200.xml curl/7.19.7%20(x86_64-pc-linux-gnu)%20libcurl/7.19.7%20OpenSSL/0.9.8k%20zlib/1.2.3.3%20libidn/1.15      i-00000239.pmtpa.wmflabs        8

Note that header fields (like User-Agent and Cookie) are URL-encoded, whereas the query params are not.
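
For consumers, a rough sketch of how a line in this format might be parsed.  Nothing here is an agreed-upon tool: parse_event_line, FIELD_NAMES and DECODED_FIELDS are hypothetical names, and the choice to URL-decode only Cookie and User-Agent is an assumption based on the sample line above.

```python
from urllib.parse import unquote

# Names for the tab-separated fields that follow the "path query" pair.
# These names are illustrative, not part of any agreed schema.
FIELD_NAMES = [
    "host", "timestamp", "client_ip", "x_forwarded_for", "referer",
    "accept_language", "cookie", "x_wap_profile", "user_agent",
    "server_hostname", "sequence_number",
]

# Assumption: only these fields are URL-decoded; the rest are kept raw.
DECODED_FIELDS = {"cookie", "user_agent"}


def parse_event_line(line):
    """Split one log line into a dict of named fields."""
    first, *rest = line.rstrip("\n").split("\t")
    if len(rest) != len(FIELD_NAMES):
        raise ValueError("expected %d tab-separated fields after the URL, got %d"
                         % (len(FIELD_NAMES), len(rest)))
    # Path and query params share the first field, separated by one space.
    path, _, query = first.partition(" ")
    event = {"path": path, "query": query}
    for name, value in zip(FIELD_NAMES, rest):
        event[name] = unquote(value) if name in DECODED_FIELDS else value
    return event


if __name__ == "__main__":
    sample = "\t".join([
        "/event/e3 ?lol=dongs&foo(bar/baz)=*&this=<that/>",
        "localhost:6081", "2012-11-05T21:24:15", "127.0.0.1",
        "192.168.0.123", "http://www.google.com", "en-US",
        "uid=deadbeef;%20pageload_id=3;",
        "http://nds1.nds.nokia.com/uaprof/N6230ir200.xml",
        "curl/7.19.7%20(x86_64-pc-linux-gnu)",
        "i-00000239.pmtpa.wmflabs", "8",
    ])
    print(parse_event_line(sample))
```

Leaving the query params untouched matches the log as emitted; a consumer that wants them decoded would have to do that separately.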

Ori and others, thoughts thus far?  If we are fine with this, Asher can move forward with making this stream available.

Also, I think we are still waiting on this RT ticket, right?
https://rt.wikimedia.org/Ticket/Display.html?id=3760


-Ao










On Oct 31, 2012, at 4:58 PM, Andrew Otto <otto@wikimedia.org> wrote:

> Hi guys!
>
> I wanted to write an email to summarize some of the chats I just had with a few of you.  We were all talking about how to set up a single /event data stream from varnish that we could all share.  Here's what we got:
>
> Asher will set up varnish to match for "^/event/.*".  Any request that matches this will return a 204 response.  A varnishncsa instance will then log this event to a shared stream.
>
> The URL will be expected to contain a product_code, as in /event/<product_code>. Consumers of this stream can pick out their relevant events by matching against their product code.  The URL and query params will be the first fields in each generated event, to allow for easy filtering.  The rest of the log line will contain useful request data (client IPs, hostnames, seq numbers, etc.).  We're still working out the exact log format, but it will contain all of the data that E3 needs, plus more that other consumers will find useful.  Here's a preliminary list of fields:
>
> * URL (not including requested hostname. e.g. /event/<product_code>/ )
> * Query params
> * Timestamp
> * Client IP  (aka remote host)
> * X-Forwarded-For
> * Referer
> * Server Hostname
> * Sequence number
> * Request service time in ms
> * Accept-Language (?)
> * Cookie (?)
> * User-Agent
>
> Obviously, this format still needs some work.  We'll talk more about this tomorrow, so if you've got thoughts let us know.
>
> Thanks to all for chatting with me and working this out today!  Asher, I will get you a varnishncsa format string soon.
>
> -AO
>
>
>
>
> P.S.  Apologies if this email is rambley, I did not proofread it.  Ok byeeeeee I gotta go move a piano!
>
>
>