(Moving this thread to Analytics list.)
I just finished a discussion with David and Diederik about the log format for the /event stream. Here's what we have right now:
* Request path
* Query params
* HTTP host (aka request hostname)
* Timestamp
* Client IP (aka remote address/host)
* X-Forwarded-For
* Referer
* Accept-Language
* Cookie
* X-WAP-Profile
* User-Agent
* Server-Hostname
* Sequence Number
The corresponding varnishncsa log format string is:
'%U %q %{Host}i %t %h %{X-Forwarded-For}i %{Referer}i %{Accept-Language}i %{Cookie}i %{X-WAP-Profile}i %{User-agent}i %l %n'
(Note the literal tabs separating the fields in that string. varnishncsa doesn't translate "\t", afaict.)
I've tested this on my log1 labs instance via curl + varnish + varnishncsa.
This curl command:
curl --cookie 'uid=deadbeef; pageload_id=3;' -H "x-wap-profile: http://nds1.nds.nokia.com/uaprof/N6230ir200.xml" -H "X-Forwarded-For: 192.168.0.123" -H "Referer: http://www.google.com" -H "Accept-Language: en-US" "http://localhost:6081/event/e3?lol=dongs&foo(bar/baz)=*&this=<that/>"
Results in this log line:
/event/e3 ?lol=dongs&foo(bar/baz)=*&this=<that/> localhost:6081 2012-11-05T21:24:15 127.0.0.1 192.168.0.123 http://www.google.com en-US uid=deadbeef;%20pageload_id=3; http://nds1.nds.nokia.com/uaprof/N6230ir200.xml curl/7.19.7%20(x86_64-pc-linux-gnu)%20libcurl/7.19.7%20OpenSSL/0.9.8k%20zlib/1.2.3.3%20libidn/1.15 i-00000239.pmtpa.wmflabs 8
Note that header fields (like Cookie and User-Agent) are URL-encoded, whereas the query params are not.
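For anyone consuming the stream, here is a minimal Python sketch of how one such line might be split into named fields. It assumes the fields really are tab-separated in the order of the format string above, and that only the header-derived fields need URL-decoding; the field names and the parse_event_line helper are made up for illustration and are not part of any existing tool.

```python
from urllib.parse import unquote

# Field order follows the format string above; the names are illustrative.
FIELDS = [
    "path", "query", "host", "timestamp", "client_ip", "x_forwarded_for",
    "referer", "accept_language", "cookie", "x_wap_profile", "user_agent",
    "server_hostname", "sequence",
]

# Assumption: only fields taken from request headers are URL-encoded.
HEADER_FIELDS = [
    "x_forwarded_for", "referer", "accept_language",
    "cookie", "x_wap_profile", "user_agent",
]


def parse_event_line(line):
    """Split one tab-separated log line into a dict of named fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = dict(zip(FIELDS, parts))
    # Decode %20 and friends in header fields; path and query are left as logged.
    for name in HEADER_FIELDS:
        record[name] = unquote(record[name])
    return record
```

Fed the log line above, this would give a cookie value of "uid=deadbeef; pageload_id=3;" and leave the query string untouched.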
Ori and others, thoughts thus far? If we are fine with this, Asher can move forward with making this stream available.
Also, I think we are still waiting on this RT ticket, right? https://rt.wikimedia.org/Ticket/Display.html?id=3760
-Ao
On Oct 31, 2012, at 4:58 PM, Andrew Otto <otto@wikimedia.org> wrote:
Hi guys!
I wanted to write an email to summarize some of the chats I just had with a few of you. We were all talking about how to set up a single /event data stream from varnish that we could all share. Here's what we got:
Asher will set up varnish to match request URLs against "^/event/.*". Any request that matches will return a 204 response, and a varnishncsa instance will then log the request to a shared stream.
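To make it concrete which URLs that pattern would catch, here is a small Python sketch of the same regex. This is not Varnish configuration; the is_event_request name is hypothetical, and Python's re module is only standing in for whatever matching varnish ends up doing.

```python
import re

# Same pattern as quoted above, compiled with Python's regex engine.
EVENT_URL = re.compile(r"^/event/.*")


def is_event_request(url):
    """Return True if the URL would fall under the /event/ rule."""
    return EVENT_URL.match(url) is not None
```

Under this pattern "/event/e3?foo=1" matches, while "/event" (no trailing slash) and "/events/e3" do not.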
The URL will be expected to contain a product_code, as in /event/<product_code>. Consumers of this stream can pick out their own events by matching against their product code. The URL and query params will be the first fields in each generated event, to allow for easy filtering. The rest of the log line will contain useful request data (client IPs, hostnames, seq numbers, etc.). We're still working out the exact log format, but it will contain all of the data that E3 needs, plus more that other consumers will find useful. Here's a preliminary list of fields:
- URL (not including the requested hostname, e.g. /event/<product_code>/)
- Query params
- Timestamp
- Client IP (aka remote host)
- X-Forwarded-For
- Referer
- Server Hostname
- Sequence number
- Request service time in ms
- Accept-Language (?)
- Cookie (?)
- User-Agent
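As a rough illustration of the product-code filtering described above, a consumer could do something like the following Python sketch. It assumes the URL is the first field on each line and that fields are tab-separated, neither of which is settled yet; product_code and events_for are hypothetical helper names.

```python
def product_code(path):
    """Return the product code from a path like /event/<product_code>/..., or None."""
    prefix = "/event/"
    if not path.startswith(prefix):
        return None
    # Take everything up to the next slash, if there is one.
    code = path[len(prefix):].split("/", 1)[0]
    return code or None


def events_for(lines, wanted):
    """Yield only the log lines whose URL carries the wanted product code."""
    for line in lines:
        # Assumption: the URL is the first tab-separated field.
        path = line.split("\t", 1)[0]
        if product_code(path) == wanted:
            yield line
```

Because the URL comes first, a consumer only has to look at the start of each line to decide whether to keep it.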
Obviously, this format still needs some work. We'll talk more about this tomorrow, so if you've got thoughts let us know.
Thanks to all for chatting with me and working this out today! Asher, I will get you a varnishncsa format string soon.
-AO
P.S. Apologies if this email is rambley, I did not proofread it. Ok byeeeeee I gotta go move a piano!