That's the sampled logs on stat1002; you do not, under any circumstances, want to deal with those. I've been doing this for 2+ years - if I'm pointing yinz to the HDFS-stored logs there's a reason for it ;)
On 25 June 2015 at 13:41, James Douglas jdouglas@wikimedia.org wrote:
Ooh!
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequests_sampled
On Thu, Jun 25, 2015 at 10:28 AM, James Douglas jdouglas@wikimedia.org wrote:
This looks possibly relevant: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Overview
On Thu, Jun 25, 2015 at 10:03 AM, James Douglas jdouglas@wikimedia.org wrote:
> The varnish logs == request logs == also in HDFS.
Ah ha, thanks!
> To get access you'll need a phabricator ticket asking for stat1002 and analytics cluster access, with Ottomata CCd to make the patch and Dan CCd to confirm you need it.
Cool, I'll get on that. In the meantime, where can I learn about the infrastructure?
On Thu, Jun 25, 2015 at 10:01 AM, Oliver Keyes okeyes@wikimedia.org wrote:
The varnish logs == request logs == also in HDFS. To get access you'll need a phabricator ticket asking for stat1002 and analytics cluster access, with Ottomata CCd to make the patch and Dan CCd to confirm you need it.
On 25 June 2015 at 12:53, James Douglas jdouglas@wikimedia.org wrote:
From IRC, it sounds like this information ought to be available in the Varnish logs. What's the story there?
On Thu, Jun 25, 2015 at 9:52 AM, James Douglas jdouglas@wikimedia.org wrote:
I misspoke: we're looking for HTTP requests coming from users who are leaving the Portal, not retrieving the portal.
e.g. Clicking on enwiki, using one of the search forms, etc.
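To make "leaving the Portal" concrete, here is a minimal sketch of that filter, assuming each request record carries a referer and the host that was requested. The field names `referer` and `uri_host`, the helper `is_portal_exit`, and the sample records are all hypothetical; they are not taken from the actual log schema.

```python
from urllib.parse import urlsplit

PORTAL_HOST = "www.wikipedia.org"


def is_portal_exit(record):
    """True when the request was referred by the Portal but targets another host.

    `record` is a dict with hypothetical keys `referer` and `uri_host`.
    """
    # A missing referer, or a placeholder such as "-", parses to hostname None.
    referer_host = urlsplit(record.get("referer") or "").hostname
    return referer_host == PORTAL_HOST and record.get("uri_host") != PORTAL_HOST


# Made-up sample records, for illustration only.
sample = [
    {"referer": "https://www.wikipedia.org/", "uri_host": "en.wikipedia.org"},
    {"referer": "https://www.wikipedia.org/", "uri_host": "www.wikipedia.org"},
    {"referer": "-", "uri_host": "en.wikipedia.org"},
    {"referer": None, "uri_host": "de.wikipedia.org"},
]

exits = [r for r in sample if is_portal_exit(r)]
print(len(exits))
```

Only the first sample record counts as an exit: the second is a request for the Portal itself, and the last two have no usable referer.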
On Thu, Jun 25, 2015 at 9:50 AM, Oliver Keyes okeyes@wikimedia.org wrote:
>
> * Nope :(
> * It's in HDFS!
>
> On 25 June 2015 at 12:05, James Douglas jdouglas@wikimedia.org wrote:
> > Let's say, hypothetically, that I wanted to measure information about HTTP
> > requests coming into the Wikipedia Portal (www.wikipedia.org).
> >
> > * Do we record this information?
> > * If so, is it accessible via analytical tools?
> > * If so, how do I get my mitts on it?
> > * If not, is it accessible from a database or similar?
> >
> > Context: https://phabricator.wikimedia.org/T100673
> >
> > _______________________________________________
> > Wikimedia-search mailing list
> > Wikimedia-search@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation