Woah! Nice :D How are definition updates handled?

On 10 October 2014 18:58, Nuria Ruiz <nuria@wikimedia.org> wrote:
>1. A UDF for ua-parser or whatever we decide to use (this will possibly be necessary for pageviews, but not necessarily - it depends on our spider/automaton detection strategy)
We got this one ready today: https://gerrit.wikimedia.org/r/#/c/166142/




On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:


On 10 October 2014 16:02, Nuria Ruiz <nuria@wikimedia.org> wrote:
>At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere.
I REALLY think this should make it into our goals; if it cannot be done this quarter, it should for sure be done next quarter.


I agree it would be nice. It's one of those things that will either come as a side-effect of other stuff, OR require substantially more work, and nothing in-between. Things we need for it:

1. A UDF for ua-parser or whatever we decide to use (this will possibly be necessary for pageviews, but not necessarily - it depends on our spider/automaton detection strategy)
2. Pageviews data
3. A table somewhere.

Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html requests overall (depends on query runtime), maybe don't.
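
For illustration only, here is a toy sketch of "take 1, apply to 2, stick in 3" in base R. Everything in it is an assumption made so that it runs on its own: classify_ua() is a hypothetical stand-in and not ua-parser, the three requests are invented, and a TSV file stands in for the eventual table.

```r
# Toy stand-in for steps 1-3: classify user agents, count them, write a table.
# classify_ua() is a hypothetical placeholder, NOT ua-parser; the sample
# requests below are made up purely so the sketch is self-contained.
classify_ua <- function(ua) {
  browser <- ifelse(grepl("MSIE 9", ua, fixed = TRUE), "IE 9",
             ifelse(grepl("Firefox", ua, fixed = TRUE), "Firefox", "Other"))
  os <- ifelse(grepl("Windows Phone", ua, fixed = TRUE), "Windows Phone",
        ifelse(grepl("Windows", ua, fixed = TRUE), "Windows", "Other"))
  data.frame(os = os, browser = browser, stringsAsFactors = FALSE)
}

# Step 2: pageview data (here, three invented requests)
pageviews <- data.frame(user_agent = c(
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
  "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0"
), stringsAsFactors = FALSE)

# Step 1 applied to step 2
parsed <- classify_ua(pageviews$user_agent)

# Count requests per os/browser pair
parsed$requests <- 1L
browser_matrix <- aggregate(requests ~ os + browser, data = parsed, FUN = sum)

# Step 3: stick it in a table (a TSV file stands in for the real one)
write.table(browser_matrix, file = "browser_matrix.tsv", sep = "\t",
            row.names = FALSE, quote = TRUE)
```

A real version would swap classify_ua() for the actual parser and the invented data frame for the pageviews data; the aggregate-then-write shape stays the same.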

The ideal implementation, obviously, is to pair this up with a site that automatically parses the results into HTML. That should be the end goal, but in terms of engineering support we can get most of the way there simply by ensuring we always have a recent snapshot to hand. I can probably put something together over the sampled logs and throw it in SQL if there are urgent needs.
 
Do we not have more recent data than May?

We don't, but thanks to the utilities library I built, the code for generating it would literally run:

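library(WMUtils)
library(data.table) # for as.data.table() and the [, j, by] syntax

# Read the sampled logs for 1-30 September 2014, run them through data_sieve, and parse the user agents
uas <- as.data.table(ua_parse(data_sieve(do.call("rbind",lapply(seq(20140901,20140930,1),sampled_logs)))$user_agent))

# Count requests per OS/browser pair
uas <- uas[,j = list(requests = .N), by = c("os","browser")]

write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE, quote = TRUE)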

...assuming we didn't care about readability.

Point is, in the time until we have the new parser built into Hadoop and that setup, we can totally generate interim data from the sampled logs using the same parser at a tiny cost in research/programming time, iff (the mathematical if) we need it enough that we're cool with the sampling, and people can convince [[Dario|Our Great Leader]] to authorise me to spend 15 minutes of my time on it.



On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Email Dario and me; if he prioritises it I'll run a check on more recent data.

At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere. This comes some time after pageviews though.

On 10 October 2014 14:38, Toby Negrin <tnegrin@wikimedia.org> wrote:
Hi Jon -- I'm sure other folks will have more information but here's a link to a slide with some data from May[1]. We don't see a lot of Windows phone traffic.

-Toby


On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson <jrobson@wikimedia.org> wrote:
I was going through our backlog again today, and I noticed a bug about
supporting editing on Windows Phones with IE9 [1]

Yet again, I wondered 'how many of our users are using IE9' as I
wondered if because of this lack of support we are losing out on lots
of potential editors.

What's the easiest way to get this information now? Is it available?

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Oliver Keyes
Research Analyst
Wikimedia Foundation
