On Wed, Oct 15, 2014 at 12:12 PM, Andrew Otto aotto@wikimedia.org wrote:
Jon,
Recent unsampled webrequest logs are available for querying in Hive now!
https://wikitech.wikimedia.org/wiki/Analytics/Cluster
:)
If you don’t already have access for this, submit an RT request to get access to stat1002 and the analytics-privatedata-group.
That's good to know. Thanks. I'm not sure if I have stat1002 access but every time you mention RT I shudder ;-)
Thanks for the dump of data Nuria. I assume these all add up to 100% (roughly) and are global? So if I understand correctly, if I get the above access and follow your instructions I can get this data when I do need it until we have some nice page I can go to to retrieve it :).
This is good to know when we have these sorts of questions, so thanks a bunch. We are currently interested in phablet traffic (big-screen mobile devices), so this should be useful information for us. Thanks!
On Thu, Oct 16, 2014 at 7:15 PM, Nuria Ruiz nuria@wikimedia.org wrote:
And I have no idea what our traffic for Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of our traffic.
So the answer to this question (with preliminary data) is that neither 2.1 nor 2.2 reaches 0.05% of traffic to the mobile site.
I have attached the list of user agents and devices (with percentages) for the last 30 days. I did not include any device/browser combo with less than 0.05% of traffic.
For about 4% of traffic we could not identify the browser. This might be because the user agent was not there or because ua-parser could not figure it out. I understand this is not ideal, but I am sending this because I feel this list provides quite a bit of value and should help you triage bugs.
iOS takes the cake which does not cease to amaze me.
I described what I did to gather the data here (anyone with access to stat1002 can reproduce it): https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
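For readers without cluster access, the percentage list described in this message can be approximated as follows. This is only a minimal Python sketch, not the Hive query from the wiki page: it assumes the requests have already been parsed into (device, browser) pairs, and the function name `traffic_shares` and the sample values in the usage note are hypothetical.

```python
from collections import Counter


def traffic_shares(records, min_share_pct=0.05):
    """Return (combo, percentage) pairs sorted by descending share.

    records: iterable of (device, browser) tuples, one per request.
    Combos below min_share_pct percent of all requests are dropped,
    mirroring the 0.05% cut-off mentioned in the message above.
    """
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        return []
    # Convert raw request counts into percentages of all requests.
    shares = [(combo, 100.0 * n / total) for combo, n in counts.items()]
    kept = [(combo, pct) for combo, pct in shares if pct >= min_share_pct]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Calling `traffic_shares([("iPhone", "Mobile Safari"), ("Other", "Android")])` would return both combos at 50% each; with real logs the long tail below the threshold simply disappears from the result, which is why the attached percentages need not sum to exactly 100%.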
On Wed, Oct 15, 2014 at 12:15 PM, Nuria Ruiz nuria@wikimedia.org wrote:
And I have no idea what our traffic for Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of our traffic.
Understood, it is hard for you guys to work without knowing this data. I will try to get a user agent list for data from last month but, as I mentioned earlier, I think providing this data on a regular basis (monthly?) is a good goal for us.
On Wed, Oct 15, 2014 at 10:35 AM, Jon Robson jrobson@wikimedia.org wrote:
Anything would be useful. I just hit this situation again. I was reviewing some code and someone used JSON.stringify - this is not available in Android < 2.3 and I have no idea what our traffic for Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of our traffic.
In the meantime, while I don't have a fancy place to look these answers up, how can I get them? Should I mail the analytics mailing list to ask these questions? Cc a point person on bugzilla with the question? Ping someone privately?
Jon
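The question in this message boils down to: what share of requests comes from Android versions below 2.3? A minimal Python sketch of that check follows, assuming a plain list of raw user-agent strings is available. The regular expression is a crude stand-in for ua-parser (it only recognises "Android X.Y" tokens), and the names `android_version` and `share_below` are hypothetical.

```python
import re

# Matches tokens such as "Android 2.2" or "Android 4.4.2" (major.minor only).
ANDROID_VERSION = re.compile(r"Android (\d+)\.(\d+)")


def android_version(user_agent):
    """Return (major, minor) for an Android user agent, else None."""
    match = ANDROID_VERSION.search(user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def share_below(user_agents, cutoff=(2, 3)):
    """Percentage of all requests whose Android version is below cutoff."""
    total = 0
    below = 0
    for ua in user_agents:
        total += 1
        version = android_version(ua)
        # Tuples compare element-wise, so (2, 2) < (2, 3) is True.
        if version is not None and version < cutoff:
            below += 1
    return 100.0 * below / total if total else 0.0
```

Comparing the result against a chosen threshold (1% in the message above) gives a quick yes/no on whether the older versions still matter. Non-Android requests stay in the denominator, so the figure is a share of all traffic rather than of Android traffic.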
On Tue, Oct 14, 2014 at 10:30 AM, Nuria Ruiz nuria@wikimedia.org wrote:
Woah! Nice :D How are definitions updates handled?
Since we talked about this on IRC, restating here to keep the archives happy. We pull the ua-parser jar from our archiva depot; an update will involve building a new jar, uploading it to archiva, and updating our dependency file (pom.xml) to point to the newly updated version.
On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Woah! Nice :D How are definitions updates handled?
On 10 October 2014 18:58, Nuria Ruiz nuria@wikimedia.org wrote:
> 1. A UDF for ua-parser or whatever we decide to use (this will possibly be necessary for pageviews, but not necessarily - it depends on our spider/automaton detection strategy)
We got this one ready today: https://gerrit.wikimedia.org/r/#/c/166142/
On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes okeyes@wikimedia.org wrote:
> On 10 October 2014 16:02, Nuria Ruiz nuria@wikimedia.org wrote:
>> At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere.
> I REALLY think this should make it into our goals, if it cannot be done this quarter it should for sure be done next quarter.
I agree it would be nice. It's one of those things that will either come as a side-effect of other stuff, OR require substantially more work, and nothing in-between. Things we need for it:
1. A UDF for ua-parser or whatever we decide to use (this will possibly be necessary for pageviews, but not necessarily - it depends on our spider/automaton detection strategy)
2. Pageviews data
3. A table somewhere.
Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html requests overall (depends on query runtime), maybe don't.
The ideal implementation, obviously, is to pair this up with a site that automatically parses the results into HTML. That should be the end goal, but in terms of engineering support we can get most of the way there simply by ensuring we always have a recent snapshot to hand. I can probably put something together over the sampled logs and throw it in SQL if there are urgent needs.
> Do we not have more recent data than May?
We don't, but thanks to the utilities library I built, the code for generating it would literally run:
library(WMUtils)
uas <- as.data.table(ua_parse(data_sieve(do.call("rbind", lapply(seq(20140901, 20140930, 1), sampled_logs)))$user_agent))
uas <- uas[, j = list(requests = .N), by = c("os", "browser")]
write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE, quote = TRUE)
...assuming we didn't care about readability.
Point is, in the time until we have the new parser built into Hadoop and that setup, we can totally generate interim data from the sampled logs using the same parser at a tiny cost in research/programming time, iff (the mathematical if) we need it enough that we're cool with the sampling, and people can convince [[Dario|Our Great Leader]] to authorise me to spend 15 minutes of my time on it.
On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Email Dario and me, if he prioritises it I'll run a check on more recent data.
At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere. This comes some time after pageviews though.
-- Oliver Keyes Research Analyst Wikimedia Foundation
On 10 October 2014 14:38, Toby Negrin tnegrin@wikimedia.org wrote:
Hi Jon -- I'm sure other folks will have more information but here's a link to a slide with some data from May[1]. We don't see a lot of Windows phone traffic.
-Toby
[1] https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzca...
On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson jrobson@wikimedia.org wrote:
I was going through our backlog again today, and I noticed a bug about supporting editing on Windows Phones with IE9 [1]
Yet again, I wondered 'how many of our users are using IE9' as I wondered if because of this lack of support we are losing out on lots of potential editors.
What's the easiest way to get this information now? Is it available?
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics