On Wed, Oct 15, 2014 at 12:12 PM, Andrew Otto <aotto@wikimedia.org> wrote:
> Jon,
>
> Recent unsampled webrequest logs are available for querying in Hive now!
>
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster
>
> :)
>
> If you don’t already have access for this, submit an RT request to get access to stat1002 and the analytics-privatedata-group.
>
That's good to know. Thanks. I'm not sure if I have stat1002 access,
but every time you mention RT I shudder ;-)

Thanks for the dump of data, Nuria. I assume these all add up to 100%
(roughly) and are global? So if I understand correctly, once I get the
above access and follow your instructions, I can get this data whenever I
need it, until we have some nice page I can go to to retrieve it :).
This is good to know for when we have these sorts of questions, so thanks a
bunch. We are currently interested in phablet traffic (big-screen
mobile devices), so this should be useful information for us.
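
For the archives, here is a rough sketch of the arithmetic behind a list
like this: turn per-OS/browser request counts into percentages, drop
anything under the 0.05% cutoff, and check whether a group of old versions
crosses a 1% threshold. It is Python rather than Hive or R, the function
names and the counts are made up purely for illustration, and it is not
what Nuria actually ran (her steps are on the wiki page she links below).

```python
from collections import Counter


def traffic_shares(counts, min_share=0.05):
    """Return {key: percent of total requests}, dropping keys below min_share percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {key: 100.0 * n / total for key, n in counts.items()}
    return {key: pct for key, pct in shares.items() if pct >= min_share}


def combined_share(counts, keys):
    """Return the combined percent of total requests for the given keys."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(counts.get(key, 0) for key in keys) / total


# Made-up (os, browser) request counts. NOT real Wikimedia data.
example_counts = Counter({
    ("iOS 8", "Mobile Safari"): 6000,
    ("Android 4.4", "Chrome Mobile"): 3900,
    ("Android 2.2", "Android"): 3,
    ("Android 2.1", "Android"): 1,
    ("Other", "Other"): 96,
})

old_android = [("Android 2.1", "Android"), ("Android 2.2", "Android")]

if __name__ == "__main__":
    for key, pct in sorted(traffic_shares(example_counts).items(),
                           key=lambda item: item[1], reverse=True):
        print(f"{key[0]:<12} {key[1]:<14} {pct:6.2f}%")
    significant = combined_share(example_counts, old_android) > 1.0
    print("Android 2.1/2.2 above 1% of traffic:", significant)
```

The cutoff is applied after computing shares against the full total, so
the rows that remain will add up to a bit less than 100%.
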
On Thu, Oct 16, 2014 at 7:15 PM, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>And I have no idea what our traffic for
>>Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>>our traffic.
> So the answer to this question (with preliminary data) is that neither 2.1
> nor 2.2 amounts to 0.05% of traffic to the mobile site.
>
> I have attached the list of user agents and devices (with percentages) for
> the last 30 days. I did not include any device/browser combo with less than
> 0.05% of traffic.
>
> For about 4% of traffic we could not identify the browser. This might be
> because the user agent was missing or because ua-parser could not figure it
> out. I understand this is not ideal, but I am sending this because I feel the
> list provides quite a bit of value and should help you triage bugs.
>
> iOS takes the cake which does not cease to amaze me.
>
> I described what I did to gather the data here (anyone with access to
> stat1002 can reproduce it):
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
>
>
> On Wed, Oct 15, 2014 at 12:15 PM, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>
>> >And I have no idea what our traffic for
>> >Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>> >our traffic.
>> Understood, it is hard for you guys to work without knowing this data. I
>> will try to get a user agent list for data from last month but, as I
>> mentioned earlier, I think providing this data on a regular basis (monthly?)
>> is a good goal for us.
>>
>> On Wed, Oct 15, 2014 at 10:35 AM, Jon Robson <jrobson@wikimedia.org>
>> wrote:
>>>
>>> Anything would be useful. I just hit this situation again. I was
>>> reviewing some code and someone used JSON.stringify - this is not
>>> available in Android < 2.3 and I have no idea what our traffic for
>>> Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>>> our traffic.
>>>
>>> In the meantime, while I don't have a fancy place to find out the
>>> answers to this, how can I get these answers?
>>> Should I mail the analytics mailing list to ask these questions? Cc a
>>> point person on bugzilla with the question? Ping someone privately?
>>>
>>> Jon
>>>
>>>
>>>
>>> On Tue, Oct 14, 2014 at 10:30 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>> >>Woah! Nice :D How are definitions updates handled?
>>> > Since we talked about this on IRC, restating here to keep the archives
>>> > happy.
>>> > We pull the ua-parser jar from our archiva depot. An update will involve
>>> > building a new jar, uploading it to archiva and updating our dependency
>>> > file (pom.xml) to point to the newly updated version.
>>> >
>>> >
>>> >
>>> > On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes <okeyes@wikimedia.org>
>>> > wrote:
>>> >>
>>> >> Woah! Nice :D How are definitions updates handled?
>>> >>
>>> >> On 10 October 2014 18:58, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>> >>>
>>> >>> >1. A UDF for ua-parser or whatever we decide to use (this will
>>> >>> > possibly
>>> >>> > be necessary for pageviews, but not necessarily - it depends on our
>>> >>> > >spider/automaton detection strategy)
>>> >>> We got this one ready today:
>>> >>> https://gerrit.wikimedia.org/r/#/c/166142/
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes <okeyes@wikimedia.org>
>>> >>> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On 10 October 2014 16:02, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>> >>>>>
>>> >>>>> >At some point I believe we hope to just, you know. Have a
>>> >>>>> > regularly
>>> >>>>> > updated browser matrix somewhere.
>>> >>>>> I REALLY think this should make it into our goals; if it cannot be
>>> >>>>> done this quarter it should for sure be done next quarter.
>>> >>>>>
>>> >>>>
>>> >>>> I agree it would be nice. It's one of those things that will either
>>> >>>> come as a side-effect of other stuff, OR require substantially more
>>> >>>> work, and nothing in-between. Things we need for it:
>>> >>>>
>>> >>>> 1. A UDF for ua-parser or whatever we decide to use (this will
>>> >>>> possibly be necessary for pageviews, but not necessarily - it depends
>>> >>>> on our spider/automaton detection strategy)
>>> >>>> 2. Pageviews data
>>> >>>> 3. A table somewhere.
>>> >>>>
>>> >>>> Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html
>>> >>>> requests overall (depends on query runtime), maybe don't.
>>> >>>>
>>> >>>> The ideal implementation, obviously, is to pair this up with a site
>>> >>>> that automatically parses the results into HTML. That should be the
>>> >>>> end goal, but in terms of engineering support we can get most of the
>>> >>>> way there simply by ensuring we always have a recent snapshot to hand.
>>> >>>> I can probably put something together over the sampled logs and throw
>>> >>>> it in SQL if there are urgent needs.
>>> >>>>
>>> >>>>>
>>> >>>>> Do we not have more recent data than May?
>>> >>>>
>>> >>>>
>>> >>>> We don't, but thanks to the utilities library I built, the code for
>>> >>>> generating it would literally run:
>>> >>>>
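>>> >>>> library(WMUtils)
>>> >>>> library(data.table)  # as.data.table() and the [, j, by] syntax
>>> >>>>
>>> >>>> # Load the sampled logs for each day of September 2014 and parse the user agents
>>> >>>> uas <- as.data.table(ua_parse(data_sieve(do.call("rbind", lapply(seq(20140901, 20140930, 1), sampled_logs)))$user_agent))
>>> >>>>
>>> >>>> # Count requests per OS/browser combination
>>> >>>> uas <- uas[, j = list(requests = .N), by = c("os", "browser")]
>>> >>>>
>>> >>>> write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE,
>>> >>>>             quote = TRUE)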
>>> >>>>
>>> >>>> ...assuming we didn't care about readability.
>>> >>>>
>>> >>>> Point is, in the time until we have the new parser built into Hadoop
>>> >>>> and that setup, we can totally generate interim data from the sampled
>>> >>>> logs using the same parser at a tiny cost in research/programming
>>> >>>> time, iff (the mathematical if) we need it enough that we're cool with
>>> >>>> the sampling, and people can convince [[Dario|Our Great Leader]] to
>>> >>>> authorise me to spend 15 minutes of my time on it.
>>> >>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes
>>> >>>>> <okeyes@wikimedia.org>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> Email Dario and me; if he prioritises it I'll run a check on more
>>> >>>>>> recent data.
>>> >>>>>>
>>> >>>>>> At some point I believe we hope to just, you know. Have a
>>> >>>>>> regularly
>>> >>>>>> updated browser matrix somewhere. This comes some time after
>>> >>>>>> pageviews
>>> >>>>>> though.
>>> >>>>>>
>>> >>>>>> On 10 October 2014 14:38, Toby Negrin <tnegrin@wikimedia.org>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Hi Jon -- I'm sure other folks will have more information but
>>> >>>>>>> here's a link to a slide with some data from May[1]. We don't see
>>> >>>>>>> a lot of Windows phone traffic.
>>> >>>>>>>
>>> >>>>>>> -Toby
>>> >>>>>>>
>>> >>>>>>> [1]
>>> >>>>>>>
>>> >>>>>>> https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzcaZKY26oQiXhPaHI9g2tBmMKE/edit#slide=id.g382406373_08
>>> >>>>>>>
>>> >>>>>>> On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson
>>> >>>>>>> <jrobson@wikimedia.org>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I was going through our backlog again today, and I noticed a bug
>>> >>>>>>>> about
>>> >>>>>>>> supporting editing on Windows Phones with IE9 [1]
>>> >>>>>>>>
>>> >>>>>>>> Yet again, I wondered 'how many of our users are using IE9' as I
>>> >>>>>>>> wondered if because of this lack of support we are losing out on
>>> >>>>>>>> lots
>>> >>>>>>>> of potential editors.
>>> >>>>>>>>
>>> >>>>>>>> What's the easiest way to get this information now? Is it
>>> >>>>>>>> available?
>>> >>>>>>>>
>>> >>>>>>>> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
>>> >>>>>>>>
>>> >>>>>>>> _______________________________________________
>>> >>>>>>>> Analytics mailing list
>>> >>>>>>>> Analytics@lists.wikimedia.org
>>> >>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Oliver Keyes
>>> >>>>>> Research Analyst
>>> >>>>>> Wikimedia Foundation