From: Nuria Ruiz
Sent: Friday, March 17, 2017 10:57
To: Christian Schaller
Reply To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.; Tomas Popela
Subject: Re: [Analytics] Os stats
----- Original Message -----
> From: "Nuria Ruiz" <nuria@wikimedia.org>
> To: "Christian Schaller" <cschalle@redhat.com>
> Cc: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics."
> <analytics@lists.wikimedia.org>, "Tomas Popela" <tpopela@redhat.com>
> Sent: Thursday, March 16, 2017 4:11:54 PM
> Subject: Re: [Analytics] Os stats
>
> >Hmm, does not make sense to me that the traffic caused by our users would
> >be that small,
>
> Overall? I disagree, I think it does. Wikipedia (our main source
> of traffic for all Wikimedia projects) is rapidly moving to mobile, thus
> mobile OSes are the bulk of the requests, desktop is the minority and, in
> that minority, Linux is the minority.
Sorry, I did not mean to imply that gathering the mobile statistics isn't useful
for Wikipedia. I was just saying that grouping them with desktop data drowns out
desktop data for smaller outfits like ourselves, making the data less useful for
us. That said, I do appreciate that you share this data as a public service and have
no obligation to do so. So, to be 100% clear: regardless of immediate usefulness to
me, I am grateful for the effort you guys are making. So thank you :)
> Just looked at December 2016 overall pageviews for desktop and mobile
> coming from "users" (not self-identified bots), and for that month about 20%
> of pageviews are on iOS, 25% are on Android and Fedora is 0.027%. This data
> counts all projects for the whole world at large; Fedora probably
> represents a larger chunk of traffic in US desktop-only traffic.
>
> I think we are going to be adding a bit more info to our browser reports
> with desktop-only data but still, Fedora traffic is probably not going to
> display.
>
> >Anyway, I will install the analytics stuff myself on a local machine and
> >do some testing, to see if I
> >can see a reason for things to fail to register properly.
>
> If you end up committing any fix to ua-parser please let us know
>
> On Thu, Mar 16, 2017 at 9:28 AM, Christian Schaller <cschalle@redhat.com>
> wrote:
>
> > Hmm, does not make sense to me that the traffic caused by our users would
> > be that small,
> > and there is no version string for Fedora in the user agent, it is just:
> > Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like
> > Gecko) Chrome/56.0.2924.87 Safari/537.36
> >
> > Anyway, I will install the analytics stuff myself on a local machine and
> > do some testing, to see if I can see a reason for things to fail to
> > register properly. Thanks for the quick and helpful answers so far.
> >
> > Christian
> >
> >
> >
> > ----- Original Message -----
> > > From: "Nuria Ruiz" <nuria@wikimedia.org>
> > > To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics."
> > > <analytics@lists.wikimedia.org>
> > > Cc: "Christian Schaller" <cschalle@redhat.com>, "Tomas Popela" <tpopela@redhat.com>
> > > Sent: Thursday, March 16, 2017 12:12:28 PM
> > > Subject: Re: [Analytics] Os stats
> > >
> > > Small correction, the threshold for browser reporting is 0.05%:
> > > https://github.com/wikimedia/analytics-refinery/blob/master/oozie/browser/general/coordinator.properties#L62
> > > Even for our traffic, reporting below that number is really not that
> > > meaningful. Now, because of the way that grouping happens, if 'Fedora 23' and
> > > 'Fedora 24' (imaginary versions) each have 0.025% of traffic, neither will get
> > > reported. This is something we would like to improve and we have a ticket
> > > for it here: https://phabricator.wikimedia.org/T131127 (feel free to
> > > chime in)
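
To illustrate the grouping effect described above, here is a minimal Python sketch. It is not the analytics-refinery code. The share table, the function names and the inclusive ">=" comparison are all assumptions made for illustration. The Fedora rows use 0.03% each rather than the thread's imaginary 0.025%, so that the combined share is clearly above the 0.05% cutoff instead of sitting exactly on it.

```python
from collections import defaultdict

THRESHOLD_PERCENT = 0.05  # reporting cutoff quoted in the thread

# Hypothetical shares (percent of all requests), keyed by (os_family, os_major).
# Every number here is invented for illustration only.
shares = {
    ("Fedora", "23"): 0.03,
    ("Fedora", "24"): 0.03,
    ("Ubuntu", "16"): 0.30,
    ("Ubuntu", "14"): 0.04,
}


def report_per_version(shares, threshold):
    """Apply the cutoff to each (family, version) row separately."""
    return {key: pct for key, pct in shares.items() if pct >= threshold}


def report_per_family(shares, threshold):
    """Sum versions into families first, then apply the cutoff."""
    totals = defaultdict(float)
    for (family, _version), pct in shares.items():
        totals[family] += pct
    return {family: pct for family, pct in totals.items() if pct >= threshold}


per_version = report_per_version(shares, THRESHOLD_PERCENT)
per_family = report_per_family(shares, THRESHOLD_PERCENT)

print("per version:", per_version)
print("per family: ", per_family)
```

With these made-up numbers, neither Fedora row survives the per-version cutoff, while the Fedora family as a whole does once its versions are summed before filtering.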
> > >
> > > Now, even with big traffic like ours, there is a threshold below which
> > > reporting data is not meaningful, as numbers in some instances oscillate a
> > > lot, which means that there is more noise than signal. We will try to get
> > > a specific "desktop" tab (so only requests to the desktop site are counted)
> > > but even then, Fedora traffic might be too small to display.
> > >
> > > On Thu, Mar 16, 2017 at 6:09 AM, Dan Andreescu <dandreescu@wikimedia.org>
> > > wrote:
> > >
> > > > The threshold is actually at 0.1%, though you are right that this is
> > > > fairly arbitrary. We have sanitizing data on our goals next quarter,
> > > > and that's when we'll take a more mathematical approach to the problem.
> > > >
> > > > Original Message
> > > > From: Christian Schaller
> > > > Sent: Thursday, March 16, 2017 08:44
> > > > To: Dan Andreescu
> > > > Cc: A mailing list for the Analytics Team at WMF and everybody who has an
> > > > interest in Wikipedia and analytics.; Tomas Popela
> > > > Subject: Re: [Analytics] Os stats
> > > >
> > > > Been thinking a bit about this and while I do appreciate the privacy
> > > > concerns I would assume that even if you set the threshold to 0.5% the
> > > > amount of traffic on Wikipedia would still be great enough for that to
> > > > not be a real privacy risk? It is just that Wikimedia is one of the few
> > > > open sources with a huge traffic base for this kind of information and
> > > > we would love to use it as a neutral way to track our own userbase
> > > > growth in comparison with the wider market. So we know from our internal
> > > > statistics that we more than doubled our userbase over the last year,
> > > > but having a resource like Wikimedia would allow us to see how those
> > > > numbers play out in the bigger picture. So any chance of convincing you
> > > > to lower the threshold to 0.5% to hopefully allow us to start using the
> > > > statistics already now?
> > > >
> > > > Sincerely,
> > > > Christian F.K. Schaller
> > > > Manager for Fedora & Red Hat Desktop efforts
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Dan Andreescu" <dandreescu@wikimedia.org>
> > > > > > To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics."
> > > > > > <analytics@lists.wikimedia.org>
> > > > > > Cc: "Christian Schaller" <cschalle@redhat.com>, "Tomas Popela" <tpopela@redhat.com>
> > > > > Sent: Tuesday, March 14, 2017 2:10:38 PM
> > > > > Subject: Re: [Analytics] Os stats
> > > > >
> > > > > Christian,
> > > > >
> > > > > > I wanted to make sure our code is working well so I took a look. We
> > > > > > use UA Parser, a regex-based community-maintained user agent
> > > > > > identifier. It correctly identified Fedora as the OS in all of the
> > > > > > strings I found like '%Fedora%' for the hour of raw webrequests I
> > > > > > looked at. However, fewer than 0.1% of requests were identified as
> > > > > > Fedora. We cut off reporting statistics when numbers get that low for
> > > > > > privacy reasons. But everything is detected correctly, so if Fedora's
> > > > > > share of requests increases, it will show up on the charts.
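
As a rough illustration of what regex-based user agent identification looks like, here is a minimal Python sketch. It is not UA Parser itself and does not use its rule set. The pattern list, the `detect_os_family` name and the "Other" fallback are hypothetical. The two Firefox strings are the ones quoted elsewhere in this thread.

```python
import re

# Hypothetical, heavily simplified patterns. These are NOT the regexes
# shipped with UA Parser; order matters, so the more specific
# distribution names are tried before the generic "Linux".
OS_PATTERNS = [
    ("Fedora", re.compile(r"\bFedora\b")),
    ("Ubuntu", re.compile(r"\bUbuntu\b")),
    ("Linux", re.compile(r"\bLinux\b")),
]


def detect_os_family(user_agent):
    """Return the first matching OS family, or 'Other' if nothing matches."""
    for family, pattern in OS_PATTERNS:
        if pattern.search(user_agent):
            return family
    return "Other"


fedora_ua = ("Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) "
             "Gecko/20100101 Firefox/52.0")
ubuntu_ua = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) "
             "Gecko/20100101 Firefox/52.0")

for ua in (fedora_ua, ubuntu_ua):
    print(detect_os_family(ua), "<-", ua)
```

Both strings carry the distribution name in the same position, so a matcher along these lines treats them identically. That is consistent with the point above that detection works and the missing Fedora row comes from the reporting cutoff.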
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > > On Tue, Mar 14, 2017 at 1:51 PM, Erik Zachte <ezachte@wikimedia.org> wrote:
> > > > >
> > > > > > Hi Christian,
> > > > > >
> > > > > > > I'm forwarding your question to the WMF Analytics Team who authored
> > > > > > > this report.
> > > > > >
> > > > > > Cheers,
> > > > > > Erik
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Christian Schaller [mailto:cschalle@redhat.com]
> > > > > > Sent: Monday, March 13, 2017 16:07
> > > > > > To: Erik Zachte
> > > > > > Cc: Tomas Popela
> > > > > > Subject: Re: Os stats
> > > > > >
> > > > > > Hi Erik,
> > > > > > Thanks for getting the new OS stats up on:
> > > > > > > https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os/os-family-timeseries
> > > > > >
> > > > > > > That said, as far as we can tell the detection of Fedora does not
> > > > > > > work at all currently and we cannot figure out why. Ubuntu, which is
> > > > > > > detected, uses the following user agent:
> > > > > > > Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
> > > > > > >
> > > > > > > While Fedora, which isn't detected, uses this user agent:
> > > > > > > Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
> > > > > > >
> > > > > > > Would you be so kind as to let us know what the Wikimedia analytics
> > > > > > > engine uses to try to identify Fedora systems? We can tweak our user
> > > > > > > agents quite easily if that is easier than updating the analytics
> > > > > > > engine's way of detecting Fedora.
> > > > > >
> > > > > > Sincerely,
> > > > > > Christian F.K. Schaller
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Erik Zachte" <ezachte@wikimedia.org>
> > > > > > > To: "Christian Schaller" <cschalle@redhat.com>
> > > > > > > Sent: Tuesday, October 6, 2015 11:28:55 AM
> > > > > > > Subject: RE: Os stats
> > > > > > >
> > > > > > > Hi Christian,
> > > > > > >
> > > > > > > > Sorry, since my previous response we put the reports on hold, as
> > > > > > > > there are issues with reliability now that we have migrated to
> > > > > > > > https almost fully.
> > > > > > > >
> > > > > > > > Can you please add your signature to
> > > > > > > > https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2
> > > > > > > > I can do it for you, but I don't know: can I add your full name or
> > > > > > > > do you have a Wikipedia nickname that you prefer to use?
> > > > > > >
> > > > > > > We are working on migration of the reports. More here:
> > > > > > > https://phabricator.wikimedia.org/T114379
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Erik
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Christian Schaller [mailto:cschalle@redhat.com]
> > > > > > > Sent: Tuesday, October 06, 2015 16:16
> > > > > > > To: Erik Zachte
> > > > > > > Subject: Re: Os stats
> > > > > > >
> > > > > > > Hi Erik,
> > > > > > > > Just checking what the current plans are for the OS statistics on
> > > > > > > > the Wikimedia site. As I mentioned in my first email to you, we
> > > > > > > > would love to use these numbers as a way to estimate how we are
> > > > > > > > doing with Fedora Linux, as they are one of the few sources for such
> > > > > > > > statistics where we can be fairly sure the data is not biased one
> > > > > > > > way or the other (due to the huge number of people using Wikipedia).
> > > > > > > > Of course, with the old stats being discontinued I am now waiting
> > > > > > > > for the new data to be made available to start building my usage
> > > > > > > > trend statistics :)
> > > > > > > >
> > > > > > > > So on the page it says to let us know if we want a specific report
> > > > > > > > kept, so I would like to repeat my wish that there is a version of
> > > > > > > > report '2' kept available.
> > > > > > > >
> > > > > > > > Anyway, I realize that maintaining these website statistics is a bit
> > > > > > > > of a sideshow for you guys and not a core part of what you're doing,
> > > > > > > > so I just want to say that I do truly appreciate the effort to try
> > > > > > > > to have something at all available.
> > > > > > >
> > > > > > > Sincerely,
> > > > > > > Christian Schaller
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Erik Zachte" <ezachte@wikimedia.org>
> > > > > > > > To: "Christian Schaller" <cschalle@redhat.com>
> > > > > > > > Sent: Monday, June 22, 2015 10:41:40 AM
> > > > > > > > Subject: RE: Os stats
> > > > > > > >
> > > > > > > > Hi Christian,
> > > > > > > >
> > > > > > > > > I started a job to catch up for the last 3 months; it will take
> > > > > > > > > 4-5 days.
> > > > > > > > >
> > > > > > > > > FYI these reports are almost end-of-life. Expect a complete
> > > > > > > > > overhaul of Wikimedia traffic and core metrics reporting based on
> > > > > > > > > bigger iron and new paradigms (e.g. Hadoop) in 2015 Q3/Q4.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Erik
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Christian Schaller [mailto:cschalle@redhat.com]
> > > > > > > > Sent: Tuesday, June 16, 2015 16:46
> > > > > > > > To: ezachte@wikimedia.org
> > > > > > > > Subject: Os stats
> > > > > > > >
> > > > > > > > Hi Erik,
> > > > > > > > > Been checking out the stats on
> > > > > > > > > https://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm.
> > > > > > > > > Are you planning on updating that page again soon?
> > > > > > > > > We are using your numbers as one of the datapoints for estimating
> > > > > > > > > how Fedora Linux is doing, so I hope you plan on pulling new
> > > > > > > > > numbers from time to time.
> > > > > > > >
> > > > > > > > Christian
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Analytics mailing list
> > > > > > Analytics@lists.wikimedia.org
> > > > > > https://lists.wikimedia.org/mailman/listinfo/analytics
> > > > > >
> > > > >
> > > >
> > >
> >
>