The threshold is actually at 0.1%, though you are right that this is fairly arbitrary. Sanitizing data is on our goals for next quarter, and that's when we'll take a more mathematical approach to the problem.
Original Message From: Christian Schaller Sent: Thursday, March 16, 2017 08:44 To: Dan Andreescu Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.; Tomas Popela Subject: Re: [Analytics] Os stats
Been thinking a bit about this and while I do appreciate the privacy concerns I would assume that even if you set the threshold to 0.5% the amount of traffic on Wikipedia would still be great enough for that to not be a real privacy risk? It is just that wikimedia is one of the few open sources with a huge traffic base for this kind of information and we would love to use it as a neutral way to track our own userbase growth in comparison with the wider market. So we know from our internal statistics that we more than doubled our userbase over the last year, but having a resource like wikimedia would allow us to see how those numbers play out in the bigger picture. So any chance of convincing you to lower the threshold to 0.5% to hopefully allow us to start using the statistics already now?
Sincerely, Christian F.K. Schaller Manager for Fedora & Red Hat Desktop efforts
----- Original Message -----
From: "Dan Andreescu" dandreescu@wikimedia.org To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." analytics@lists.wikimedia.org Cc: "Christian Schaller" cschalle@redhat.com, "Tomas Popela" tpopela@redhat.com Sent: Tuesday, March 14, 2017 2:10:38 PM Subject: Re: [Analytics] Os stats
Christian,
I wanted to make sure our code is working well, so I took a look. We use UA Parser, a regex-based, community-maintained user agent identifier. It correctly identified Fedora as the OS in all of the strings I found like '%Fedora%' for the hour of raw webrequests I looked at. However, fewer than 0.1% of requests were identified as Fedora. We cut off reporting statistics when numbers get that low for privacy reasons. But everything is detected correctly, so if Fedora's share of requests increases, it will show up on the charts.
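As a rough illustration of the two steps described above (regex-based OS detection, then a cutoff on low shares), here is a minimal Python sketch. It is not UA Parser's actual rule set or the Analytics team's code: the pattern `OS_TOKEN` and the functions `os_family` and `reportable_shares` are hypothetical names, the inclusive comparison is an assumption, and the 0.1% cutoff is simply the figure mentioned in this thread.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not UA Parser's real rule): it picks the
# distribution token out of "(X11; <Distro>; Linux ...)".
OS_TOKEN = re.compile(r"\(X11; (?P<os>[A-Za-z]+); Linux")


def os_family(user_agent: str) -> str:
    """Return the distribution named in the user agent, or 'Other'."""
    match = OS_TOKEN.search(user_agent)
    return match.group("os") if match else "Other"


def reportable_shares(user_agents, threshold=0.001):
    """Share of requests per OS family, dropping families below the threshold."""
    counts = Counter(os_family(ua) for ua in user_agents)
    total = sum(counts.values())
    return {os: n / total for os, n in counts.items() if n / total >= threshold}


# The two Firefox 52 user agents quoted elsewhere in this thread.
UBUNTU_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"
FEDORA_UA = "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"

if __name__ == "__main__":
    # Made-up sample: 1 Fedora request in 10,000 is detected but not reported.
    sample = [UBUNTU_UA] * 9999 + [FEDORA_UA]
    print(os_family(FEDORA_UA))
    print(reportable_shares(sample))
```

In this sketch both user agents are identified correctly, yet Fedora is absent from the reported shares because its share of the made-up sample falls below the cutoff.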
Hope this helps.
On Tue, Mar 14, 2017 at 1:51 PM, Erik Zachte ezachte@wikimedia.org wrote:
Hi Christian,
I'm forwarding your question to the WMF Analytics Team who authored this report.
Cheers, Erik
-----Original Message----- From: Christian Schaller [mailto:cschalle@redhat.com] Sent: Monday, March 13, 2017 16:07 To: Erik Zachte Cc: Tomas Popela Subject: Re: Os stats
Hi Erik, Thanks for getting the new OS stats up on: https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os/os-family-timeseries
That said, as far as we can tell, the detection of Fedora does not work at all currently, and we cannot figure out why. Ubuntu, which is detected, uses the following user agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
While Fedora which isn't detected uses this user agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Would you be so kind as to let us know what the wikimedia analytics engine uses to try to identify Fedora systems? We can tweak our user agents quite easily if that is easier than updating the analytics engine's way of detecting Fedora.
Sincerely, Christian F.K. Schaller
----- Original Message -----
From: "Erik Zachte" ezachte@wikimedia.org To: "Christian Schaller" cschalle@redhat.com Sent: Tuesday, October 6, 2015 11:28:55 AM Subject: RE: Os stats
Hi Christian,
Sorry, since my previous response we put the reports on hold, as there are reliability issues now that we have migrated to HTTPS almost fully.
Can you please add your signature to https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2 I can do it for you, but I don't know: can I add your full name, or do you have a Wikipedia nickname that you prefer to use?
We are working on migration of the reports. More here: https://phabricator.wikimedia.org/T114379
Cheers, Erik
-----Original Message----- From: Christian Schaller [mailto:cschalle@redhat.com] Sent: Tuesday, October 06, 2015 16:16 To: Erik Zachte Subject: Re: Os stats
Hi Erik, Just checking what the current plans are for the OS statistics on the wikimedia site. As I mentioned in my first email to you, we would love to use these numbers as a way to estimate how we are doing with Fedora Linux, as they are one of the few sources for such statistics where we can be fairly sure the data is not biased one way or the other (due to the huge number of people using wikipedia). Of course, with the old stats being discontinued, I am now waiting for the new data to be made available to start building my usage trend statistics :)
So on the page it says to let us know if we want a specific report kept, so I would like to repeat my wish that there is a version of report '2' kept available.
Anyway, I realize that maintaining these website statistics is a bit of a sideshow for you guys and not a core part of what you're doing, so I just want to say that I do truly appreciate the effort to try to have something at all available.
Sincerely, Christian Schaller
----- Original Message -----
From: "Erik Zachte" ezachte@wikimedia.org To: "Christian Schaller" cschalle@redhat.com Sent: Monday, June 22, 2015 10:41:40 AM Subject: RE: Os stats
Hi Christian,
I started a job to catch up on the last 3 months; it will take 4-5 days.
FYI these reports are almost end-of-life. Expect a complete overhaul of Wikimedia traffic and core metrics reporting based on bigger iron and new paradigms (e.g. Hadoop) in 2015 Q3/Q4.
Cheers, Erik
-----Original Message----- From: Christian Schaller [mailto:cschalle@redhat.com] Sent: Tuesday, June 16, 2015 16:46 To: ezachte@wikimedia.org Subject: Os stats
Hi Erik, Been checking out the stats on https://stats.wikimedia.org/wikimedia/squids/SquidReportOperatingSystems.htm.
Are you planning on updating that page again soon? We are using your numbers as one of the datapoints for estimating how Fedora Linux is doing, so I hope you plan on pulling new numbers from time to time.
Christian
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Small correction: the threshold for browser reporting is 0.05%: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/browser/ge... Even with our traffic, reporting below that number is really not that meaningful. Because of the way grouping happens, if 'Fedora 23' and 'Fedora 24' (imaginary versions) each have 0.025% of traffic, neither will get reported. This is something we would like to improve, and we have a ticket for it here: https://phabricator.wikimedia.org/T131127 (feel free to chime in).
Even with big traffic like ours, there is a threshold below which reporting data is not meaningful: numbers in some instances oscillate a lot, which means there is more noise than signal. We will try to get a specific "desktop" tab (so only requests to the desktop site are counted), but even then Fedora traffic might be too small to display.
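To make the grouping issue above concrete, here is a small Python sketch using the imaginary 'Fedora 23' and 'Fedora 24' figures. It is only an illustration under stated assumptions, not the actual refinery query: the function names `filter_then_group` and `group_then_filter` are hypothetical, the Ubuntu row is made up, and the comparison is assumed to be inclusive (a share exactly at the threshold is reported).

```python
from collections import defaultdict

THRESHOLD = 0.0005  # 0.05%, the reporting threshold quoted in this message

# Made-up shares of total traffic per (OS family, version); not real data.
SHARES = {
    ("Fedora", "23"): 0.00025,
    ("Fedora", "24"): 0.00025,
    ("Ubuntu", "16"): 0.004,
}


def filter_then_group(shares, threshold):
    """Drop each (family, version) row below the threshold, then sum per family."""
    totals = defaultdict(float)
    for (family, _version), share in shares.items():
        if share >= threshold:
            totals[family] += share
    return dict(totals)


def group_then_filter(shares, threshold):
    """Sum per family first, then drop families below the threshold."""
    totals = defaultdict(float)
    for (family, _version), share in shares.items():
        totals[family] += share
    return {family: s for family, s in totals.items() if s >= threshold}


if __name__ == "__main__":
    print(filter_then_group(SHARES, THRESHOLD))  # Fedora rows are dropped one by one
    print(group_then_filter(SHARES, THRESHOLD))  # Fedora's combined share is kept
```

With per-version filtering neither Fedora row survives, while summing per family before applying the threshold lets the combined share through. Either way, a family whose combined share is below the threshold still goes unreported.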