Hi all,
I think the time has come to disable the traffic reports based on webstatscollector (2.0) data.
See http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traf...
- These reports are using outdated definitions for page views.
- The scripts haven't seen any maintenance for years.
Even with the new pageview API still in development, these reports are increasingly misreporting reality anyway.
There was a period where I felt imperfect reports were better than no reports at all, and I warned about unresolved bugs in the report header.
But the anomaly reported below served as a wake-up call for me that the mismatches are intolerably high anyway.
So I propose to put up a notice on the latest reports that those were the last release, and WMF is working to deliver a new infrastructure in the form of a pageview API, ETA later this year.
See also https://phabricator.wikimedia.org/T44259
Whether WMF will also assume responsibility for building new reports on top of that API (and if so in what form) is another matter, but first things first. Current focus is on providing that API, as it should be IMO.
Any thoughts?
Erik Zachte
From: Erik Zachte [mailto:erikzachte@infodisiac.com]
Sent: Friday, July 24, 2015 17:58
To: 'Андрей Лавров'
Subject: RE: Wikimedia Traffic Analysis Report - Operating Systems
Hey Andrey,
You're totally right of course. And not the only one to notice. These traffic reports haven't seen much (maintenance) love lately. I'm tempted to disable them. I'm looking forward to the upcoming WMF pageview API as a much more promising platform to build better reports: more up to date, more robust, more flexible. Of course there is always a hazard in ceasing to maintain a solution before a replacement is really there, but that is what actually happened long ago.
Thanks for the heads-up.
Erik
From: Андрей Лавров [mailto:andrey.lavrov@wancastle.com]
Sent: Wednesday, July 22, 2015 11:09
To: erikzachte@infodisiac.com
Subject: Wikimedia Traffic Analysis Report - Operating Systems
Dear Erik,
Please improve your analysis reports by including Chrome OS statistics.
Chrome OS has about 10% market share in the US now. Almost all Chromebooks are online every day. It is very strange not to see Chrome OS market share in your reports.
Best regards,
Andrey Lavrov
I agree with you Erik, but there are some details that need sorting out.
* The Pageview API that we'll deliver by the end of this quarter will not have the more detailed analysis that your traffic reports have (such as referrer stats, browser stats, etc). We should make a separate effort, probably as part of Wikistats 2.0 to cover those gaps.
* The data to power any of these reports is in great shape and is on the hadoop cluster in a neatly pre-aggregated hourly table. We could use that to start replicating these more detailed analyses from Wikistats.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks for correcting me on this, Dan. The scope of the upcoming API is well defined, and it is not the cure-all that I made it seem in my enthusiasm. Sorry for causing confusion.
As you say, replacing some of the traffic analyses will be a separate task, yet to be defined.
- I say 'some' as not all of the reports have found a true user base.
- I reckon the primary deliverable might well change from html to machine-readable data, from which anyone can build nice, more dynamic reports.
I would love it if people on this list or elsewhere would start identifying the highest value reports from wikistats. We can also use traffic data to figure out the most popular pages, but this doesn't always mean highest value.
On 24 Jul 2015, at 10:52, Dan Andreescu dandreescu@wikimedia.org wrote:
I would love it if people on this list or elsewhere would start identifying the highest value reports from wikistats. We can also use traffic data to figure out the most popular pages, but this doesn't always mean highest value.
The thing I used stats.wikimedia.org for most often is browser usage statistics. However, I did so with mixed feelings, because I know its user-agent parsing logic and how dated it has become.
I know we store the UA in Hadoop (right?) and we preprocess it with the ua-parser library (which I trust). We just need a good dashboard to get continuous insight into this data. Is this data presented in one of the current analytics dashboards? [1]
Right now I'm flying blind, and this has held back many different projects and initiatives because I can't trust the data. The result is questionable decisions, stalled issues, or reliance on third-party data instead.
-- Timo
[1] https://meta.wikimedia.org/wiki/Research:Data/Dashboards
The thing I used stats.wikimedia.org for most often is browser usage statistics. However, I did so with mixed feelings, because I know its user-agent parsing logic and how dated it has become.
I know we store the UA in Hadoop (right?)
Yes, and that's the source we'd use as we update wikistats reports.
and we preprocess it with the ua-parser library (which I trust). We just need a good dashboard to get continuous insight into this data. Is this data presented in one of the current analytics dashboards?
No, so far we've just run one-off jobs to get people answers to specific questions.
Right now I'm flying blind, and this has held back many different projects and initiatives because I can't trust the data. The result is questionable decisions, stalled issues, or reliance on third-party data instead.
Thanks for pointing out that this is of high value. I have logged this here (the epic we'll use to keep track of new Wikistats work): https://phabricator.wikimedia.org/T107175
+1 to this being of high value!
Erik Zachte, 24/07/2015 18:59:
I think the time has come to disable the traffic reports based on webstatscollector (2.0) data.
See http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traf...
Only the breakdowns by client? All the breakdowns? All the pageview stats? The country data is very important, for instance: people often ask for such numbers (at least in Italy). Nobody ever looks at all of them in detail, so it's important for i18n etc. that they are available for everyone to look at their own corner.
Nemo
Nemo, thanks for asking,
Wikistats broadly comes in two parts
- A Content and activity reports per wiki (html tables and charts based on the xml dumps)
- B Traffic reports
Traffic reports are built from two sources
-- B1 Domas' hourly aggregations per wiki, aggregated further into monthly totals per wiki (mobile/non-mobile, normalized/non-normalized), grouped by project e.g. http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
-- B2 Sampled log lines (these days generated via hadoop)
These sampled log lines are used for two types of reports (with some hybrids)
--- B2a Breakdowns of traffic by geographic criteria (country, continent, N/S) http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVi...
--- B2b Breakdowns of traffic by non geographic criteria (os, browser, mime type, target wiki, referer, etc) http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traf...
My current proposal is to disable B2b and hybrid reports like http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm
Erik
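The B1 roll-up described above (hourly counts per wiki, aggregated further into monthly totals per wiki) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (hour timestamp, wiki code, view count), the sample numbers, and the name `monthly_totals` are all hypothetical and are not taken from the actual Wikistats scripts.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly records: (hour, wiki, views). All values are made up.
HOURLY = [
    ("2015-06-30T23", "en.wikipedia", 120),
    ("2015-07-01T00", "en.wikipedia", 100),
    ("2015-07-01T01", "en.wikipedia", 80),
    ("2015-07-01T00", "it.wikipedia", 30),
]


def monthly_totals(rows):
    """Sum hourly view counts into (month, wiki) buckets."""
    totals = defaultdict(int)
    for hour, wiki, views in rows:
        # Reduce the hour stamp to its calendar month, e.g. "2015-07".
        month = datetime.strptime(hour, "%Y-%m-%dT%H").strftime("%Y-%m")
        totals[(month, wiki)] += views
    return dict(totals)


if __name__ == "__main__":
    for (month, wiki), views in sorted(monthly_totals(HOURLY).items()):
        print(month, wiki, views)
```

The mobile/non-mobile split and the normalization mentioned for B1 are left out here; each would need one more key or one more step on top of the same grouping.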
A +1 from me, but a regretful one - Erik, you've maintained these things and made them work far past the point where it would've exceeded my skills and endurance. I take my hat off to you, and look forward to recreationally replicating some of the reports once this API comes up :)
On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte ezachte@wikimedia.org wrote:
My current proposal is to disable B2b and hybrid reports like http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm
Is there a specific reason for disabling country, mime type etc. reports? User agent sniffing rules require constant updates as new browsers appear, so browser reports become misleading when unmaintained, but I would expect e.g. the target wiki logic to be fairly stable; and country logic (I assume) is maintained externally by MaxMind; are there also known problems with those?
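To illustrate why user agent sniffing rules go stale, here is a minimal sketch of an ordered substring rule list. It is not the Wikistats logic and not the ua-parser library; the rule lists, the name `classify_os`, and the sample user-agent string (including its version numbers) are assumptions made up for this example. The only fact relied on is that Chrome OS browsers send a `CrOS` token in their user agent.

```python
# Hypothetical substring rules, checked in order; the first match wins.
OLD_RULES = [
    ("Windows", "Windows"),
    ("Android", "Android"),    # must precede "Linux": Android UAs contain both
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]

# The same list with one rule added for the "CrOS" token.
NEW_RULES = [("CrOS", "Chrome OS")] + OLD_RULES

# Illustrative Chromebook user agent; the version numbers are placeholders.
CHROMEBOOK_UA = (
    "Mozilla/5.0 (X11; CrOS x86_64 6946.86.0) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
)


def classify_os(user_agent, rules):
    """Return the label of the first rule whose substring occurs in the UA."""
    for needle, label in rules:
        if needle in user_agent:
            return label
    return "Other"


if __name__ == "__main__":
    print(classify_os(CHROMEBOOK_UA, OLD_RULES))
    print(classify_os(CHROMEBOOK_UA, NEW_RULES))
```

With the older list the Chromebook falls through to the catch-all bucket, so a whole platform stays invisible in a report until someone adds a rule for it.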
Dan:
I would love it if people on this list or elsewhere would start identifying the highest value reports from wikistats. We can also use traffic data to figure out the most popular pages, but this doesn't always mean highest value.
The traffic data Dan refers to (I assume) is this:
http://stats.wikimedia.org/wikistats-traffic-2015-04.html
Indeed, pageviews for each report can be misleading (see e.g. red links to totally outdated reports).
So how to go about this? I made a list of Squid-based traffic reports (some more to add). Will this work?
Concept pages:
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future#Future:_general_ideas
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per...
Jane:
I think we should keep them until we have new ones, because if you axe them now, no one will remember how or why they were built (and you won't be able to point users in the right direction).
Sure, I'm not going to delete the existing reports. I'm merely suggesting not to update some of them, and to put a clear warning on top that they are no longer accurate enough to base any conclusions on.
Gergo:
Is there a specific reason for disabling country, mime type etc. reports?
You're right, some of the traffic reports under discussion are less maintenance-sensitive; mime type and target wiki are good examples. I might as well keep those for now.
There is a major issue with the breakdown by geography reports, and I may have to invalidate the versions for 2015. For example, the share of Russian traffic dropped from 5% to 1% in recent reports.
This may have to do with https traffic being misattributed to the country where the WMF data center resides. I will follow up.
Erik
This is great, thanks for setting it up Erik. Everyone interested, be bold!
I agree that the list of problems for users of these reports is getting so long that their relative value keeps shrinking. They are still very useful though. I think we should keep them until we have new ones, because if you axe them now, no one will remember how or why they were built (and you won't be able to point users in the right direction). Often it's not until you kill a thing that the people who depended on that thing come out from under the rocks to complain.