Dan:
I would love it if people on this list or elsewhere would start identifying the highest value reports from Wikistats. We can also use traffic data to figure out the most popular pages, but this doesn't always mean highest value.
The traffic data Dan refers to (I assume) is this:
http://stats.wikimedia.org/wikistats-traffic-2015-04.html
Indeed, pageview counts per report can be misleading (see, e.g., the red links to totally outdated reports).
So how do we go about this? I made a list of Squid-based traffic reports (with more still to add).
Will this work?
Concept pages:
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future#Future:_general_ideas
https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_pe…
Jane:
I think we should keep them until we have new ones, because if you axe them now, no one will remember how or why they were built (and you won't be able to point users in the right direction).
Sure, I'm not going to delete the existing reports. I'm merely suggesting that we stop updating some of them and put a clear warning on top saying that they are no longer accurate enough to base any conclusions on.
Gergo:
Is there a specific reason for disabling country, mime type etc. reports?
You're right: some of the traffic reports under discussion are less sensitive to missing maintenance; MIME type and target wiki are good examples. I might as well keep those for now.
There is a major issue with the breakdown-by-geography reports, and I may have to invalidate the versions for 2015. For example, the share of Russian traffic dropped from 5% to 1% in recent reports.
This may have to do with HTTPS traffic being misattributed to the country where the WMF data center resides. I will follow up.
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of Gergo Tisza
Sent: Saturday, July 25, 2015 21:02
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] proposal to axe current traffic reports
On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Wikistats broadly comes in two parts:
- A: Content and activity reports per wiki (HTML tables and charts based on the XML dumps)
- B: Traffic reports
Traffic reports are built from two sources:
-- B1: Domas' hourly aggregations per wiki, further aggregated into monthly totals per wiki (mobile/non-mobile, normalized/non-normalized) and grouped by project, e.g.
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
-- B2: Sampled log lines (these days generated via Hadoop)
These sampled log lines are used for two types of reports (with some hybrids):
--- B2a: Breakdowns of traffic by geographic criteria (country, continent, N/S)
http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesV…
--- B2b: Breakdowns of traffic by non-geographic criteria (OS, browser, MIME type, target wiki, referer, etc.)
http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+tra…
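To make the difference between B1 and B2 concrete, here is a minimal Python sketch. It is not the Wikistats code: the tuple and dict record shapes are hypothetical, and the 1:1000 sampling ratio (SAMPLE_RATE) is an assumption chosen for illustration. B1 sums complete hourly counts into monthly per-wiki totals; B2 counts sampled lines per value of some field (country, MIME type, ...) and scales them back up, or turns them into percentage shares.

```python
from collections import Counter, defaultdict
from datetime import datetime

# B1 sketch. Hypothetical input: (hour_timestamp, wiki, is_mobile, views) tuples.
# The real hourly aggregation files use their own format; this shape is assumed.
def monthly_totals(hourly_records):
    """Sum hourly per-wiki counts into totals keyed by (YYYY-MM, wiki, is_mobile)."""
    totals = defaultdict(int)
    for ts, wiki, is_mobile, views in hourly_records:
        totals[(ts.strftime("%Y-%m"), wiki, is_mobile)] += views
    return dict(totals)

# B2 sketch. Hypothetical input: sampled log lines already parsed into dicts.
# SAMPLE_RATE = 1000 (one line kept per 1000 requests) is an assumption.
SAMPLE_RATE = 1000

def breakdown(sampled_lines, field):
    """Estimate request totals per value of `field` by scaling sampled counts."""
    counts = Counter(line[field] for line in sampled_lines)
    return {value: n * SAMPLE_RATE for value, n in counts.items()}

def shares(sampled_lines, field):
    """Percentage share per value of `field`; does not depend on SAMPLE_RATE."""
    counts = Counter(line[field] for line in sampled_lines)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}

if __name__ == "__main__":
    hourly = [
        (datetime(2015, 6, 30, 23), "en.wikipedia", False, 120),
        (datetime(2015, 7, 1, 0), "en.wikipedia", False, 100),
        (datetime(2015, 7, 1, 1), "en.wikipedia", True, 30),
    ]
    print(monthly_totals(hourly))
    sampled = [{"country": "RU"}, {"country": "DE"}, {"country": "RU"}, {"country": "US"}]
    print(breakdown(sampled, "country"))  # {'RU': 2000, 'DE': 1000, 'US': 1000}
    print(shares(sampled, "country"))     # {'RU': 50.0, 'DE': 25.0, 'US': 25.0}
```

Because shares are ratios of sampled counts, they are independent of the assumed sampling rate; a country's share only moves when the attribution of the underlying lines (or the traffic itself) changes.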
My current proposal is to disable B2b and hybrid reports like
http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm
Is there a specific reason for disabling the country, MIME type, etc. reports? User-agent sniffing rules require constant updates as new browsers appear, so browser reports become misleading when unmaintained. But I would expect, e.g., the target wiki logic to be fairly stable, and the country logic is (I assume) maintained externally by MaxMind. Are there known problems with those as well?