Dan:

> I would love it if people on this list or elsewhere would start identifying the highest value reports from wikistats.  We can also use traffic data to figure out the most popular pages, but this doesn't always mean highest value.


The traffic data Dan refers to (I assume) is this:

http://stats.wikimedia.org/wikistats-traffic-2015-04.html

Indeed, pageview counts per report can be misleading (see e.g. the red links to totally outdated reports).


So how do we go about this? I made a list of squid-based traffic reports (with some more still to add). Would this work?


Concept pages:

https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future

https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2


Jane:

> I think we should keep them until we have new ones, because if you axe them now, no one will remember how or why they were built (and you won't be able to point users in the right direction).


Sure, I'm not going to delete the existing reports. I'm merely suggesting that we stop updating some of them and put a clear warning on top saying they are no longer accurate enough to base any conclusions on.


Gergo:

> Is there a specific reason for disabling country, mime type etc. reports?


You're right, some of the traffic reports under discussion are less maintenance-sensitive; mime type and target wiki are good examples. I might as well keep those for now.


There is a major issue with the breakdown-by-geography reports, and I may have to invalidate the 2015 versions. For example, the share of Russian traffic dropped from 5% to 1% in recent reports.

This may have to do with HTTPS traffic being misattributed to the country where the WMF data center resides. I will follow up.
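To make the suspected mechanism concrete, here is a minimal Python sketch of how a country share is computed from sampled log lines, and how attributing HTTPS requests to the data center's country would shrink a country's share. It is not the actual Wikistats or Hadoop geolocation code: the function names, the line format, the "DC" placeholder country and the toy sample are all hypothetical, and the numbers are invented for illustration only.

```python
from collections import Counter


def country_shares(lines, geolocate):
    """Fraction of sampled requests attributed to each country."""
    counts = Counter(geolocate(line) for line in lines)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def client_country(line):
    # Correct attribution: the country of the end user's IP.
    return line["country"]


def dc_country(line):
    # Hypothetical failure mode: HTTPS requests are geolocated by the IP of
    # the proxy terminating TLS, i.e. the data center's country ("DC").
    return "DC" if line["https"] else line["country"]


# Invented toy sample (not real data): 100 sampled lines, 5 from Russia,
# 4 of which arrive over HTTPS.
sample = (
    [{"country": "RU", "https": True}] * 4
    + [{"country": "RU", "https": False}]
    + [{"country": "DE", "https": False}] * 95
)

if __name__ == "__main__":
    print(country_shares(sample, client_country)["RU"])  # 0.05
    print(country_shares(sample, dc_country)["RU"])      # 0.01
```

Because shares are ratios, the sampling rate cancels out; what matters is only which country each sampled line is attributed to.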


Erik


From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Gergo Tisza
Sent: Saturday, July 25, 2015 21:02
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] proposal to axe current traffic reports


On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte <ezachte@wikimedia.org> wrote:

Wikistats broadly comes in two parts:
- A Content and activity reports per wiki (HTML tables and charts based on the XML dumps)
- B Traffic reports

  Traffic reports are built from two sources:

  -- B1 Domas' hourly aggregations per wiki, aggregated further into monthly totals per wiki (mobile/non-mobile, normalized/non-normalized), grouped by project
     e.g. http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

  -- B2 Sampled log lines (these days generated via Hadoop)

      These sampled log lines are used for two types of reports (with some hybrids):

      --- B2a Breakdowns of traffic by geographic criteria (country, continent, N/S)
            http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesVisitsEdits.htm

      --- B2b Breakdowns of traffic by non-geographic criteria (OS, browser, mime type, target wiki, referer, etc.)
            http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+traffic
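For readers new to B1, here is a minimal Python sketch of the aggregation described above: hourly per-wiki counts summed into monthly totals per wiki (split mobile/non-mobile), optionally normalized, then grouped by project. The record layout, the "lang.project" wiki naming, and the helpers project_of, monthly_totals, normalize and group_by_project are all hypothetical. Reading "normalized" as rescaling to a 30-day month is also an assumption here, not a description of the actual Wikistats scripts.

```python
from calendar import monthrange
from collections import defaultdict
from datetime import datetime


def project_of(wiki):
    # Assumption: wiki codes look like "en.wikipedia" or "de.wiktionary";
    # the project is the part after the first dot.
    return wiki.split(".", 1)[1]


def monthly_totals(records):
    """Sum hourly (hour, wiki, is_mobile, views) records per month, wiki and platform."""
    totals = defaultdict(int)
    for hour, wiki, is_mobile, views in records:
        totals[(hour.year, hour.month, wiki, is_mobile)] += views
    return dict(totals)


def normalize(total, year, month):
    # Assumption: "normalized" rescales a month's total to a 30-day month,
    # so months of different lengths can be compared.
    return total * 30 / monthrange(year, month)[1]


def group_by_project(totals):
    """Regroup monthly per-wiki totals under their project."""
    grouped = defaultdict(dict)
    for key, views in totals.items():
        wiki = key[2]
        grouped[project_of(wiki)][key] = views
    return dict(grouped)


if __name__ == "__main__":
    records = [
        (datetime(2015, 6, 1, 0), "en.wikipedia", False, 100),
        (datetime(2015, 6, 1, 1), "en.wikipedia", False, 50),
        (datetime(2015, 6, 1, 1), "en.wikipedia", True, 30),
        (datetime(2015, 6, 2, 0), "de.wiktionary", False, 10),
    ]
    for project, rows in group_by_project(monthly_totals(records)).items():
        print(project, rows)
```

Keeping the mobile flag in the key until the last step means mobile and non-mobile totals can be reported separately or summed, as the monthly tables do.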

My current proposal is to disable B2b and hybrid reports like
http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm


Is there a specific reason for disabling country, mime type etc. reports? User agent sniffing rules require constant updates as new browsers appear, so browser reports become misleading when unmaintained, but I would expect e.g. the target wiki logic to be fairly stable; and country logic (I assume) is maintained externally by MaxMind; are there also known problems with those?