On Fri, Jul 24, 2015 at 1:25 PM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Wikistats broadly comes in two parts
- A Content and activity reports per wiki (html tables and charts based on
the xml dumps)
- B Traffic reports
Traffic reports are built from two sources
-- B1 Domas' hourly aggregations per wiki, aggregated further into
monthly totals per wiki (mobile/non-mobile, normalized/non-normalized),
grouped by project
e.g.
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
-- B2 Sampled log lines (these days generated via hadoop)
These sampled log lines are used for two types of reports (with some
hybrids)
--- B2a Breakdowns of traffic by geographic criteria (country,
continent, N/S)
http://stats.wikimedia.org/wikimedia/squids/SquidReportsCountriesLanguagesV…
--- B2b Breakdowns of traffic by non-geographic criteria (os,
browser, mime type, target wiki, referer, etc)
http://stats.wikimedia.org/cgi-bin/search_portal.pl?search=breakdown+of+tra…
My current proposal is to disable B2b and hybrid reports like
http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm

Is there a specific reason for disabling the country, mime type, etc. reports?
User agent sniffing rules need constant updates as new browsers appear,
so browser reports become misleading when unmaintained. But I would expect
e.g. the target wiki logic to be fairly stable, and the country logic (I
assume) is maintained externally by MaxMind. Are there known problems
with those as well?