Hello!
The analytics team wishes to announce that we have finally transitioned
several of the pageview reports in stats.wikimedia.org to the new pageview
definition [1]. This means that we should no longer have two conflicting
sources of pageview numbers.
While we are not not fully done transitioning pageview reports we feel this
is an important enough milestone that warrants some communication. BIG
Thanks to Erik Z. for his work on this project.
Please take a look at a report using the new definition (a banner is
present when report has been updated)
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
Thanks,
Nuria
[1] https://meta.wikimedia.org/wiki/Research:Page_view
FYI that I have uploaded the two wikistats workflow diagrams that Eric Z.
did to wikitech as I think they are easier to find there than they would be
on phabricator:
https://wikitech.wikimedia.org/wiki/Analytics/Wikistats
Hey, I was wondering if it is possible to export the results of Hive
queries to some world-readable place?
What I'm trying to achieve: for my www portals work, I want the results of
aggregation (SELECT project, sum(view_count) AS num FROM projectview_hourly
WHERE year=2015 AND month=11 AND day=3 GROUP BY project) published
somewhere in a machine-readable format. Ideally, this could be published
externally (for example, https://stats.wikimedia.org/daily_pageviews.csv or
whatever). If that is hard, making it somehow available on the cluster
would suffice. What are the options for doing that?
--
Best regards,
Max Semenik ([[User:MaxSem]])
Hi all,
here is the usual weekly look at our most important readership metrics
(CCing the Analytics-l mailing list too this time).
As laid out earlier, the main purpose is to raise awareness about how these
are developing, call out the impact of any unusual events in the preceding
week, and facilitate thinking about core metrics in general. We are still
iterating on the presentation (e.g. to better take seasonality into
account, in particular including year-over-year comparisons) and eventually
want to create dashboards for those which are not already available in that
form already. Feedback and discussion continue to be welcome.
Now to the usual data. (All numbers below are averages for October
26-November 1, 2015 unless otherwise noted.)
Pageviews
Total: 525 million/day (-1.5% from the previous week)
Context (April 2015-October 2015):
(see also the Vital Signs dashboard
<https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews>)
1.5% is a somewhat noticeable drop, and as in the last report I ran a query
for the countries with the largest changes from the previous week. Some
interesting data, but not sufficient for attributing the overall drop to a
particular area:
-
Ireland +40.2%
-
Romania -35.6% (previous week: +60%)
-
France +14.5%
-
Philippines -12.4% (previous week: -12.4%(!))
-
Mexico -10.3%
-
Colombia -9.2%
-
Ecuador -8.9%
-
Israel -7.7%
-
Malaysia -7.6%
-
Sweden -7.5%
Desktop: 57.7%
Mobile web: 1.2%
Apps: 41.1%
(same as previous week)
Global North ratio: 77.1% of total pageviews (previous week: 76.9%)
Context (April 2015-November 2015):
This week, instead of plotting the absolute numbers as usual
<https://commons.wikimedia.org/wiki/File:Wikimedia_pageviews,_Global_South_v…>,
let’s chart the percentage:
It’s not a definite proof, but this chart shows a pretty clear rise (or
conversely, decrease in the ratio of traffic from the Global South) during
the time of the staggered HTTPS-only rollout in June.
Unique app users
Android: 1.161 million /day (-0.0% from the previous week)
Context (August-November 2015):
iOS: 278k / day (-0.7% from the previous week)
Context (August-November 2015):
As anticipated, the marked increase in new installations while the app was
featured recently (see below) did not move the DAU needle much.
New app installations
Android: 37.2k/day (-1.2% from the previous week)
Daily installs per device, from Google Play
Context (August-November 2015):
iOS: 3.96k/day (-35.0% from the previous week)
Download numbers from App Annie
Context (August-November 2015):
A slightly conspicuous drop last Thursday. But most of the large
week-over-week decrease came from the app having been featured in the App
Store previously (see earlier weekly reports).
And since you read this far, a little reward in form of a link
<http://www.adweek.com/news/advertising-branding/ad-day-adobe-knows-what-you…>
to a mildly amusing 1 minute video ad that mocks the data analytics mishaps
of a fictitious but easily recognized large encyclopedia project ;)
----
For reference, the queries and source links used are listed below (access
is needed for each). Most of the above charts are available on Commons, too
<https://commons.wikimedia.org/wiki/Category:Wikimedia_readership_metrics_re…>
.
hive (wmf)> SELECT SUM(view_count)/7000000 AS avg_daily_views_millions FROM
wmf.projectview_hourly WHERE agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-10-26"
AND "2015-11-01";
hive (wmf)> SELECT year, month, day,
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) as date,
sum(IF(access_method <> 'desktop', view_count, null)) AS mobileviews,
SUM(view_count) AS allviews FROM wmf.projectview_hourly WHERE year=2015 AND
agent_type = 'user' GROUP BY year, month, day ORDER BY year, month, day
LIMIT 1000;
hive (wmf)> SELECT access_method, SUM(view_count)/7 FROM
wmf.projectview_hourly WHERE agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-10-26"
AND "2015-11-01" GROUP BY access_method;
hive (wmf)> SELECT SUM(IF (FIND_IN_SET(country_code,
'AD,AL,AT,AX,BA,BE,BG,CH,CY,CZ,DE,DK,EE,ES,FI,FO,FR,FX,GB,GG,GI,GL,GR,HR,HU,IE,IL,IM,IS,IT,JE,LI,LU,LV,MC,MD,ME,MK,MT,NL,NO,PL,PT,RO,RS,RU,SE,SI,SJ,SK,SM,TR,VA,AU,CA,HK,MO,NZ,JP,SG,KR,TW,US')
> 0, view_count, 0))/SUM(view_count) FROM wmf.projectview_hourly WHERE
agent_type = 'user' AND
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")) BETWEEN "2015-10-26"
AND "2015-11-01";
hive (wmf)> SELECT year, month, day,
CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0")), SUM(view_count) AS
all, SUM(IF (FIND_IN_SET(country_code,
'AD,AL,AT,AX,BA,BE,BG,CH,CY,CZ,DE,DK,EE,ES,FI,FO,FR,FX,GB,GG,GI,GL,GR,HR,HU,IE,IL,IM,IS,IT,JE,LI,LU,LV,MC,MD,ME,MK,MT,NL,NO,PL,PT,RO,RS,RU,SE,SI,SJ,SK,SM,TR,VA,AU,CA,HK,MO,NZ,JP,SG,KR,TW,US')
> 0, view_count, 0)) AS Global_North_views FROM wmf.projectview_hourly
WHERE year = 2015 AND agent_type='user' GROUP BY year, month, day ORDER BY
year, month, day LIMIT 1000;
SELECT country_code, changeratio, ROUND(milliondailyviewsthisweek,1) AS
milliondailyviewsthisweek FROM
(SELECT country_code, ROUND(100*SUM(IF((day>25 AND month=10) OR (day<2
AND month=11), view_count, null))/SUM(IF(day>18 AND day<26, view_count,
null))-100,1) AS changeratio, SUM(IF((day>25 AND month=10) OR (day<2 AND
month=11), view_count, null))/7000000 AS milliondailyviewsthisweek
FROM wmf.projectview_hourly
WHERE
year = 2015
AND month > 9
AND agent_type = "user"
GROUP BY country_code)
AS countrylist
WHERE milliondailyviewsthisweek > 1 GROUP BY country_code, changeratio,
milliondailyviewsthisweek ORDER BY ABS(changeratio) DESC LIMIT 10;
hive (wmf)> SELECT SUM(IF(platform = 'Android',unique_count,0))/7 AS
avg_Android_DAU_last_week, SUM(IF(platform = 'iOS',unique_count,0))/7 AS
avg_iOS_DAU_last_week FROM wmf.mobile_apps_uniques_daily WHERE
CONCAT(year,LPAD(month,2,"0"),LPAD(day,2,"0")) BETWEEN 20151026 AND
20151101;
hive (wmf)> SELECT CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0"))
as date, unique_count AS Android_DAU FROM wmf.mobile_apps_uniques_daily
WHERE platform = 'Android';
hive (wmf)> SELECT CONCAT(year,"-",LPAD(month,2,"0"),"-",LPAD(day,2,"0"))
as date, unique_count AS iOS_DAU FROM wmf.mobile_apps_uniques_daily WHERE
platform = 'iOS';
https://console.developers.google.com/storage/browser/pubsite_prod_rev_0281…
(“overview”)
https://www.appannie.com/dashboard/252257/item/324715238/downloads/?breakdo…
(select “Total”)
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
* Do you have small, self-contained, "easy" bugs you'd love to get fixed?
(Also see https://www.mediawiki.org/wiki/Annoying_little_bugs )
* Does your documentation need improvements?
* Do your old bugs welcome some testing?
* Does your user interface have some small design issues?
* Do you have some tasks that welcome some research?
Google Code-In (GCI) will take place again in Dec+Jan: a contest for
13-17 year old students to provide small contributions to free software
projects.
Wikimedia will apply again to take part in GCI. The more tasks we can
offer the likelier the changes Wikimedia will get accepted.
Unsure about quality of contributions and effort?
Read about tgr's post about Multimedia achievements in GCI 2014:
https://lists.wikimedia.org/pipermail/multimedia/2015-January/001009.html
In short:
* Add the project "GCI2015" + a comment to Phabricator tasks you'd mentor.
* Tasks are welcome in five areas: Code; Outreach/Research;
Documentation/Training; Quality Assurance; User Interface.
* Make sure the task description provides pointers to help the student.
* Add yourself to the table of mentors on the wikipage.
* "Beginner tasks" (<30 min for an experienced contributor) also welcome.
* "Generic" tasks also welcome (e.g. "Fix two user interface messages from
the "Blocked By" list in https://phabricator.wikimedia.org/T40638 ").
For all information, check
https://www.mediawiki.org/wiki/Google_Code-in_2015#Mentors.27_corner
6 "easy" Analytics tasks (are they still valid? are there more?):
https://phabricator.wikimedia.org/maniphest/query/HtsgCU8lxlhC/#R
Can you imagine providing a helping hand to someone fixing tasks?
Please ask if you have questions!
Thank you!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Jaime:
(Adding analytics e-mail list)
Please send notes regarding eventlogging to analytics@
Thanks,
Nuria
On Tue, Nov 3, 2015 at 9:03 AM, Jaime Crespo <jcrespo(a)wikimedia.org> wrote:
> As actionables of
> https://wikitech.wikimedia.org/wiki/Incident_documentation/20151022-EventLo…
>
> I have migrated the replication method used by Sean to puppet and added
> monitoring (which was missing initially). While the current state could be
> iteratively improved, it no longer depends from a single process on a
> single machine, that could be restarted or fail at any time. We also have
> logs and alerts to identify issues immediately. This will also allow
> purging rows easier and faster in the short future.
>
> Of course, any migration could have regressions, so please monitor any
> issue you may find (as I am currently doing, and I have not yet found).
> This will hopefully prevent the issue to happen again.
>
> Regards,
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
Dear Analytics team,
I’m Giacomo Marangoni and I’m working on Pagecounts-raw data. During a little time spent in analysis, I’ve pointed out a little strange behaviour that could affect your data.
Sometimes I found record like this “it.n Addio_al_regista_Sydney_Pollack 1 0” and I can’t explain myself how a page could be visited one time and turn back a response of 0 byte.
Can I take it as a default value when a record is acquired in the wrong way?
Best regards
Giacomo Marangoni