Thank you! Would you mind posting a note on Analytics(a)lists.wikimedia.org
when it is working normally again?
On Wed, Feb 11, 2015 at 1:36 PM, Henrik Abelsson <henrik(a)abelsson.com>
wrote:
> Hi Kevin,
>
> Looking into it!
>
> -henrik
>
>
> On 11/02/15 16:36, Kevin Leduc wrote:
>
> Hi Henrik,
>
> stats.grok.se has been missing data for the past week. Can you restart the
> service to see if that helps?
>
> Thanks!
> Kevin Leduc
> Analytics Product Manager
>
>
>
Kevin,
Thanks for adding the dashboards to the list. That's a great resource.
Actually, I worked with the Parsoid team while adding this instrumentation,
so I got their feedback along the way.
--
E.Christy Okpo
ecokpo(a)gmail.com
Hi,
TL;DR: If you think your Hive queries are currently taking longer than
usual, please find qchris on IRC. If he is not responsive, kindly
ask someone with root on stat1002 (such as Ops) to kill the process
java -Dproc_balancer -Xmx1000m [...]
-----------------------------------------------------
Data in the Analytics cluster is not evenly distributed. Some data
nodes are >90% full, while others are half empty.
Data nodes that are >90% full are considered unhealthy and no longer
contribute to the pool of available resources. In particular, unhealthy
data nodes no longer contribute their memory to the cluster's total
available memory. There are other reasons too, but this alone is reason
enough to keep the data nodes balanced, and hence healthy.
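The health rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataNode` class, the node names and the memory figures are hypothetical, and the 90% cut-off is taken from this message rather than read from any cluster configuration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the ">90% full" cut-off quoted in this message, as a fraction.
UNHEALTHY_THRESHOLD = 0.90


@dataclass
class DataNode:
    name: str          # hypothetical node name
    disk_used: float   # fraction of the node's disk in use, 0.0 to 1.0
    memory_gb: int     # hypothetical memory the node offers while healthy


def is_healthy(node: DataNode) -> bool:
    """A node stays healthy only while it is at most 90% full."""
    return node.disk_used <= UNHEALTHY_THRESHOLD


def available_memory_gb(nodes: list[DataNode]) -> int:
    """Total memory of the cluster, counting healthy nodes only."""
    return sum(node.memory_gb for node in nodes if is_healthy(node))


if __name__ == "__main__":
    # Hypothetical cluster: two of the three nodes are more than 90% full.
    cluster = [
        DataNode("node-a", 0.95, 48),
        DataNode("node-b", 0.50, 48),
        DataNode("node-c", 0.91, 48),
    ]
    print(available_memory_gb(cluster))  # 48: only node-b still counts
```

In this toy example, and assuming equal disk sizes, spreading the blocks evenly would leave every node under the cut-off and bring all 144 GB back into the pool, which is what rebalancing aims for.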
Rebalancing has been running since 2015-02-26, but the situation has been
getting worse faster than the rebalancer can fix it.
We have had up to 5 unhealthy nodes.
Since we are missing their memory, I decided that we should rebalance
more aggressively. So I increased the rebalancer's capacity, and the
nodes are now recovering and becoming healthy again.
I am monitoring the increased-capacity rebalancer closely, but in case
it is blocking you without my noticing, please find me on IRC and let
me know, so I can turn the rebalancer's capacity down.
If I am unresponsive, please find someone with root on stat1002
(such as Ops) and ask them to kill the process
java -Dproc_balancer -Xmx1000m [...]
on stat1002.
Have fun,
Christian
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
I'm a computational physics student from the Czech Republic, and I have occasionally used the data published at http://stats.wikimedia.org/wikimedia/squids/SquidReportCountryData.htm for my own analysis of Wikipedia, just to see how it is used and how usage is trending. However, the report stopped updating in January and there is no data for 2015. Do you plan to publish the country data somewhere?
Thank you,
Jakub Havlik
Hi Analytics people,
Three new fields are available in the Hive table wmf.webrequest (as per the
documentation):
  user_agent_map     map<string,string>  User-agent map with browser_name,
                                         browser_major, device, os_name,
                                         os_minor, os_major keys and
                                         associated values
  x_analytics_map    map<string,string>  Map view of the x_analytics field
  webrequest_source  string              Source cluster
You can access map values using this syntax: user_agent_map['browser_name'].
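For readers who want to see what the map view amounts to, here is a small Python sketch that turns a raw x_analytics string into a dictionary and looks a key up, much as `x_analytics_map['key']` does in Hive. It assumes the raw field is a list of `key=value` pairs separated by semicolons; the function name `parse_x_analytics` and the sample keys are hypothetical, and this is not the code that populates the table.

```python
from __future__ import annotations


def parse_x_analytics(raw: str) -> dict[str, str]:
    """Build a {key: value} map from an x_analytics string.

    Assumption: the string looks like 'key1=value1;key2=value2'.
    Empty segments and segments without '=' are skipped.
    """
    result: dict[str, str] = {}
    for pair in raw.split(";"):
        key, sep, value = pair.partition("=")
        if sep and key:
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical sample value, not real webrequest data.
    x_analytics_map = parse_x_analytics("key1=value1;key2=value2")
    print(x_analytics_map["key2"])         # value2
    # .get() returns None for a missing key, similar to Hive returning NULL.
    print(x_analytics_map.get("missing"))  # None
```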
Documentation has been updated here:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
Enjoy :)
--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal