Jay Kreps is going to give a tech talk on Kafka at Airbnb's HQ tomorrow: http://www.airbnb.com/meetups/jznjzcsa9-tech-talk-jay-kreps
Heads up: 14 spots remaining.
I'm going to try to make it but may not be able to. If you want to go but are unable to secure a spot, I can forfeit mine. If you do get a spot, it'll be an added incentive for me to tag along, so let me know either way!
--
Ori Livneh
ori(a)wikimedia.org
Sumana Harihareswara, 25/08/2012 00:02:
> I've just updated
> https://meta.wikimedia.org/wiki/Mailing_lists/Overview#Mediawiki_and_techni…
> . Sorry for the spam, but you may want to skim that and see whether
> there are lists there you should join.
This made me notice that analytics lacks the must-have Gmane mirror: please fix (for the archives, you need the mbox).
Nemo
---------- Forwarded message ----------
From: "Lars Aronsson" <lars(a)aronsson.se>
Date: Aug 20, 2012 5:38 PM
Subject: [Wikitech-l] Bots create page views
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
This regular bi-weekly pattern of reported page views
http://stats.grok.se/sv.s/latest90/Wikisource:Projekt_Bibel%C3%B6vers%C3%A4…
is created by a bot job. Note that this is a WikiProject page.
Is the reported number of page views inflated by bots?
Is there no way to filter out bot visits?
In other news, this page has a weekly pattern of real,
human visitors, since it is a radio quiz whose
listeners are known to use Google searches:
http://stats.grok.se/sv/latest90/Melodikrysset
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Begin forwarded message:
> From: Data Visualization Group in Bay Area <info(a)meetup.com>
> Subject: New Meetup: VISUALIZATION IN R WITH GGPLOT2
> Date: August 10, 2012 2:57:05 PM PDT
> To: dario.taraborelli(a)gmail.com
> NEW MEETUP
> VISUALIZATION IN R WITH GGPLOT2
> Data Visualization Group in Bay Area
> Added by James Paul
> Tuesday, August 28, 2012
> 9:00 AM
> SeaPort Conference Center
> 459 Seaport Ct
> The Seaport Conference Center is located at the Port of Redwood City in San Mateo County.
> Redwood City, CA 94063
> Hi,
> There is a course coming up on the popular ggplot2 visualization package in R, in Redwood City. I am helping promote the event. You can register at http://www.revolutionanalytics.com/services/training/public/visualization…
> Sponsored by American Statistical Association, Trulia and LinkedIn
I'm working on testing Kafka broker failover, to see whether (and how many) messages are lost when a broker dies while producers are sending messages. Here's what I'm doing to test. It's nothing rigorous, just a try-it-and-see-what-happens. Oh, and here's a gist with the commands and scripts I'm using: https://gist.github.com/3286692
1) Start up a broker on each of an03 and an04:
$ bin/kafka-server-start.sh config/server.properties
2) Start producing on an03:
$ ./sequence_generate.sh 20000 10000 | bin/kafka-console-producer.sh --zookeeper analytics1003:2181 --topic test6
3) While logs are being produced, kill a Kafka broker.
4) Produce some more, and at some point while still producing, re-start the downed broker:
$ bin/kafka-server-start.sh config/server.properties
5) Finish producing. Kill the producer to make sure it doesn't have anything batched (this should only matter with an asynchronous producer).
6) Start consumer on an04, saving stdout:
$ bin/kafka-consumer-shell.sh --props config/consumer.properties --topic test6 | tee -a /tmp/test6.log
7) Check the logs to make sure all messages made it through:
$ cat /tmp/test6.log | egrep '^consumed:' | awk '{print $3}' | sort -g | ~/bin/sequence_check.sh 1 20000
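The contents of sequence_check.sh live in the gist, but based on step 7 it presumably reads sorted integers on stdin and reports any gaps in a given range. Here's a minimal sketch of that idea; the function name and "missing: N" output format are assumptions, not the actual script:

```shell
#!/bin/sh
# Hypothetical sketch of sequence_check.sh (the real one is in the gist):
# read sorted integers from stdin and print any values missing from the
# inclusive range $1..$2.
sequence_check() {
  expected=$1
  end=$2
  while read -r n; do
    # Report every expected value the input skipped over.
    while [ "$expected" -lt "$n" ]; do
      echo "missing: $expected"
      expected=$((expected + 1))
    done
    [ "$n" -eq "$expected" ] && expected=$((expected + 1))
  done
  # Anything beyond the last input line is also missing.
  while [ "$expected" -le "$end" ]; do
    echo "missing: $expected"
    expected=$((expected + 1))
  done
}
```

For example, `printf '1\n2\n4\n5\n' | sequence_check 1 5` would report that 3 never made it through.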
When I kill one of the brokers while messages are being produced, I lose between 10 and 60 log messages (depending, I think, on the producer type, async vs. sync) in the final consumed output. I'm sure this is from the producer firing off messages to the assigned downed broker before ZooKeeper can notify the producer of the pool reconfiguration.
So, I guess this is kind of as expected. In my most recent test, I lost 56 messages when I killed the broker. All of those messages were generated during the same second. Including those 56, there were about 500 messages fired off during that second. I don't have millisecond times on my messages right now (maybe I should add that in). As would be expected, no messages were lost when I brought the second broker back online. ZooKeeper reconfigured and the producer rerouted with no problems there.
Summary:
In a 2-node broker pool with one producer, it takes Kafka/ZooKeeper less than a second to notice a failed broker and to reroute messages.
Is this acceptable? We will discuss :)
-Ao
Hi everybody,
Just a reminder that I'll be at the DataStax Cassandra Summit tomorrow, and likely out of contact most of the time. If you need anything, just send email and I'll get back to you.
<3
Additional mea culpa to RobLa as obviously I won't be able to work on gerrit-stats tomorrow.
--
David Schoonover
dsc(a)wikimedia.org
I wrote a gmond plugin to pump udp2log stats into ganglia: https://github.com/atdt/python-udp-gmond. Sharing in case it's useful.
--
Ori Livneh
ori(a)wikimedia.org