Jay Kreps is going to give a tech talk on Kafka at Airbnb's HQ tomorrow: http://www.airbnb.com/meetups/jznjzcsa9-tech-talk-jay-kreps
Heads up: 14 spots remaining.
I'm going to try to make it but may not be able to. If you want to go but are unable to secure a spot, I can forfeit mine. If you do get a spot, it'll be an added incentive for me to tag along, so let me know either way!
--
Ori Livneh
ori(a)wikimedia.org
Sumana Harihareswara, 25/08/2012 00:02:
> I've just updated
> https://meta.wikimedia.org/wiki/Mailing_lists/Overview#Mediawiki_and_techni…
> . Sorry for the spam, but you may want to skim that and see whether
> there are lists there you should join.
This made me notice that analytics lacks the must-have Gmane mirror: please fix (for the archives, you need the mbox).
Nemo
---------- Forwarded message ----------
From: "Lars Aronsson" <lars(a)aronsson.se>
Date: Aug 20, 2012 5:38 PM
Subject: [Wikitech-l] Bots create page views
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
This regular bi-weekly pattern of reported page views
http://stats.grok.se/sv.s/latest90/Wikisource:Projekt_Bibel%C3%B6vers%C3%A4…
is created by a bot job. Note that this is a WikiProject page.
Is the reported number of page views inflated by bots?
Is there no way to filter out bot visits?
In other news, this page has a weekly pattern of real,
human visitors, since it is a radio quiz whose
listeners are known to use Google searches:
http://stats.grok.se/sv/latest90/Melodikrysset
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Begin forwarded message:
> From: Data Visualization Group in Bay Area <info(a)meetup.com>
> Subject: New Meetup: VISUALIZATION IN R WITH GGPLOT2
> Date: August 10, 2012 2:57:05 PM PDT
> To: dario.taraborelli(a)gmail.com
> NEW MEETUP
> VISUALIZATION IN R WITH GGPLOT2
> Data Visualization Group in Bay Area
> Added by James Paul
> Tuesday, August 28, 2012
> 9:00 AM
> SeaPort Conference Center
> 459 Seaport Ct
> The Seaport Conference Center is located at the Port of Redwood City in San Mateo County.
> Redwood City, CA 94063
> Hi,
> There is a course coming up on the popular ggplot2 visualization package in R, in Redwood City. I am helping promote the event. You can register at http://www.revolutionanalytics.com/services/training/public/visualization…
> Sponsored by American Statistical Association, Trulia and LinkedIn
I'm working on testing Kafka broker failover, to see whether (and how many) messages are lost when a broker dies while producers are sending messages. Here's what I'm doing to test. It's nothing rigorous, just a try-it-and-see-what-happens. Oh, and here's a gist with the commands and scripts I'm using: https://gist.github.com/3286692
1) Start up a broker on each of an03 and an04:
$ bin/kafka-server-start.sh config/server.properties
2) Start producing on an03:
$ ./sequence_generate.sh 20000 10000 | bin/kafka-console-producer.sh --zookeeper analytics1003:2181 --topic test6
3) While logs are being produced, kill a Kafka broker.
4) Produce some more, and at some point while still producing, re-start the downed broker:
$ bin/kafka-server-start.sh config/server.properties
5) Finish producing. Kill the producer to make sure it doesn't have anything batched (this should only matter with an asynchronous producer).
6) Start consumer on an04, saving stdout:
$ bin/kafka-consumer-shell.sh --props config/consumer.properties --topic test6 | tee -a /tmp/test6.log
7) Check the logs to make sure all messages made it through:
$ cat /tmp/test6.log | egrep '^consumed:' | awk '{print $3}' | sort -g | ~/bin/sequence_check.sh 1 20000
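The contents of sequence_check.sh live in the gist, but based on step 7 it presumably reads sorted integers on stdin and reports any gaps in a given range. Here's a minimal sketch of that idea; the function name and "missing: N" output format are assumptions, not the actual script:

```shell
#!/bin/sh
# Hypothetical sketch of sequence_check.sh (the real one is in the gist):
# read sorted integers from stdin and print any values missing from the
# inclusive range $1..$2.
sequence_check() {
  expected=$1
  end=$2
  while read -r n; do
    # Report every expected value the input skipped over.
    while [ "$expected" -lt "$n" ]; do
      echo "missing: $expected"
      expected=$((expected + 1))
    done
    [ "$n" -eq "$expected" ] && expected=$((expected + 1))
  done
  # Anything beyond the last input line is also missing.
  while [ "$expected" -le "$end" ]; do
    echo "missing: $expected"
    expected=$((expected + 1))
  done
}
```

For example, `printf '1\n2\n4\n5\n' | sequence_check 1 5` would report that 3 never made it through.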
When I kill one of the brokers while messages are being produced, I lose between 10 and 60 log messages (depending, I think, on the producer type, async vs. sync) in the final consumed output. I'm sure this is from the producer firing off messages to the assigned downed broker before ZooKeeper can notify the producer of the pool reconfiguration.
So, I guess this is kind of as expected. In my most recent test, I lost 56 messages when I killed the broker. All of those messages were generated during the same second. Including those 56, there were about 500 messages fired off during that second. I don't have millisecond times on my messages right now (maybe I should add that in). As would be expected, no messages were lost when I brought the second broker back online. ZooKeeper reconfigured and the producer rerouted with no problems there.
Summary:
In a 2-node broker pool with one producer, it takes Kafka/ZooKeeper less than a second to notice a failed broker and to reroute messages.
Is this acceptable? We will discuss :)
-Ao
Hi everybody,
Just a reminder that I'll be at the DataStax Cassandra Summit tomorrow, and likely out of contact most of the time. If you need anything, just send email and I'll get back to you.
<3
Additional mea culpa to RobLa as obviously I won't be able to work on gerrit-stats tomorrow.
--
David Schoonover
dsc(a)wikimedia.org
I wrote a gmond plugin to pump udp2log stats into ganglia: https://github.com/atdt/python-udp-gmond. Sharing in case it's useful.
--
Ori Livneh
ori(a)wikimedia.org