Oh, cool. I didn't know you had seen that. I cheated on the performance for that video
by caching traffic for individual pages I had already browsed. Now the real performance is
comparable to what the original demo showed (although it takes up a decent amount of
memory). Yeah, I should probably translate it to Python or something at some point so more
people can try it.
From: ezachte(a)wikimedia.org
To: analytics(a)lists.wikimedia.org
Date: Thu, 19 Mar 2015 18:44:13 +0100
Subject: Re: [Analytics] Monthly compressed traffic delay
Thanks for the heads-up, Michael. It prompted me to watch your initial demo with spoken
comments again.
https://www.youtube.com/watch?v=f3QXwY-XR28
I don't have Mathematica, so I can't run your script, but it certainly seems fun to play
with!
Cheers,
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On
Behalf Of Michael Hale
Sent: Thursday, March 19, 2015 16:53
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] Monthly compressed traffic delay
Thanks again for fixing those, Erik. In case you or the others want to see how much the
monthly files improved the performance of my local category browser, I've linked to a
short GIF animation. The old version polled the stats.grok.se server and would often only
get a single page result about every 3 seconds, so it's a huge speedup.
http://i.stack.imgur.com/9Yjjx.gif
From: hale.michael.jr(a)live.com
To: analytics(a)lists.wikimedia.org
Date: Tue, 24 Feb 2015 17:28:51 -0500
Subject: Re: [Analytics] Monthly compressed traffic delay
Thanks, Erik. I actually noticed the empty title records in the hourly files recently too.
I didn't make the connection that they could have been the culprit, though. To give an
example of one type of output I make, here are the most popular articles for different
media types from a 3-day span from yesterday. Your compressed files will definitely open
up some new scenarios though.
https://docs.google.com/spreadsheets/d/19IoFHy-U0JInOzi32_iemTXcEmGudeK-jXU…
From: ezachte(a)wikimedia.org
To: analytics(a)lists.wikimedia.org
Date: Tue, 24 Feb 2015 23:09:53 +0100
Subject: Re: [Analytics] Monthly compressed traffic delay
Michael, a quick heads-up: I finally found the time to look into this. Sorry that it took
so long.
https://phabricator.wikimedia.org/T90230
The bug has been analyzed and fixed. The underlying problem is a record in an hourly
pageview dump with an empty title. My script now patches such records with the title
'-no-title-'. I filed a separate bug for that:
https://phabricator.wikimedia.org/T90629
Daily aggregation has been restarted and has successfully processed data for Jan 27. Now
it will take a day or two to catch up.
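A minimal sketch of that kind of patch, in Python. It is not the actual Wikistats script:
it assumes each hourly dump line holds four space-separated fields (project, title, view
count, bytes), and the function name and sample lines are hypothetical.

```python
NO_TITLE = "-no-title-"  # placeholder title named in the message above


def patch_empty_title(line: str) -> str:
    """Return the record with an empty or missing title replaced by NO_TITLE.

    Assumed (not confirmed) layout: "project title count bytes".
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) == 4 and fields[1] == "":
        # "en  5 1234": the title field is present but empty
        fields[1] = NO_TITLE
    elif len(fields) == 3:
        # "en 5 1234": the title field is missing altogether
        fields.insert(1, NO_TITLE)
    return " ".join(fields)


# Hypothetical sample records
print(patch_empty_title("en  5 1234"))
print(patch_empty_title("en Main_Page 42 9999"))
```

Records that already carry a title pass through unchanged, so a filter like this could be
applied to every line before aggregation.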
Cheers,
Erik
From: Erik Zachte [mailto:ezachte@wikimedia.org]
Sent: Thursday, February 19, 2015 4:13
To: 'A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.'
Subject: RE: [Analytics] Monthly compressed traffic delay
Hi Michael,
Thanks for your offer, I appreciate it. I've been quite busy in recent weeks, but I
haven't forgotten about these compressed dumps, and will look into it soon (less than a
week).
Cheers,
Erik
From: analytics-bounces(a)lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Michael Hale
Sent: Wednesday, February 18, 2015 15:24
To: analytics(a)lists.wikimedia.org
Subject: [Analytics] Monthly compressed traffic delay
Hello,
I'm inquiring about the delay for publishing the January compressed Wikistats files
that are maintained by Erik Zachte. I'm guessing those processes are given a low
priority compared to the content backups that need to run. More generally, I'm
interested in finding new ways that I can help out. I'm an ex-Microsoftie who is now
on the fraud analytics team at TD Bank. I've been involved with the Wikimedia group in
Atlanta. I organize the picnic each summer, and helped get the rest of the historic
buildings photographed. I've dabbled in reverting vandalism, and I contribute to
articles when I actually have something to contribute. I don't feel like I've
settled into a contributor role that really fits me yet though.
I enjoy using a variety of the traffic data sets that Wikimedia publishes. It seems the
traffic servers get bogged down sometimes though. Can I help? Should I try to get the
Atlanta group to pool our donations this year for an extra computer?
Thanks,
Michael
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics