Hello,
I'm inquiring about the delay for publishing the January compressed Wikistats files that are maintained by Erik Zachte. I'm guessing those processes are given a low priority compared to the content backups that need to run. More generally, I'm interested in finding new ways that I can help out. I'm an ex-Microsoftie who is now on the fraud analytics team at TD Bank. I've been involved with the Wikimedia group in Atlanta. I organize the picnic each summer, and helped get the rest of the historic buildings photographed. I've dabbled in reverting vandalism, and I contribute to articles when I actually have something to contribute. I don't feel like I've settled into a contributor role that really fits me yet though.
I enjoy using a variety of the traffic data sets that Wikimedia publishes. It seems the traffic servers get bogged down sometimes though. Can I help? Should I try to get the Atlanta group to pool our donations this year for an extra computer?
Thanks, Michael
Hi Michael,
Thanks for your offer, I appreciate it.
I've been quite busy in recent weeks, but haven't forgotten about these compressed dumps, and will look into it soon (less than a week).
Cheers,
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Michael Hale Sent: Wednesday, February 18, 2015 15:24 To: analytics@lists.wikimedia.org Subject: [Analytics] Monthly compressed traffic delay
Hi Erik,
No rush. I'm glad to establish communications with another branch of the Wikiverse. If I'm just browsing arbitrary categories by traffic, I have some code I run that is similar to the TreeView explorer hosted on the tool server. Mine just has a few extra features I need for manually merging subcategories. For example, the cuisine categories aren't as consistently structured as the film categories. That code polls each page individually from the stats.grok.se API though, which can get pretty slow for large subcategories (like films). So when I'm exploring trends in media like films, books, albums, songs, video games, TV series, etc., I have some separate code. It grabs the raw hourly files for the past 2-3 days, makes a hash table of all of the articles from the large subcategories I check frequently using the fast MediaWiki API, and then scans the hourly files once, filling in the hash table with traffic info. Then I saw your compressed monthly summary files and figured that would be even faster than downloading 2-3 days of the hourly files. I'll keep an eye out.
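The single-pass approach described above can be sketched roughly as follows. This is only an illustration, not Michael's actual code (which is not shown in the thread). It assumes each record in an hourly file is one line of four space-separated fields (project, title, view count, bytes served); the function name `tally_views` and its parameters are hypothetical.

```python
from collections.abc import Iterable


def tally_views(lines: Iterable[str], titles: Iterable[str], project: str = "en") -> dict[str, int]:
    """Sum hourly view counts for a fixed set of article titles in one pass."""
    # Hash table keyed by title, pre-filled so membership checks are O(1).
    counts = {title: 0 for title in titles}
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        # Assumed record layout: project, title, view count, bytes served.
        if len(fields) != 4:
            continue  # skip records that do not match the assumed layout
        proj, title, views, _size = fields
        if proj == project and title in counts and views.isdigit():
            counts[title] += int(views)
    return counts
```

Because the function only needs an iterable of lines, it could be fed a file opened with `gzip.open(path, "rt", encoding="utf-8")`, and the cost of the scan stays proportional to the file size rather than to the number of watched articles.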
From: ezachte@wikimedia.org To: analytics@lists.wikimedia.org Date: Thu, 19 Feb 2015 04:12:55 +0100 Subject: Re: [Analytics] Monthly compressed traffic delay
Michael, a quick heads-up:
So I finally found the time to look into this.
Sorry that it took so long.
https://phabricator.wikimedia.org/T90230
Bug has been analyzed and fixed.
The underlying problem is a record in an hourly pageview dump with an empty title. My script now patches such records with the title '-no-title-'.
I filed a separate bug for that: https://phabricator.wikimedia.org/T90629
Daily aggregation has been restarted and successfully processed data for Jan 27. Now it will take a day or two to catch up.
Cheers,
Erik
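For readers following along, the kind of patch described in the message above might look like the sketch below. It is not Erik's script, which is not shown in the thread. It assumes the same four-field, space-separated record layout, and that an empty title shows up as an empty second field; `patch_empty_title` is a hypothetical name.

```python
PLACEHOLDER_TITLE = "-no-title-"  # placeholder value quoted in the message above


def patch_empty_title(record: str) -> str:
    """Return the record (without its trailing newline) with an empty title replaced."""
    fields = record.rstrip("\n").split(" ")
    # Assumption: an empty title appears as an empty second field,
    # i.e. two consecutive spaces after the project code.
    if len(fields) == 4 and fields[1] == "":
        fields[1] = PLACEHOLDER_TITLE
    return " ".join(fields)
```

Records that already have a title, or that do not have exactly four fields, pass through unchanged apart from the stripped newline.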
Thanks, Erik. I actually noticed the empty title records in the hourly files recently too. I didn't make the connection that it could have been the culprit though. To give an example of one type of output I make, here are the most popular articles for different media types from a 3 day span from yesterday. Your compressed files will definitely open up some new scenarios though. https://docs.google.com/spreadsheets/d/19IoFHy-U0JInOzi32_iemTXcEmGudeK-jXUD...
From: ezachte@wikimedia.org To: analytics@lists.wikimedia.org Date: Tue, 24 Feb 2015 23:09:53 +0100 Subject: Re: [Analytics] Monthly compressed traffic delay
Thanks again for fixing those, Erik. In case you or the others want to see how much the monthly files improved the performance of my local category browser, I've linked to a short GIF animation. The old version polled the stats.grok.se server and would often only get a single page result about every 3 seconds, so it's a huge speedup.
http://i.stack.imgur.com/9Yjjx.gif
From: hale.michael.jr@live.com To: analytics@lists.wikimedia.org Date: Tue, 24 Feb 2015 17:28:51 -0500 Subject: Re: [Analytics] Monthly compressed traffic delay
Thanks for the heads-up, Michael.
It prompted me to watch your initial demo with spoken comments again.
https://www.youtube.com/watch?v=f3QXwY-XR28
I don't have Mathematica, so I can't run your script, but it certainly seems fun to play with!
Cheers,
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Michael Hale Sent: Thursday, March 19, 2015 16:53 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] Monthly compressed traffic delay
Oh, cool. I didn't know you had seen that. I cheated on the performance for that video by caching traffic for individual pages I had already browsed. Now the real performance is comparable to what the original demo showed (although it takes up a decent amount of memory). Yeah, I should probably translate it to Python or something at some point so more people can try it.
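The per-page caching mentioned above, if it were carried over in a Python translation, could be sketched as below. This is an assumption-laden illustration rather than the code used in the video: `make_cached_lookup` and the `fetch` callable are hypothetical stand-ins for whatever function retrieves traffic for one page.

```python
from collections.abc import Callable
from functools import lru_cache


def make_cached_lookup(fetch: Callable[[str], int]) -> Callable[[str], int]:
    """Wrap a slow per-page fetch so repeat lookups are served from memory."""

    @lru_cache(maxsize=None)  # unbounded: every distinct title stays in memory
    def lookup(title: str) -> int:
        return fetch(title)

    return lookup
```

With `maxsize=None` the cache never evicts entries, so memory use grows with the number of distinct pages browsed; a bounded `maxsize` would trade some speed for a cap on memory.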
From: ezachte@wikimedia.org To: analytics@lists.wikimedia.org Date: Thu, 19 Mar 2015 18:44:13 +0100 Subject: Re: [Analytics] Monthly compressed traffic delay
It appears the monthly compressed traffic file generation has stopped again. It looks like the daily compressed files stopped generating on May 9th. Can we restart this process? I haven't looked at the hourly files recently. Perhaps there was another format change that caused it to crash.
I see they are running again as of a couple of days ago. Thank you. I've also seen some threads talking about larger, upcoming changes to the traffic statistics data. Looks interesting. Please let me know if I can help. Personally, I'm drifting towards investigations on how to enhance story idea generators with structured knowledgebases. Like, given a couple of random Wikipedia articles and a story theme from TVTropes, search through the type hierarchies and properties and dependencies of those articles to find an interesting way to stitch them together within the constraints of the randomly chosen story theme or setting.
From: hale.michael.jr@live.com To: analytics@lists.wikimedia.org Date: Mon, 1 Jun 2015 14:25:50 -0400 Subject: Re: [Analytics] Monthly compressed traffic delay
FYI, the two places where people are talking about the new Pageview API that we are building are here:
* Original bugzilla (now phabricator) ticket (yes the title no longer applies): https://phabricator.wikimedia.org/T44259
* Analytics list thread: https://lists.wikimedia.org/pipermail/analytics/2015-June/004008.html
On Sat, Jun 13, 2015 at 12:01 PM, Michael Hale hale.michael.jr@live.com wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics