Thanks Oliver!
We would like this data for as broad a time period as you can muster. The more days, months and years represented in the dataset, the better.
Okay, so:
I took an hour from the pageviews logs,[0] and aggregated pageviews to enwiki (mobile and desktop both) by timestamp, at one-second resolution. The lowest number of pageviews to enwiki in any one second was 2,981.
So, I don't personally have a problem with generating a release of:
- Pageviews per second;
- To enwiki;
- Over $TIME_PERIOD;
- Grouping the mobile and desktop sites.
But Dario or someone should chip in before I touch anything ;p
[0] 6am yesterday. 6am because it should be low-traffic, right? At least given our biases towards North America and Europe.
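For anyone who wants to run the same kind of check, here is a minimal sketch in Python of the aggregation described above: bucket request timestamps by second, count them, and report the quietest second. This is not the query that was actually run against the logs. The function names, the ISO 8601 timestamp format and the sample values are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(timestamps):
    """Count requests per one-second bucket.

    `timestamps` is an iterable of ISO 8601 strings (an assumed format;
    the real log format may differ).
    """
    counts = Counter()
    for ts in timestamps:
        # Truncate to whole seconds so all requests in a second share a key.
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[second] += 1
    return counts


def quietest_second(counts):
    """Return the (second, views) pair with the fewest pageviews."""
    return min(counts.items(), key=lambda item: item[1])


# Made-up sample data, not real log lines.
sample = [
    "2015-04-12T06:00:00.120",
    "2015-04-12T06:00:00.870",
    "2015-04-12T06:00:01.005",
    "2015-04-12T06:00:01.410",
    "2015-04-12T06:00:01.999",
]

counts = pageviews_per_second(sample)
second, views = quietest_second(counts)
print(second, views)
```

Note that a second with no requests at all never appears in the counter, so on sparse data the true minimum (zero) would be missed unless empty seconds are filled in first.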
On 13 April 2015 at 11:54, Oliver Keyes okeyes@wikimedia.org wrote:
Then that sounds much more viable. I'll run a quick test now to see how much clustering we'd see at, say, the one-second resolution level, and throw it out here so we can make more informed decisions about a data release on this.
On 13 April 2015 at 08:08, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Hi Oliver,
Re: Hirav: would you be looking for temporally /and/ contextually granular pageviews, i.e. "a view to X page at Y time", or just temporally granular, so "a view to a page on enwiki at X time"? If the latter you've got more of a shot, I suspect.
I only want the latter - I am not concerned with the context so much as just “a view to a page on enwiki at X time.”
Hirav
On Apr 13, 2015, at 5:00 AM, analytics-request@lists.wikimedia.org wrote:
Message: 1
Date: Mon, 13 Apr 2015 00:47:31 -0700
From: Pine W wiki.pine@gmail.com
To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." analytics@lists.wikimedia.org
Cc: Bharath Sitaraman bharath1028@gmail.com
Subject: Re: [Analytics] Page views on a more frequent than hourly basis
Message-ID: CAF=dyJgNUT+t6n6muJq16DuYiWP7et6ruHT3_-TZDnseP+29QQ@mail.gmail.com
Content-Type: text/plain; charset="utf-8"
Hi,
This issue of pageview data granularity has been discussed before, and the answer has been that hourly is the smallest increment allowed to be revealed publicly, for privacy reasons.
I believe that the person you will want to discuss your request with is Toby, who I have cc'd here.
Pine
On Apr 13, 2015 12:11 AM, "Hirav Gandhi" hirav.gandhi@gmail.com wrote:
Hi Wikimedia Analytics Team,
My colleague Bharath and I are doing research on dynamic server allocation algorithms and we were looking for suitable datasets to test our predictive algorithm on. We noticed that Wikimedia has an amazing data set of hourly page views, but we were looking for something a bit more granular, such as aggregated page requests to English Wikipedia on a minute by minute basis or second by second basis if possible.
We are more than happy to pore through any raw data you might have that would help us calculate page requests at this granular level. Please let us know if it would be possible to get such data and if so how. Thank you in advance for your help.
Best,
Hirav Gandhi
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
....
...years?
We have unsampled logs for, ah. 2 months.
On 13 April 2015 at 19:30, Hirav Gandhi hirav.gandhi@gmail.com wrote:
Thanks Oliver!
We would like this data for as broad a time period as you can muster. The more days, months and years represented in the dataset, the better.