Hey y'all,
Anyone know what powers, or more correctly what *should* power, the
MediaWiki: MobileFrontend dashboard [0]? I'm hoping that it's data from the
NavigationTiming extension but I've been known to be wrong.
–Sam
[0] https://gdash.wikimedia.org/dashboards/mobext/
Thanks Oliver!
We would like this data for as broad a time period as you can muster.
The more days, months and years represented in the dataset, the better.
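For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of per-second aggregation Oliver describes below. The log path and field positions are assumptions, not the real sampled-log format:

    import gzip
    from collections import Counter

    # Hypothetical path and layout: one request per line, timestamp in
    # field 2 and host in field 8 -- adjust to the real log format.
    LOG_PATH = "sampled-requests.log.gz"

    per_second = Counter()
    with gzip.open(LOG_PATH, "rt") as f:
        for line in f:
            fields = line.split()
            ts, host = fields[2], fields[8]
            if host in ("en.wikipedia.org", "en.m.wikipedia.org"):
                per_second[ts] += 1  # keys are full one-second timestamps

    print("lowest pageviews/second:", min(per_second.values()))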
> Okay, so:
>
> I took an hour from the pageviews logs,[0] and aggregated pageviews to
> enwiki (mobile and desktop both) by timestamp, down to one-second
> resolution. The lowest number of pageviews to enwiki per second was
> 2,981.
>
> So, I don't personally have a problem with generating a release of:
>
> 1. Pageviews per second;
> 2. To enwiki;
> 3. Over $TIME_PERIOD;
> 4. Grouping the mobile and desktop sites together.
>
> But Dario or someone should chip in before I touch anything ;p
>
> 6am yesterday. 6am because it should be low-traffic, right? At least
> given our biases towards North America and Europe.
>
> On 13 April 2015 at 11:54, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> > Then that sounds much more viable. I'll run a quick test now to see
> > how much clustering we'd see at, say, the one-second resolution level,
> > and throw it out here so we can make more informed decisions about a
> > data release on this.
> >
> > On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gandhi(a)gmail.com> wrote:
> >> Hi Oliver,
> >>
> >> Re: Hirav: would you be looking for temporally /and/ contextually
> >> granular pageviews, i.e. "a view to X page at Y time", or just
> >> temporally granular, so "a view to a page on enwiki at X time"? If the
> >> latter you've got more of a shot, I suspect.
> >>
> >> I only want the latter - I am not concerned with the context so much as
> >> just “a view to a page on enwiki at X time.”
> >>
> >> Hirav
Hi Wikimedia Analytics Team,
My colleague Bharath and I are doing research on dynamic server allocation algorithms, and we were looking for suitable datasets to test our predictive algorithm on. We noticed that Wikimedia has an amazing data set of hourly page views, but we were looking for something a bit more granular, such as aggregated page requests to English Wikipedia on a minute-by-minute or second-by-second basis if possible.
We are more than happy to pore over any raw data you might have that would help us calculate page requests at this granular level. Please let us know if it would be possible to get such data and, if so, how. Thank you in advance for your help.
Best,
Hirav Gandhi
Hi Oliver,
Re: Hirav: would you be looking for temporally /and/ contextually granular pageviews, i.e. "a view to X page at Y time", or just temporally granular, so "a view to a page on enwiki at X time"? If the latter you've got more of a shot, I suspect.
I only want the latter - I am not concerned with the context so much as just “a view to a page on enwiki at X time.”
Hirav
> On Apr 13, 2015, at 5:00 AM, analytics-request(a)lists.wikimedia.org wrote:
>
> Hi,
>
> This issue of pageview data granularity has been discussed before, and the
> answer has been that hourly is the smallest increment allowed to be
> revealed publicly, for privacy reasons.
>
> I believe that the person you will want to discuss your request with is
> Toby, who I have cc'd here.
>
> Pine
Hi Analytics people,
Today another batch of additions landed in the refined webrequest table in
Hive.
Now the table contains:
- ts - The Unix timestamp (in milliseconds) version of the dt date
- access_method - The method used to access the site: one of [mobile app |
mobile web | desktop]
- agent_type - To differentiate easily between spiders and users (more
values may be added later)
These additions are based on the "tags", as defined here:
https://meta.wikimedia.org/wiki/Research:Page_view
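To give a flavour of how the new columns can be queried, here is a quick sketch. It assumes pyhive and cluster access; the hostname is made up, and the partition values are just an example:

    from pyhive import hive

    conn = hive.connect(host="analytics-hive.example.org")  # hypothetical host
    cursor = conn.cursor()
    cursor.execute("""
        SELECT access_method, agent_type, COUNT(*) AS requests
        FROM wmf.webrequest
        WHERE year = 2015 AND month = 4 AND day = 17
        GROUP BY access_method, agent_type
    """)
    for access_method, agent_type, requests in cursor.fetchall():
        print(access_method, agent_type, requests)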
Have a good weekend!
--
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
(If you don't have the ability to run jobs on our Hadoop/Hive cluster,
this will be pretty boring and you don't have to read it)
Hey!
This is an (ir)regularly scheduled reminder that Christian wrote a
fantastic guide to what to do if the cluster is stalling - it lives at
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load#What_to_d…
The options it gives you are:
1. Kill jobs that you own;
2. Ask other people to kill jobs they own;
3. Buy more servers.
Unless your name is Nuria or Otto, these are probably your only good
options; please do not kill other people's jobs, particularly jobs
marked "root.essential" run by "hdfs". These are (best-case) regularly
scheduled analysis and (worst-case) actual ETL and data consumption
tasks, which then have to be re-run.
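If you go with option 1 and want to be sure you only touch your own jobs, here is a rough sketch of listing and killing just your own YARN applications. It assumes the standard yarn CLI; the column layout of the -list output can vary by Hadoop version, so double-check the IDs before killing anything:

    import getpass
    import subprocess

    me = getpass.getuser()
    listing = subprocess.check_output(["yarn", "application", "-list"]).decode()

    for line in listing.splitlines():
        if not line.startswith("application_"):
            continue  # skip header lines
        fields = line.split("\t")
        app_id, user = fields[0], fields[3]  # assumed column positions
        if user == me:
            print("killing", app_id)
            subprocess.check_call(["yarn", "application", "-kill", app_id])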
If you notice the cluster is stalling and stopping your jobs doesn't
do anything, the #wikimedia-analytics IRC channel is probably your
best bet. On weekdays, throwing a message in will probably be
enough. On weekends, target it at one of the analytics engineers.
If none of that works, the mailing lists are also good. And if it's
ultra-critical, physically poke or phone someone.
<eom>
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Hi,
Is it possible to have queries saved in Quarry run against several databases?
For example, I'd like to run http://quarry.wmflabs.org/query/3033 for all
the wikis on which the ContentTranslation extension is deployed.
Is there an easy way to do it, or do I have to save a Quarry query for
every database?
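Not an answer about Quarry itself, but as a workaround you can run the same SQL against each replica database yourself. A rough sketch, assuming Labs replica access via pymysql; the host/db naming and the wiki list are illustrative:

    import pymysql

    SQL = "SELECT COUNT(*) FROM page"  # stand-in for the real query
    wikis = ["eswiki", "cawiki", "idwiki"]  # example list of CX wikis

    for wiki in wikis:
        conn = pymysql.connect(
            host="{0}.labsdb".format(wiki),  # illustrative replica hostname
            db="{0}_p".format(wiki),
            read_default_file="~/replica.my.cnf",
        )
        try:
            with conn.cursor() as cursor:
                cursor.execute(SQL)
                print(wiki, cursor.fetchone()[0])
        finally:
            conn.close()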
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi Everyone,
I wanted to share a project currently under consideration for IdeaLab
funding and which may be of direct interest to the Wiki analytics
community. If you are interested or know someone who might be interested,
let me know. If you have feedback for the project, please submit it to the
discussion pages or email me.
Thanks!
Jason
*Wiki Controversy Monitoring Engine Call for Developers*
*Purpose*: The controversy monitoring engine maintains a real-time rating
of the controversiality of Wikipedia articles by listening to the live
stream of edits from Wikipedia. We need someone who is interested in
building the web interface and interactive visualizations around these
controversies to enable administrators to monitor, investigate, and, if
need be, intervene to deescalate controversies. The goal is to create a
site like stats.wikimedia.org.
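(To give a flavour of the "listening to the live stream of edits" piece, here is a minimal sketch using the public EventStreams recentchange feed and the sseclient package; treat the field names as assumptions to verify against the stream documentation.)

    import json
    from sseclient import SSEClient

    STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

    for event in SSEClient(STREAM):
        if not event.data:
            continue
        change = json.loads(event.data)
        if change.get("wiki") == "enwiki" and change.get("type") == "edit":
            # feed the edit into the controversy scorer here
            print(change["title"], change.get("user"))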
*Requirements*: Knowledge of web development, web-based visualization,
and/or data analysis using Wikipedia's API or Wikidata.
*For More Information*: see
https://meta.wikimedia.org/wiki/Grants:IdeaLab/Controversy_Monitoring_Engine
--
Jason Radford
Doctoral Student, Sociology, University of Chicago
Visiting Researcher, Lazer Lab, Northeastern University
*Connect*: LinkedIn <http://www.linkedin.com/in/jsradford>, Twitter
<http://www.twitter.com/jsradford>, University of Chicago
<http://home.uchicago.edu/%7Ejsradford/>
*Play Games for Science at Volunteer Science
<http://www.volunteerscience.com>*
Team:
As you might know, we have swapped EventLogging's old vanadium box for a
newer, more resilient one.
This new box has less disk space, and the move caused a small outage due to
a bug already present in the EventLogging code that was not apparent on
vanadium.
Details can be found here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150406-EventLo…
Thanks,
Nuria