I know this answer comes late as I was on vacation, sorry about that.
At this time the cluster is not ready to be accessed by users not in the
analytics team as things are still WIP. Now, in order to get the data you
are interested in, you can always ask the research team to retrieve it for
you (this is what we did for our pilot, actually).
Please e-mail: analytics(a)lists.wikimedia.org and let us know what you are
On Wed, Jul 30, 2014 at 8:40 PM, Pine W <wiki.pine(a)gmail.com> wrote:
> Nuria and Andrew,
> Forwarding a question from Han-teng below.
> Dear Pine,
> A humorous touch here in your most recent email: "*A $1 fine will be
> imposed by Oliver Keyes on anyone who misspells Leila's name or misdirects
> emails to the WMF Executive Director."
> I have one slightly more serious question, about the possibility of using
> the analytics infrastructure for the upcoming Hackathon.
> My Hackathon wish is to duplicate and reapply what Nuria Ruiz and
> Andrew Otto have done for the NARA analytics pilot.
> So to your knowledge, is it feasible to do so, in terms of (a) setting
> up basic access for other users to duplicate the pilot, (b) getting some
> help from Ruiz and/or Otto, and (c) setting up for another GLAM institution
> that is not NARA.
> Feel free to forward this email to Nuria Ruiz and/or Andrew Otto
> because I do not have their contacts.
> han-teng liao
> "[O]nce the Imperial Institute of France and the Royal Society of London
> begin to work together on a new encyclopaedia, it will take less than a
> year to achieve a lasting peace between France and England." - Henri
> Saint-Simon (1810)
> "A common ideology based on this Permanent World Encyclopaedia is a
> possible means, to some it seems the only means, of dissolving human
> conflict into unity." - H.G. Wells (1937)
> 2014-07-18 8:28 GMT+01:00 Pine W <wiki.pine(a)gmail.com>:
>> Thanks for this. Forwarding to Analytics and Research for others who are
>> On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand(a)wikimedia.org>
>>> This Tech Talk will be starting in 30 minutes. Thanks!
>>> On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand(a)wikimedia.org>
>>> > Hello!
>>> > Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am
>>> > time/5pm UTC
>>> > for a 30 min tech talk. You can join our hangout or follow along on
>>> > youtube:
>>> > (please note that a link to join the hangout will be posted in the
>>> > of this event just as it starts).
>>> > You can ask questions on IRC during the talk in #wikimedia-dev.
>>> > If you are not able to follow along live, a video recording will be
>>> > here
>>> > to the MediaWiki YouTube channel immediately following the tech talk
>>> > you to view at any time.
>>> > More information about the tech talk:
>>> > *Hadoop and Beyond. An overview of Analytics infrastructure*
>>> > In this
>>> > talk we will be presenting the analytics infrastructure that we have
>>> > recently rolled out in production. By now probably everybody knows that
>>> > wikimedia hosts an instance of hadoop from which we are going to
>>> > pageview data in the near future. But .. how exactly does the data get
>>> > there?
>>> > We will go over the path that webrequest log data takes from varnish to
>>> > kafka (a distributed log buffer) to hadoop and the challenges of
>>> > this java-based infrastructure in production. We will also talk about
>>> > can we query the data with hive, an SQL-like interface. How can you set up
>>> > this stack on vagrant to play with and, last but not least, how we used
>>> > hive recently to provide GLAM folks with image view stats:
>>> > Thanks!
Just a quick heads up that the replication lag on
analytics-store.eqiad.wmnet (aka “The one machine to rule them all”)
has risen to >12 hours for s1 replicas. Other replicas are fine.
So on analytics-store.eqiad.wmnet:
* enwiki is affected.
* log (EventLogging) is affected.
Other databases (like dewiki, eswiki, ...) on
analytics-store.eqiad.wmnet are /not/ affected.
For queries that only rely on enwiki, or log, you can use
as a drop-in replacement. enwiki and log are not lagging there.
I filed RT ticket 8032:
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Kefermarkterstrasze 6a/3     Email: christian(a)quelltextlich.at
4293 Gutau, Austria          Phone: +43 7946 / 20 5 81
                             Fax:   +43 7946 / 20 5 81
Growth team needs some data removed from both the raw logs and analytics
slaves. Sean said he can help with the EventLogging db maintenance, but is
unfamiliar with the logs on Vanadium.
This is to purge data from a recent set of experiments that involved
setting a token for anonymous editors. Now that we've got our results and
aggregated any non-private data we need in the future, we can safely remove
any data stored in the associated schemas. This doesn't need to be
selective based on schema IDs or dates; we can probably just wholesale
remove the associated schemas listed at
Sean suggested Christian or Nuria might be best equipped to help here. If
Aaron and I provide a list of the schemas, is this possible? Ideally, we'd
like to delete these by 8/04, so apologies in advance for such a tight