Hey everyone -
Does anyone know if there is a way to measure how much a tool hosted on
Labs gets used? For example, if I were to host a tool I developed
(ha), could I see how many people were accessing it, similar to what I
would get from Google Analytics?
Jessie Wild
Grantmaking Learning & Evaluation
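One way to get rough usage numbers without Google Analytics is to count requests and distinct client addresses per day from the web server's access log. This is a minimal sketch that assumes two things this thread does not confirm: that the tool's web server writes lines in the Common Log Format (the Apache/nginx default style), and that the tool's owner can read that log. The function name `daily_usage` and the sample lines are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) \S+'
)

def daily_usage(lines):
    """Return {date: (request_count, distinct_client_count)}, sorted by date."""
    hits = defaultdict(int)
    clients = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in Common Log Format
        # Timestamps look like 13/Nov/2013:10:00:00 +0000
        day = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z").date()
        hits[day] += 1
        clients[day].add(m.group("host"))
    return {day: (hits[day], len(clients[day])) for day in sorted(hits)}

# Hypothetical log lines, for illustration only.
sample = [
    '10.0.0.1 - - [13/Nov/2013:10:00:00 +0000] "GET /mytool/ HTTP/1.1" 200 512',
    '10.0.0.2 - - [13/Nov/2013:10:05:00 +0000] "GET /mytool/ HTTP/1.1" 200 512',
    '10.0.0.1 - - [14/Nov/2013:09:00:00 +0000] "GET /mytool/x HTTP/1.1" 404 0',
]
print(daily_usage(sample))
```

Distinct IP addresses only approximate distinct people: several users behind one proxy or NAT count once, and one person on several networks counts more than once.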
Does anyone know what's going on with the stats here:
Specifically, it looks like most of the notification types died in November
and link notifications had a huge spike on my birthday.
Sorry if this has already been discussed. I just noticed it.
Over time, Google has released a large amount of open data from or about Wikipedia. Check it out:
50,000 Lessons on How to Read: a Relation Extraction Corpus
What is it: A human-judged dataset of two relations involving public figures on Wikipedia: about 10,000 examples of “place of birth” and 40,000 examples of “attended or graduated from an institution.”
40 Million Entities in Context
What is it: A disambiguation set consisting of pointers to 10 million web pages with 40 million entities that have links to Wikipedia. This is another entity resolution corpus, since the links can be used to disambiguate the mentions, but unlike Google's ClueWeb annotations, the links here are inserted by the web page authors and can therefore be considered human annotation.
Distributing the Edit History of Wikipedia Infoboxes
What is it: The edit history of 1.8 million infoboxes in Wikipedia pages in one handy resource. Attributes on Wikipedia change over time, and some of them change more than others. Understanding attribute change is important for extracting accurate and useful information from Wikipedia.
Dictionaries for linking Text, Entities, and Ideas
What is it: We created a large database pairing 175 million strings with 7.5 million concepts, annotated with counts, which were mined from Wikipedia. The concepts in this case are Wikipedia articles, and the strings are anchor text spans that link to the concepts in question.
(ht Nicolas Torzec)
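The counts in the last dataset are what make it useful for entity linking: dividing how often a string links to a given article by how often that string is used as a link at all gives a rough estimate of P(concept | string). This is a minimal sketch assuming the (string, concept, count) triples have already been loaded into memory; the release's actual file layout is not described here, and the function names and sample counts are hypothetical.

```python
from collections import defaultdict

def build_dictionary(triples):
    """Group (string, concept, count) triples by anchor string, summing duplicates."""
    by_string = defaultdict(dict)
    for string, concept, count in triples:
        by_string[string][concept] = by_string[string].get(concept, 0) + count
    return by_string

def concept_probabilities(by_string, string):
    """Return P(concept | string) = count(string, concept) / count(string)."""
    counts = by_string.get(string, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items()}

# Hypothetical counts, for illustration only.
triples = [
    ("jaguar", "Jaguar", 60),
    ("jaguar", "Jaguar_Cars", 35),
    ("jaguar", "Jacksonville_Jaguars", 5),
]
probs = concept_probabilities(build_dictionary(triples), "jaguar")
best = max(probs, key=probs.get)  # most frequent link target for "jaguar"
print(best, probs[best])
```

Picking the most probable concept is only a baseline; real disambiguation would also look at the surrounding context.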
On Wed, Nov 13, 2013 at 5:09 PM, Jon Robson <jrobson(a)wikimedia.org> wrote:
> Thanks so much, Juliusz, for exploring this and for the great work fixing the
> schema (apologies for not predicting that might be an issue), and
> sorry for all the pain this must have caused you.
> We can't be the only team using Limn in the Foundation. It might be
> worth pulling everyone together. Am I right in thinking that Limn is a
> child of the analytics team? Maybe we should at least spend some time with
> them getting our use case resolved. I guess this is why we have an
> analytics department? I can raise this issue in the next Scrum of
> Scrums if it is not resolved by then.
> On Wed, Nov 13, 2013 at 3:54 PM, Juliusz Gonera <jgonera(a)wikimedia.org> wrote:
> > For the past few days (or more), the graphs at
> > http://mobile-reportcard.wmflabs.org/ have not been updating. The dashboard
> > consists of two parts: Limn, which displays the data, and backend scripts
> > that generate the graph data from Event Logging data. The issue was
> > caused by two independent problems in the second component:
> > 1. A change to the MobileWebEditing schema was handled incorrectly in the
> > scripts' config and caused the script to throw an exception.
> > 2. Backend scripts are stupid and not optimized at all.
> > The first problem is fixed. To work around the second one, I had to
> > disable updates of the "Editors registered on mobile who made 5+ edits on
> > enwiki (mobile+desktop)" graph for now (the query was timing out and
> > causing an exception too) and remove the performance graph, since we'll be
> > using Ganglia (and soon Graphite) for that. Graphs should get updated soon.
> > So why are those backend scripts stupid? Because they run every hour and
> > recalculate _all_ the values for every single graph. For example, even
> > though the total number of unique editors for June 2013 will never change,
> > it is recalculated every hour. This was a quick and easy way to get
> > graphs up, but as the Event Logging tables keep growing, we add more
> > graphs, and those graphs show more and more data, it no longer performs well.
> > I discussed this briefly with Ori and I think we agree on the general
> > direction. We should definitely schedule some time to work on this. We
> > could start with a spike investigating whether there is a framework for
> > aggregating the sums that we could use, and asking what other teams in the
> > Foundation use to generate their graph data. The results of this spike
> > and any follow-up work could be useful not only for the mobile team.
> > https://gerrit.wikimedia.org/r/#/c/95298/
> >
> > --
> > Juliusz
> > Software Engineer, Mobile
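The fix Juliusz points toward, recalculating only the values that can still change instead of every value every hour, can be illustrated with a small incremental aggregator. This is a minimal sketch, not the mobile-reportcard code: `update_graph`, the `(value, is_final)` cache layout, and the `count_unique_editors` callback (standing in for the expensive Event Logging query) are hypothetical, and it assumes all of a month's events have been logged by the time the month ends.

```python
from datetime import date

def month_starts(first, last):
    """Yield the first day of each month from `first` through `last`, inclusive."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

def update_graph(cache, first, today, count_unique_editors):
    """Refresh a monthly series, re-querying only months that are not final.

    cache maps month -> (value, is_final). A month becomes final once it has
    been computed after it ended, so e.g. June 2013 is queried at most once
    more after June is over, and never again after that.
    """
    current = date(today.year, today.month, 1)
    for month in month_starts(first, today):
        entry = cache.get(month)
        if entry is None or not entry[1]:
            cache[month] = (count_unique_editors(month), month < current)
    return cache
```

Marking a month final only when it was computed after it ended avoids a subtle bug: a month last computed partway through would otherwise keep its partial total forever.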
Forwarding to the proper list; analytics-internal(a)wikimedia.org should be
deleted from everyone's address book.
---------- Forwarded message ----------
From: Andrew Otto <aotto(a)wikimedia.org>
Date: Mon, Dec 2, 2013 at 8:42 AM
Subject: Stand up today
To: Analytics Team - Internal <analytics-internal(a)wikimedia.org>
Ah! The VA DMV is inconveniencing me! :)
My new used car needs to be registered. I mailed in the forms 3 weeks ago
hoping to have it all settled by now. It isn't, so I have to go to the DMV
to figure out why. They haven't been open due to the holidays, and I thought
they opened at 8am, but they don't open until 9am.
And my mobile hotspot seems to be broken! :( I will most likely miss stand
up today while I figure this out.
Also, this means I'm not back in Brooklyn yet as I had intended. I
accidentally took last Friday as a vacation day, but didn't need to. So I'm
not sure what's happening yet, but I will probably need to drive back up
today or tomorrow. I worked a few hours this weekend, and if I need to
drive during work hours, I'll just use up that vacation day I took.
- varnishkafka .deb approved; however, Faidon wants me to merge in one last
change from Magnus to support a logging change for W0.
- varnishkafka mobile puppetization is approved too.
- the above two together mean I am no longer blocked on others for
varnishkafka deployment. I'd like to deploy to a mobile host or two this
- logster .deb has been approved. This means I can puppetize varnishkafka
Ganglia stats and, subsequently, Icinga alerts. Dan, I need a little help
with Python classpath things; something isn't making sense.
- python-kafka .deb approved. Ori wanted this for EventLogging-Kafka
support. I need to put this .deb into apt for him.
- Lots of discussion about how to deal with cross-DC latency. Magnus has
convinced Faidon to let varnishkafka buffer up to 10G on disk if necessary.
Apparently this will help during short periods of packet loss and high
latency. He has yet to code this, though.
- I'm slated to help Nik turn Elasticsearch back on today.
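The reasoning behind the 10G buffer is that a size-capped queue can absorb a short outage: messages pile up while the link is down and drain in order once it recovers, and only an outage longer than the cap allows loses data. This is a minimal sketch of that general idea, not varnishkafka's code (which is in C and, per the update above, not written yet). It keeps messages in memory for brevity where the proposal was to buffer on disk, and rejecting new messages once the cap is reached is only one possible policy; the class and method names are hypothetical.

```python
from collections import deque

class BoundedBuffer:
    """FIFO message buffer capped at max_bytes of payload."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0          # bytes currently queued
        self.queue = deque()
        self.dropped = 0       # messages rejected because the cap was reached

    def put(self, msg: bytes) -> bool:
        """Queue a message; return False and count a drop if it would exceed the cap."""
        if self.used + len(msg) > self.max_bytes:
            self.dropped += 1
            return False
        self.queue.append(msg)
        self.used += len(msg)
        return True

    def drain(self, send) -> int:
        """Send queued messages in order until send() fails; return how many were sent."""
        sent = 0
        while self.queue:
            msg = self.queue[0]
            if not send(msg):
                break  # link still down: keep the message for the next attempt
            self.queue.popleft()
            self.used -= len(msg)
            sent += 1
        return sent
```

How long such a buffer lasts is simple arithmetic: treating 10G as 10,000 MB and assuming a hypothetical 20 MB/s of log traffic, it would cover 10,000 / 20 = 500 seconds, a bit over eight minutes of outage.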
Happy post-Thanksgiving! Talk to y'all later.