Hi guys,
During this week the Gerrit and IRC panels have been added to the
Community Dashboard.
IRC: http://korma.wmflabs.org/browser/irc.html
Right now we are just analyzing two channels. What IRC channels should
we include here?
Gerrit: http://korma.wmflabs.org/browser/scr.html
We are following the same approach as with git, analyzing the projects
coming from gerrit.wikimedia.org
With these two data sources, the dashboard now includes all the sources
defined for this phase of the project. It is time to polish and reorganize
the dashboard to make it really useful.
Cheers!
--
|\_____/| Alvaro del Castillo
[o] [o] acs(a)bitergia.com - CTO, Software Engineer
| V | http://www.bitergia.com
| |
-ooo-ooo-
Forwarding because I assume not all the G&P stakeholders are on the
Analytics list.
G&P folks: which of these visualizations are we currently using? I imagine
that some of the ones set up by Evan a while ago might not be in use
currently, but I'm sure others are regularly referred to.
Also, a question for Christian: I believe a lot of the current
visualizations are running off datasources in various parts of Evan's home
directory. How easy would it be to generate a list of the currently running
visualizations, and where they're pulling data from? This might help us
figure out which ones we still need, and move them to a better home if
necessary.
Thanks,
Jonathan
---------- Forwarded message ----------
From: Christian Aistleitner <christian(a)quelltextlich.at>
Date: Tue, Aug 27, 2013 at 7:20 AM
Subject: [Analytics] Dashboards/graphs/dashsources/... at gp.wmflabs.org
To: analytics(a)lists.wikimedia.org
Hi,
we are currently serving a few hundred graphs, dashboards, ... at
http://gp.wmflabs.org/
, but running the various scripts that generate them is a bit shaky
and their maintenance is eating up a considerable amount of time.
So in order to better use resources, and limit maintenance work, we're
curious about which parts, URLs, dashboards, graphs, datasources of
the site are actually in use by people in one way or the other.
If you rely on parts, URLs, dashboards, graphs, datasources of
http://gp.wmflabs.org/
please let us know by August 30.
Best regards,
Christian
P.S.: We may think about removing unused parts, or even stop trying to
update them. So if you are using some parts, please do let us know :-)
P.P.S.: We already reached out to the users that we know of, so do not
feel obliged to reply again if you have already replied to the
private email about this issue.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasse 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
Heya,
NuPIC seems to be quite a promising library for building machine learning
applications, and they will be hosting a hackathon in November in SF.
Thought some people might be interested in this!
D
---------- Forwarded message ----------
From: Matthew Taylor <matt(a)numenta.org>
Date: Tue, Aug 27, 2013 at 12:45 PM
Subject: [nupic-dev] Fall 2013 NuPIC Hackathon
To: "NuPIC general mailing list." <nupic(a)lists.numenta.org>
Hackers, start your engines!
http://numenta.org/events.html#november_2013_hackathon
Nov 2-3, at the Tagged offices in San Francisco.
RSVP here: http://www.meetup.com/numenta/events/136809782/
I'll be adding more details about the NLP focus as we set them up.
---------
Matt Taylor
OS Community Flag-Bearer
Numenta
_______________________________________________
nupic mailing list
nupic(a)lists.numenta.org
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
Whilst investigating an orthogonal logging issue, I encountered a couple of
differences between squid and varnish which I didn't know about:
* Varnish does not give subsecond resolution in the request timestamp
* Varnish does give subsecond request processing time info
* Varnish calls it 'hit/200' or 'miss/302' instead of 'TCP_MEM_HIT/200' or
'TCP_MISS/302'
* Varnish does not URL encode the user agent field
Example log lines:
amssq41.esams.wikimedia.org 1013692039 2013-07-31T23:00:02.331 0
XXX TCP_MEM_HIT/200 614 GET http://meta.wikimedia.org/XXX NONE/- image/png
http://en.wikipedia.org/XXX - Mozilla/5.0%20(Wind... en-US en;q=0.8 -
cp1006.eqiad.wmnet 1442176851 2013-07-31T23:00:02.452 0 XXX TCP_MISS/302 406
GET http://meta.wikimedia.org/XXX NONE/- - http://en.m.wikipedia.org/XXX -
Mozilla/5.0%20(i.... en-us -
cp3012.esams.wikimedia.org 823553992 2013-07-31T23:00:02 0.000119448 XXX
hit/200 20 GET http://meta.m.wikimedia.org/XXX - image/png
http://de.m.wikipedia.org/XXX XXX Mozilla/5.0 (iPho... de-de -
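For anyone scripting over these logs, the cache-status field (the sixth
whitespace-separated field in the sample lines above) is enough to tell the
two formats apart. A minimal Python sketch, with field positions inferred
from the samples above rather than from any official format spec:

```python
from urllib.parse import unquote

def parse_cache_status(line):
    """Return (backend, http_status) for a squid- or varnish-style log line.

    Field positions are inferred from the sample lines: squid uses
    'TCP_MEM_HIT/200'-style statuses, varnish uses 'hit/200'.
    """
    fields = line.split()
    result, status = fields[5].split('/')
    backend = 'squid' if result.startswith('TCP_') else 'varnish'
    return backend, int(status)

def decode_user_agent(field):
    """Squid URL-encodes the user agent field (e.g. '%20'); varnish does not."""
    return unquote(field)

squid_line = ("amssq41.esams.wikimedia.org 1013692039 2013-07-31T23:00:02.331 0 "
              "XXX TCP_MEM_HIT/200 614 GET http://meta.wikimedia.org/XXX NONE/- image/png")
varnish_line = ("cp3012.esams.wikimedia.org 823553992 2013-07-31T23:00:02 0.000119448 XXX "
                "hit/200 20 GET http://meta.m.wikimedia.org/XXX - image/png")

print(parse_cache_status(squid_line))    # ('squid', 200)
print(parse_cache_status(varnish_line))  # ('varnish', 200)
print(decode_user_agent("Mozilla/5.0%20(Windows)"))  # Mozilla/5.0 (Windows)
```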
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
Hi,
we are currently bringing the device property and platform
computations back to life outside of Hadoop. Data for the last few
days has been computed and the jobs are running.
However, I am not sure about the old data that we have. Should we
blend that in?
* For device properties, I found that
http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props
seems to contain property data for 2013-03-01 until 2013-05-15.
Since this data already stops in mid-May, I assume we have more data
to blend in (end of May, June, July) at a different place.
Do we have such data?
Do we know if the above data is good, or if it's just a relic from test runs?
* For platform data, I found that
http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_…
has platform data from 2013-04-14 until 2013-07-20.
However, I am not sure which of this data is valid. Naive, uneducated
plausibility checks fail badly [1].
Do we know if/which data is good?
Do we have better or other sources for the platform job?
Best regards,
Christian
[1] For example, looking only at the last few data points
for Android on Tuesdays, we get [2]:
2013-04-16: 6438000
2013-04-23: 6300000
2013-04-30: 6559000
2013-05-06: 7267000
2013-05-13: 6954000
2013-05-27: 33335000
2013-06-04: 14388000
2013-06-11: 8563000
2013-06-18: 10241000
2013-06-25: 6896000
2013-07-09: 3454000
2013-07-16: 7206000
The highest value (33M) is about 10 times the lowest (3M), within
only three months.
Even when treating those data points as outliers (and we have readings
that are even further out, ranging from 1M to 37M for Android), the
lowest remaining data point is still only half the highest one.
All on the same weekday!
This looks suspicious.
[2] There is no data for 2013-05-20, and 2013-07-02.
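For what it's worth, the spread described above can be reproduced with a
quick Python sanity check over the quoted numbers (just a sketch for
eyeballing the data, not part of any pipeline):

```python
# Tuesday Android readings quoted above (no data for 2013-05-20 or 2013-07-02)
counts = {
    "2013-04-16": 6438000,
    "2013-04-23": 6300000,
    "2013-04-30": 6559000,
    "2013-05-06": 7267000,
    "2013-05-13": 6954000,
    "2013-05-27": 33335000,
    "2013-06-04": 14388000,
    "2013-06-11": 8563000,
    "2013-06-18": 10241000,
    "2013-06-25": 6896000,
    "2013-07-09": 3454000,
    "2013-07-16": 7206000,
}

lo, hi = min(counts.values()), max(counts.values())
ratio = hi / lo
print(round(ratio, 2))  # ~9.65: the highest reading is roughly 10x the lowest
```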
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasse 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hello again!
Ok, we're actually going to do this this time. As far as we know, people who need access to private webrequest data have migrated their stuff over to stat1002.eqiad.wmnet. The private webrequest data that currently exists on stat1 will soon be deleted.
Soon is August 7th. That's in 1 week. We announced this back in May, so there should have been plenty of notice. If you are still using the webrequest logs in /a/squid/archive on stat1, find me on IRC (ottomata) or email me and we can work together to make sure you can continue to do your work on stat1002.
On Wednesday August 7th, we will be removing private webrequest logs from stat1.
Thanks all!
-Andrew Otto
On May 20, 2013, at 2:13 PM, Andrew Otto <otto(a)wikimedia.org> wrote:
> >> "Before that happens, you should make sure that any personal stuff on stat1 that you need for number crunching is copied over to stat1002. "
> > from your note it looks like this is only related to webrequest data, is that correct?
>
> Yup! That is correct. stat1002 will be primarily used as a sensitive private data host. Only those users that have personal unpuppetized code and cronjobs that use this data need to worry about moving them from stat1 to stat1002.
>
>
>
> > what are the criteria for deciding who has access to stat1002? I see that contractors like Aaron Halfaker or Jonathan Morgan currently don't have access to it.
>
> The criteria will be the same as before: RT request + manager approval. However, the request should only be made if the user actually needs access to the webrequest logs to do analysis. For example, if the main reason someone already has access to stat1 is so that they can access the research slave databases, then they won't need access to stat1002.
>
>
>
> > can you give us more information on the long-term plans/scope of stat1 vs stat1002 (and update https://office.wikimedia.org/wiki/Data_access as needed)?
>
> I've added a small bit about stat1002 on that page.
>
> I don't know much about a long-term plan for stat1. It is hosted at the Tampa datacenter, and in the long term (a year-ish?) all the machines there will have to be decommissioned or relocated elsewhere. When it finally does move, it will most likely no longer have a public IP. stat1 is intended to be used as a workspace for analysts to do their thing on non-private data.
>
>
> -Ao
>