Hi,
just a quick heads-up that Ops are about to add a “php” key to the
X-Analytics header (which affects e.g. the sampled-1000 logs, Hive, ...):
https://gerrit.wikimedia.org/r/#/c/156793/
This key will hold the PHP implementation in use [1].
Planned deployment is between 2014-09-01 and 2014-09-02.
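For consumers of those logs, here is a minimal sketch of reading the new key. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs. The function names and the example value "hhvm" are illustrative assumptions, not taken from the change itself.

```python
def parse_x_analytics(header_value):
    """Split an X-Analytics value into a dict.

    Assumes a 'key1=value1;key2=value2' layout; items without '=' are skipped.
    """
    pairs = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def php_implementation(header_value, default=None):
    """Return the value of the 'php' key, or `default` if it is absent."""
    return parse_x_analytics(header_value).get("php", default)
```

Requests logged before the deployment will not carry the key, so callers should expect the default for older data.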
Have fun,
Christian
[1] https://wikitech.wikimedia.org/wiki/X-Analytics#Keys
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
just a quick heads up that due to database issues, geowiki currently
cannot update daily with new data.
So pages with daily active editor counts like
http://gp.wmflabs.org/graphs/active_editors_total
http://gp.wmflabs.org/graphs/enwiki_editor_counts
http://gp.wmflabs.org/graphs/frwiki_editor_counts
http://gp.wmflabs.org/graphs/eowiki_editor_counts
[...]
and the private per country breakdowns at
https://stats.wikimedia.org/geowiki-private/
will not see updates until the issue is resolved.
Older data is not affected by the issue. So data up to May 1st is good
to use (with the usual geowiki caveats).
Best regards,
Christian
P.S.: The root issue is not severe, and I guess it can be fixed in the
next couple of days.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hi,
people from gerrit's “Analytics” group [1] currently hold
* Push (including Force Push)
* Push Merge Commit
* Forge Author Identity
* Forge Committer Identity
permissions on “analytics/*” projects in gerrit. But those permissions
have gotten, and keep getting, in the way one way or another.
Do we need those permissions for our repos?
If no one objects, I'll start removing them on 2014-04-28.
Best regards,
Christian
[1] https://gerrit.wikimedia.org/r/#/admin/groups/uuid-d34747bee94be39cff54b5fd…
Hi all!
For a while now, we’ve been hosting some public datasets at http://stat1001.wikimedia.org/public-datasets. We wanted to dissociate the domain that these datasets were hosted at from the actual server name, so, we did! The same data is now available at http://datasets.wikimedia.org. Redirects from stat1001.wikimedia.org are in place.
Let us know if you have any trouble.
Thanks!
-Ao
Hi,
the analytics dev team has committed to the following user stories for the
sprint starting today, ending September 2.
Bug ID  Component    Summary                                                               Points
69297   Wikimetrics  Story: EEVS user does not see reports for projects without databases       3
68351   EEVS         Story: AnalyticsEng has website for EEVS                                  34
67806   EEVS         Story: EEVSUser loads static site in accordance to Pau's design           13
That’s 50 points in 3 Stories
You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-08-21/
Note:
Bug 68507 (replication lag may affect recurrent reports) is carried over
from the previous sprint and will be completed shortly.
Cheers,
Kevin Leduc
Hello!
I've been working for the last few days on
https://github.com/Ironholds/WPDMZ, which currently generates raw data
on 'number of non-bot edits per country', and I'd like to run some
stats / make some graphs based on it. Since I'd like all my
'research' to be completely repeatable, I'd love it if we can make the
'raw data' (edits per country) publicly available on labsdb. I have
most of the code written for it, *but* it needs anonymization.
The biggest de-anonymization threats involve identifying which editors
come from which countries, and can be executed in the following case:
An editor is the only person editing from a country in a project where
the country has low edit volume, and by a process of elimination /
counting edits from a public source (like recentchanges), the
individual editor can be connected to a particular country.
I propose the following anonymization scheme:
1. No data is released for projects with fewer than a threshold number
of total *individual editors* in the time period for which the data is
released.
2. Countries that account for less than a threshold % of 'individual
editors' in the time period are simply lumped together as 'other'.
This removes most de-anonymization attacks I can think of. Thoughts? I
can easily write up the code to generate these on a monthly basis and
puppetize those to make the data publicly available. I think not just
me, but lots of external researchers would benefit from such data.
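The two rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input layout (per project and country, a pair of individual editor count and edit count), the function name, and the threshold parameters are all hypothetical and are not taken from the WPDMZ code.

```python
from collections import defaultdict


def anonymize(counts, min_project_editors, min_country_share):
    """Apply the two proposed rules to per-country counts.

    counts: {project: {country: (individual_editors, edits)}}  (assumed layout)
    Returns {project: {country_or_other: edits}}.
    """
    released = {}
    for project, by_country in counts.items():
        total_editors = sum(editors for editors, _ in by_country.values())
        # Rule 1: drop projects with too few individual editors overall.
        if total_editors == 0 or total_editors < min_project_editors:
            continue
        edits_out = defaultdict(int)
        for country, (editors, edits) in by_country.items():
            # Rule 2: countries below the editor-share threshold become 'other'.
            if editors / total_editors < min_country_share:
                edits_out["other"] += edits
            else:
                edits_out[country] += edits
        released[project] = dict(edits_out)
    return released
```

With made-up numbers, a project with 100 editors split 60/39/1 across three countries and a 5% share threshold would release two named countries plus an 'other' bucket, while a project with only 2 editors would be dropped entirely.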
Thanks!
--
Yuvi Panda T
http://yuvi.in/blog
I first pitched this idea to Aaron Halfaker in July, but nothing has
happened so far, so I wanted to pitch it to the whole analytics team.
The Foundation has been discussing the gender gap and how to address it
since I started 4 years ago. Often there is discussion of how particular
features or projects might theoretically impact the gender gap: the
Education Program, Visual Editor, WikiLove, editathons, etc. Unfortunately,
we have absolutely no idea if any of these things have any impact. Nor do
we have any idea if the gender gap is getting better or worse or staying
the same. All we have is a handful of non-comparable data points based on
surveys with different methodologies.
The main obstacle to generating useful gender gap data has always been that
we don't have reliable absolute numbers because editors do not reliably
indicate their gender in the preferences. There is nothing stopping us,
however, from analysing *relative* trends using existing data. For example,
we could generate graphs showing the relative difference per month in edits
by men and women and this data would be unaffected by the unreliability of
the absolute numbers (since we would only be looking at changes in the
percentages).
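As a rough illustration of what "relative trends" could mean here, the sketch below computes, per month, the share of edits by editors who set their gender to female among all edits by editors who set a gender, and then the month-over-month change in that share. The input layout, the function names, and the choice of this particular metric are assumptions for illustration only.

```python
def monthly_share(edits_by_month):
    """Share of 'female' edits among gendered edits, per month.

    edits_by_month: {"YYYY-MM": {"female": n, "male": n}}  (assumed layout)
    Months with no gendered edits are left out.
    """
    shares = {}
    for month, counts in edits_by_month.items():
        female = counts.get("female", 0)
        total = female + counts.get("male", 0)
        if total:
            shares[month] = female / total
    return shares


def month_over_month_change(shares):
    """Difference in share between each month and the previous available one."""
    months = sorted(shares)
    return {cur: shares[cur] - shares[prev]
            for prev, cur in zip(months, months[1:])}
```

Only editors who set the gender preference are counted, so the output says nothing about absolute numbers; it only tracks how the ratio moves over time.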
This is possible right now with existing data and shouldn't be very hard to
generate (although the queries will be expensive). To see a full
explanation of the idea, please check out the Trello card and add comments
there:
https://trello.com/c/vLkEILa6/369-gender-edit-dashboard
Ryan Kaldari
Hi,
due to gadolinium and its webstatscollector process having had load
issues (bug 70118 [1]; Ottomata has already fixed the root cause), the
webstatscollector files at
https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-08/
for the hours between 2014-08-24 14:00 and 2014-08-27 21:00 might
exhibit a higher loss than usual.
The files up to 2014-08-24 14:00 are ok.
The files from 2014-08-27 21:00 onwards are ok.
But the files in between still need closer examination.
We'll track progress on that on bug 70118 [1].
If you consume webstatscollector files directly, or indirectly
(stats.grok.se, wikistats, ...) please be aware of data issues for
that period.
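If you want to flag the affected files in your own pipeline, a minimal sketch could look like this. It assumes file names of the form pagecounts-YYYYMMDD-HHMMSS.gz, compares the date and hour embedded in the name against the window given above, and treats both boundary hours as ok. The helper names are made up for this example.

```python
import re
from datetime import datetime

# Window from the announcement; files strictly in between need examination.
AFFECTED_START = datetime(2014, 8, 24, 14, 0)
AFFECTED_END = datetime(2014, 8, 27, 21, 0)

# Assumed naming scheme: <prefix>-YYYYMMDD-HHMMSS.gz
_NAME_RE = re.compile(r"(\d{8})-(\d{2})\d{4}")


def file_hour(filename):
    """Return the date and hour embedded in the file name as a datetime."""
    match = _NAME_RE.search(filename)
    if match is None:
        raise ValueError("unexpected file name: %r" % filename)
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H")


def needs_examination(filename):
    """True if the file falls strictly inside the affected window."""
    return AFFECTED_START < file_hour(filename) < AFFECTED_END
```

Whether the timestamp in a file name marks the start or the end of the hour it covers is not settled here, so files right at the boundaries may deserve a second look.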
Sorry for the inconvenience,
Christian
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=70118