Hey there!
I just wrote a script that fetches data from the AQS new pages endpoint
<https://wikimedia.org/api/rest_v1/#!/Edited_pages_data/get_metrics_edited_p…>
in order to prepare our monthly health metrics (T199459
<https://phabricator.wikimedia.org/T199459>).
However, it seems like that endpoint doesn't yet have monthly data for
September. For example, a query for Commons with a start of July 1 and
an end of October 1
<https://wikimedia.org/api/rest_v1/metrics/edited-pages/new/commons.wikimedi…>
returns only data for July and August. What's the schedule for updating
this data?
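For anyone who wants to check for this kind of gap automatically, here is a minimal sketch. It assumes each monthly result is a dict with an ISO-style "timestamp" string, which is my reading of the response shape rather than something confirmed in this thread. The function names and the sample values (including the placeholder new_pages counts) are purely illustrative.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start's month up to (not including) end."""
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def missing_months(results, start, end):
    """Return the 'YYYY-MM' labels in [start, end) that have no entry in results.

    Assumption: every result carries a "timestamp" string starting with 'YYYY-MM'.
    """
    present = {r["timestamp"][:7] for r in results}
    expected = [d.strftime("%Y-%m") for d in month_starts(start, end)]
    return [label for label in expected if label not in present]


# Placeholder data mirroring the situation described above: only July and
# August come back. The new_pages values are dummies, not real counts.
sample = [
    {"timestamp": "2018-07-01T00:00:00.000Z", "new_pages": 0},
    {"timestamp": "2018-08-01T00:00:00.000Z", "new_pages": 0},
]
print(missing_months(sample, date(2018, 7, 1), date(2018, 10, 1)))
```

A script that prepares monthly metrics could run a check like this first and stop early when the most recent month has not been published yet.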
To be honest, I feel pretty frustrated by this. Wikistats 1 generates data
on content pages with a delay of 10-15 days after the end of the month,
which has made it difficult for us to provide timely metrics to executives
and the board. I had assumed (to a degree that I didn't even check) that by
switching to this API, we would instead only have to deal with the delay in
generating the mediawiki_history snapshot (5-7 days after the end of the
month). But that doesn't seem to be the case.
--
Neil Patel Quinn <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF>
(he/him/his)
product analyst, Wikimedia Foundation
Hi everybody,
the Analytics team is going to move the Oozie and Hive daemons from the
analytics1003 host to an-coord1001 (new host, hardware refresh) on Tuesday
Oct 9th at 10 AM CEST. This will require downtime for Oozie and Hive, so
some jobs might fail or not work at all during the maintenance. We have
allocated two hours for this procedure but it should require less time.
Tracking task: T205509
As always, please follow up with me or anybody in the analytics team for
clarifications and/or comments (via Phabricator or IRC Freenode
#wikimedia-analytics).
Thanks for the patience!
Luca (on behalf of the Analytics team)
Dear Sir,
Thank you for your answer. I saw what you are doing with the Metrics Kit. I propose that you create a tool that applies burst detection techniques to author co-occurrence on the talk pages of wikis. If two users are writing on the same pages within a very short period of time, there is a significant probability that they are in an edit war, engaged in mutual harassment, or discussing a particularly interesting issue. If you are interested in the idea, you are free to develop it using simple coding, machine learning, and APIs and to publish it in a conference paper. However, we ask that you include our names in the list of co-authors, as we are the creators of the original idea. We are Houcemeddine Turki (Faculty of Medicine of Sfax, University of Sfax, Sfax, Tunisia) and Seyed Mohammad Jafar Jalali (Institute for Intelligent Systems Research and Innovation, Deakin University, Melbourne, Australia). If you need this idea developed further, feel free to contact us and we will answer your questions.
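As a rough illustration of the co-occurrence part of this idea, here is a minimal sketch that counts how often two different users edit the same page within a short time window. It is plain window counting, not a full burst detection algorithm. The input format (page, user, timestamp tuples), the function name, and the sample edits are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def close_cooccurrences(edits, window):
    """Count, per user pair, edits to the same page made within `window` of each other.

    `edits` is an iterable of (page, user, timestamp) tuples (assumed format).
    """
    by_page = {}
    for page, user, ts in edits:
        by_page.setdefault(page, []).append((ts, user))

    counts = Counter()
    for page_edits in by_page.values():
        page_edits.sort()  # chronological order within each page
        for i, (ts_a, user_a) in enumerate(page_edits):
            for ts_b, user_b in page_edits[i + 1:]:
                if ts_b - ts_a > window:
                    break  # later edits are even further away
                if user_a != user_b:
                    counts[tuple(sorted((user_a, user_b)))] += 1
    return counts


# Made-up example edits, for illustration only.
t0 = datetime(2018, 10, 1, 12, 0)
edits = [
    ("Talk:A", "Alice", t0),
    ("Talk:A", "Bob", t0 + timedelta(minutes=2)),
    ("Talk:A", "Alice", t0 + timedelta(minutes=3)),
    ("Talk:A", "Carol", t0 + timedelta(hours=2)),
    ("Talk:B", "Bob", t0),
]
counts = close_cooccurrences(edits, timedelta(minutes=10))
print(counts)
```

Pairs with unusually high counts would only be candidates for closer inspection, since a dense exchange can be an edit war, harassment, or simply a lively discussion.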
Yours Sincerely,
Houcemeddine Turki
Sent from my Samsung device
-------- Original message --------
From: Joe Sutherland <jsutherland(a)wikimedia.org>
Date: 05/10/2018 22:29 (GMT+01:00)
To: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>
Subject: [Analytics] Community health metrics kit: Input needed!
Hello everyone - apologies for cross-posting! TL;DR: We would like your feedback on our Metrics Kit project. Please have a look and comment on Meta-Wiki: https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the Community Health Initiative, is working on a Metrics Kit designed to measure the relative "health"[1] of various communities that make up the Wikimedia movement: https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking at various aspects of Wikimedia project communities. This could be used both by community members to make decisions on their community direction and by Wikimedia Foundation staff to point anti-harassment tool development in the right direction.
We have a set of metrics we are thinking about including in the kit, ranging from the ratio of active users to active administrators to administrator confidence levels and off-wiki factors such as freedom to participate. It's ambitious, and our methods of collecting such data will vary.
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <https://stats.wikimedia.org/v2/#/all-projects> — how might these tools coexist?
Your opinions will help to guide this project going forward. We'll be reaching out at different stages of this project, so if you're interested in direct messaging going forward, please feel free to indicate your interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this list!
[1] What do we mean by "health"? There is no standard definition of what makes a Wikimedia community "healthy", but there are many indicators that highlight where a wiki is doing well, and where it could improve. This project aims to provide a variety of useful data points that will inform community decisions that will benefit from objective data.
--
Joe Sutherland (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
<http://joesutherland.rocks>
Hi all,
If you use Hive on stat1002/1004, you might have seen a deprecation
warning when you launch the hive client, saying that it's being replaced
with Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)