Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used
both by community members to make decisions on their community's direction
and by Wikimedia Foundation staff to point anti-harassment tool development
in the right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators and
administrator confidence levels to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
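As a rough illustration of the first metric mentioned above, the ratio of
active users to active administrators is a simple division. The sketch
below uses made-up counts and a hypothetical function name; it is only an
assumption about how such a figure could be computed, not how the Metrics
Kit does it.

```python
from typing import Optional


def users_per_admin(active_users: int, active_admins: int) -> Optional[float]:
    """Return active users per active administrator.

    Returns None when a wiki has no active administrators, since the
    ratio is undefined in that case.
    """
    if active_admins == 0:
        return None
    return active_users / active_admins


# Hypothetical counts for one wiki, for illustration only.
print(users_per_admin(1200, 15))
```

Returning None for a wiki without active administrators keeps that case
visible rather than hiding it behind a zero or an error.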
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats
<https://stats.wikimedia.org/v2/#/all-projects>. How might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points that will inform
community decisions that will benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
[apologies for cross-posting]
In a nutshell:
We are asking for your input to help us learn how to release the
historical edit data of Wikimedia projects in a more efficient way.
Please provide your feedback via
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
by 2019-09-03.
******
Dear researchers and analytics users,
The Analytics team at Wikimedia Foundation [1] has been working on
building a data lake [2] for Wikimedia edits [3] to enable the
research and analysis of Wikimedia's edit data in a more efficient
way. This data is a history of activity on Wikimedia projects as
complete and research-friendly as possible. Edits have context, such
as whether they were reverted, in the same line as the edit itself. So
you can focus more on what you want to find out instead of writing
code to wrestle the data. Each line of the data released will include
the following and more (see full specification [3a], [3b], [3c]):
* editor edit count, groups, blocks, bot status, name, current and
historical (time of edit)
* seconds since this editor's last edit
* page context, current and historical (namespace, seconds since last
revision, etc.)
* seconds to identity revert or deletion, if applicable
* revision tags (mobile edit, ve edit, etc.)
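To make the "context in the same line" idea concrete, here is a minimal
sketch of what working with such rows might look like. The field names
(user_is_bot, seconds_to_identity_revert, tags) and the sample rows are
hypothetical and chosen only for illustration; the real column names are
in the specification [3a].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EditRow:
    """One edit with its context; field names here are hypothetical."""
    user_name: str
    user_is_bot: bool
    seconds_since_last_edit: Optional[int]     # None for a first edit
    seconds_to_identity_revert: Optional[int]  # None if never reverted
    tags: List[str] = field(default_factory=list)


def quickly_reverted_human_edits(rows, max_seconds=3600):
    """Keep non-bot edits that were reverted within max_seconds."""
    return [
        r for r in rows
        if not r.user_is_bot
        and r.seconds_to_identity_revert is not None
        and r.seconds_to_identity_revert <= max_seconds
    ]


# Made-up sample rows, not real data.
rows = [
    EditRow("ExampleUser", False, 120, 300, ["mobile edit"]),
    EditRow("ExampleBot", True, 5, 60, []),
    EditRow("AnotherUser", False, None, None, ["ve edit"]),
]
print([r.user_name for r in quickly_reverted_human_edits(rows)])
```

Because the revert information sits on the same row as the edit, the
filter needs no join against a separate table of reverts.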
The first instance of this data will be released in the coming months.
To make this release as useful as possible for you, the users of the
data, the team needs to hear your thoughts on how to slice and dice the
data at publishing time. You can provide your input at:
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
Please provide your input to this survey no later than 2019-09-03.
Best,
Leila
[1] https://wikitech.wikimedia.org/wiki/Analytics
[2] https://en.wikipedia.org/wiki/Data_lake
[3] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits
[3a] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_his…
[3b] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_use…
[3c] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_pag…
--
Leila Zia
Principal Research Scientist, Head of Research
Wikimedia Foundation
Hello!
Sorry to disturb you. I looked at stats.wikimedia.org and some of the data
seems interesting to me. The site shows (
https://stats.wikimedia.org/v2/#/az.wikipedia.org/reading/page-views-by-cou…
) that last month (August) pages on Azerbaijani Wikipedia had about 3 million
views from the Netherlands. These figures have been about the same over the
last few months. In July 2018 (
https://stats.wikimedia.org/v2/#/az.wikipedia.org/reading/page-views-by-cou…
) there were even more views from the Netherlands than from Azerbaijan.
When we compare the Netherlands with other countries, something seems wrong,
because not many Azerbaijani people live in the Netherlands, and I don't
believe local people read Azerbaijani Wikipedia ))
Thank you for your attention.
Kind regards,
User:Eminn
Emin Allahverdi
Team:
Over the course of the day today we will be updating Turnilo and
Superset. The new Superset version does not have major differences, so
hopefully you will not notice any issues. Turnilo, however, includes several
new features, among them the ability to create heatmaps. The new
functionality is still not perfect, but it is quite useful.
See, for example, self-identified bot requests (bots that tell us they are
bots) per country per hour for yesterday:
[image: Screen Shot 2019-08-21 at 4.38.13 PM.png]
Thanks,
Nuria
Hi everybody,
as part of https://phabricator.wikimedia.org/T201165 the Analytics team
wants to make it clear to everybody that the home directories on the
stat/notebook nodes are not backed up periodically. They run on a software
RAID configuration spanning multiple disks, so we are resilient to a disk
failure, but, even if unlikely, it might happen that a host loses all its
data. Please keep this in mind when working on important projects and/or
handling important data that you care about.
I just added a warning to
https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients.
If you have really important data that is too big to back up, keep in mind
that you can use your home directory (/user/your-username) on HDFS (which
replicates data three times across multiple nodes).
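As a minimal sketch of copying a file into that HDFS home directory, the
helper below wraps the standard `hdfs dfs -put` command. The function names
and the example file name are hypothetical, and it assumes the `hdfs`
client is on the PATH of the host you are working on.

```python
import subprocess
from typing import List


def hdfs_put_command(local_path: str, username: str) -> List[str]:
    """Build the `hdfs dfs -put` command for a user's HDFS home directory."""
    return ["hdfs", "dfs", "-put", local_path, f"/user/{username}/"]


def copy_to_hdfs(local_path: str, username: str) -> None:
    """Run the copy; raises CalledProcessError if the hdfs command fails."""
    subprocess.run(hdfs_put_command(local_path, username), check=True)


# Example (hypothetical file name), shown without running the copy:
print(" ".join(hdfs_put_command("results.tsv", "your-username")))
```

Building the command separately from running it makes it easy to check
what will be executed before touching any data.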
Please let us know in the aforementioned task if you have comments,
suggestions, etc.
Thanks in advance!
Luca (on behalf of the Analytics team)
Hi WMF Analytics,
In my web searches over the past few months I have been seeing an increasing number
of websites that have republished Wikimedia content, sometimes in ways that
I suspect are in violation of trademark and/or Creative Commons licensing
rules. (My guess is that these sites make money through advertising that
they place on their sites.) Has WMF observed any negative effects in web
traffic that can be attributed to other websites reusing Wikimedia content
and/or trademarks?
It might be interesting if WMF can obtain statistics from web search
providers regarding how many times users click on search engine links to
sites that reuse Wikimedia content and/or trademarks.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )