Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. It could be used both
by community members, to make decisions about their community's direction,
and by Wikimedia Foundation staff, to point anti-harassment tool development
in the right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, through
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
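As one illustration of how a metric like this might be computed: the public MediaWiki siteinfo statistics already expose active-user and administrator counts. The sketch below is our own, not part of the Metrics Kit; the helper names are made up, but the API endpoint and its `activeusers`/`admins` fields are standard.

```python
# Sketch: deriving one candidate metric (active users per administrator)
# from the public siteinfo statistics endpoint.
import json
import urllib.request

def active_users_per_admin(stats):
    """Ratio of active users to administrators from a siteinfo statistics dict."""
    return stats["activeusers"] / stats["admins"]

def fetch_site_statistics(host="en.wikipedia.org"):
    """Fetch the statistics block from a wiki's action API."""
    url = (f"https://{host}/w/api.php"
           "?action=query&meta=siteinfo&siprop=statistics&format=json")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["query"]["statistics"]

# Offline example with made-up numbers:
example = {"activeusers": 120000, "admins": 1000}
print(active_users_per_admin(example))  # 120.0
```

A live call would be `active_users_per_admin(fetch_site_statistics("en.wikipedia.org"))`, one wiki at a time.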
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hello everyone!
I'm a CS professor at Carleton College (formerly a PhD student at
Northwestern), and I've collaborated with folks at WMF on WP research in
the past. Most notably, I was the lead author on a paper written with
Jonathan Morgan and Jake Orlowitz evaluating the Wikipedia Adventure
<https://dl.acm.org/citation.cfm?id=2998307>. I hope to continue having
productive collaborations with other people who care about WP, and keep
producing research that supports the future of the project.
A potential research idea I'd like to explore is understanding whether and
how Wikipedia Zero impacted the amount and nature of participation on WP in
the projects that were affected by its rollout. Since the Wikipedia Zero
program ran during a particular time period and then ended, it also sets
up a potentially good avenue for a comparative study.
I was wondering if any of you were aware of any datasets that log
information about the rollout of/participation in Wikipedia Zero.
Specifically, some of the data I'm interested in include:
- Countries that had access to WP Zero, including dates/times that this was
turned on and off
- Any information about whether access through WP Zero meant that you could
only visit/edit particular parts or language editions of Wikipedia (I don't
think this was the case, but I wanted to make sure)
- Edits made to any WP language edition by IP addresses from those
countries during that time period
- Whether those edits were being made using a device that accessed WP
through WP Zero
- The kind of device being used for editing
- Other generic features of the edit (number of characters, namespace,
registered/unregistered etc)
I'm aware that the data may not be recorded exactly along those lines, but
I was still curious to know what data about Wikipedia Zero is out there,
and whether or not it was publicly available.
Thank you for your help!
-Sneha
*Sneha Narayan*
Department of Computer Science
Carleton College
snehanarayan.com
I noticed just now that the Foundation is soliciting applications for a new CTO:
https://www.linkedin.com/feed/update/urn:li:activity:6515003866130505729
Can we please hire a CTO who would prioritize reader privacy above the
interests of any state or non-state actors, whether or not they have
infiltrated staff, contractor, or NDA-signatory ranks, and whether or not
it interferes with reader statistics and analytics?
In particular, I would like to repeat my request that we should not be logging
personally identifiable information which might increase our subpoena
burden or result in privacy violation incidents. Fuzzing geolocation
is okay, but we should not be transmitting IP addresses into logs
across even a LAN, for example, and we certainly shouldn't be
purchasing hardware with backdoor coprocessors wasting electricity and
exposing us to government or similar intrusions:
https://lists.wikimedia.org/pipermail/analytics/2017-January/005696.html
Best regards,
Jim
Hello everyone,
I'm just sending a reminder that the below Showcase will be starting in
half an hour.
-Janna Layton
Hi all,
The next Research Showcase, “Learning How to Correct a Knowledge Base
from the Edit History” and “TableNet: An Approach for Determining
Fine-grained Relations for Wikipedia Tables” will be live-streamed
this Wednesday, March 20, 2019, at 11:30 AM PDT/18:30 UTC (please note
the change in UTC time due to daylight saving changes in the U.S.).
The first presentation is about using edit history to automatically
correct constraint violations in Wikidata, and the second is about
interlinking Wikipedia tables.
YouTube stream: https://www.youtube.com/watch?v=6p62PMhkVNM
As usual, you can join the conversation on IRC at #wikimedia-research.
You can also watch our past research showcases at
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase .
This month's presentations:
Learning How to Correct a Knowledge Base from the Edit History
By Thomas Pellissier Tanon (Télécom ParisTech), Camille Bourgaux (DI
ENS, CNRS, ENS, PSL Univ. & Inria), Fabian Suchanek (Télécom
ParisTech), WWW'19.
The curation of Wikidata (and other knowledge bases) is crucial to
keep the data consistent, to fight vandalism and to correct good faith
mistakes. However, manual curation of the data is costly. In this
work, we propose to take advantage of the edit history of the
knowledge base in order to learn how to correct constraint violations
automatically. Our method is based on rule mining, and uses the edits
that solved violations in the past to infer how to solve similar
violations in the present. For example, our system is able to learn
that the value of the [[d:Property:P21|sex or gender]] property
[[d:Q467|woman]] should be replaced by [[d:Q6581072|female]]. We
provide [https://tools.wmflabs.org/wikidata-game/distributed/#game=43
a Wikidata game] that suggests our corrections to the users in order
to improve Wikidata. Both the evaluation of our method on past
corrections, and the Wikidata game statistics show significant
improvements over baselines.
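A toy sketch (ours, not the authors' code) of the kind of correction rule the paper mines: "if property P21 (sex or gender) has the value Q467 (woman), replace it with Q6581072 (female)". The rule representation here is deliberately simplified.

```python
# Apply simple (property, wrong_value) -> right_value correction rules
# to a flat dict of an item's claims.
def apply_rules(claims, rules):
    """Return a corrected copy of `claims` with all matching rules applied."""
    corrected = dict(claims)
    for (prop, wrong), right in rules.items():
        if corrected.get(prop) == wrong:
            corrected[prop] = right
    return corrected

# Example rule learned from past edits, per the abstract:
rules = {("P21", "Q467"): "Q6581072"}  # sex or gender: woman -> female
item = {"P21": "Q467", "P31": "Q5"}
print(apply_rules(item, rules))  # {'P21': 'Q6581072', 'P31': 'Q5'}
```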
TableNet: An Approach for Determining Fine-grained Relations for
Wikipedia Tables
By Besnik Fetahu
Wikipedia tables represent an important resource, where information is
organized with respect to table schemas consisting of columns. In turn,
each column may contain instance values that point to other Wikipedia
articles or primitive values (e.g. numbers, strings, etc.). In this
work, we focus on the problem of interlinking Wikipedia tables for two
types of table relations: equivalent and subPartOf. Through such
relations, we can further harness semantically related information by
accessing related tables or facts therein. Determining the relation
type of a table pair is not trivial, as it depends on the schemas, the
values therein, and the semantic overlap of the cell values in the
corresponding tables. We propose TableNet, an approach that constructs
a knowledge graph of interlinked tables with subPartOf and equivalent
relations. TableNet consists of two main steps: (i) for any source
table, an efficient algorithm that finds all candidate related tables
with high coverage, and (ii) a neural-based approach that takes into
account the table schemas and the corresponding table data to determine
with high accuracy the relation for a table pair. We perform an
extensive experimental evaluation on the entire Wikipedia, with more
than 3.2 million tables. We show that we retain more than 88% of
relevant candidate table pairs for alignment. Consequently, we are able
to align tables with subPartOf or equivalent relations with an accuracy
of 90%. Comparisons with existing competitors show that TableNet has
superior performance in terms of coverage and alignment accuracy.
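To give a rough flavor of step (i), here is a toy sketch (ours, much simpler than the paper's algorithm) of cheaply filtering candidate table pairs by schema overlap before a more expensive model decides the relation type:

```python
# Filter table pairs by Jaccard overlap of their column-name schemas.
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_pairs(tables, threshold=0.5):
    """Return table-name pairs whose schemas overlap at or above threshold."""
    names = list(tables)
    return [(s, t) for i, s in enumerate(names) for t in names[i + 1:]
            if jaccard(tables[s], tables[t]) >= threshold]

tables = {
    "cities": ["name", "country", "population"],
    "capitals": ["name", "country", "population", "since"],
    "rivers": ["name", "length", "basin"],
}
print(candidate_pairs(tables))  # [('cities', 'capitals')]
```

The real system aims for high recall at this stage, so the actual filtering is considerably more sophisticated than a single schema-overlap threshold.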
--
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
Thanks for the responses.
Is it best to use Talk:Quarry to request assistance or can someone provide
the SQL if I can't work it out?
I'll look at HostBot's query first; that might help, so thanks, Jmorgan.
Can you please cc me on any reply as I'm not part of the list?
Thanks,
RhinosF1
Hi,
Could someone advise how I could do Quarry queries on user activity?
I'd like to monitor the following on en-wiki:
All for unblocked humans
-Number of active, confirmed editors (Done,
https://quarry.wmflabs.org/query/34268)
-Number of active editors with only 1 edit
-Number of active editors with only 1 mainspace edit
-Number of active non-confirmed editors
Definitions:
Active : 1 edit in last 30 days
Human : No bot flag
Confirmed : 4 days / 10 edits
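For the first of the missing counts, something along these lines might work on Quarry. This is a rough, untested sketch against the Wiki Replicas: table and column names (revision, actor, user, user_groups) should be checked against the current schema, and "confirmed" is only approximated here, since autoconfirmed status is computed by MediaWiki rather than stored.

```sql
-- Sketch: active (>= 1 edit in last 30 days), human (no bot flag),
-- "confirmed" (account >= 4 days old, >= 10 edits) users on enwiki.
-- Note: user_registration is NULL for some very old accounts, which
-- this version would exclude.
SELECT COUNT(DISTINCT user_id)
FROM user
JOIN actor ON actor_user = user_id
JOIN revision ON rev_actor = actor_id
WHERE rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s')
  AND user_editcount >= 10
  AND user_registration <= DATE_FORMAT(NOW() - INTERVAL 4 DAY, '%Y%m%d%H%i%s')
  AND user_id NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot');
```

The other counts should follow by swapping the confirmed conditions for their negations, or by grouping per-user edit counts and keeping those equal to 1 (optionally restricted to namespace 0 via the page table).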
Thank you so much Nuria for your clarifications !
All the best,
Viviana
________________________________
From: Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of analytics-request(a)lists.wikimedia.org <analytics-request(a)lists.wikimedia.org>
Sent: Monday, 11 March 2019 17:53
To: analytics(a)lists.wikimedia.org
Subject: Analytics Digest, Vol 85, Issue 5
Send Analytics mailing list submissions to
analytics(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/analytics
or, via email, send a message with subject or body 'help' to
analytics-request(a)lists.wikimedia.org
You can reach the person managing the list at
analytics-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Analytics digest..."
Today's Topics:
1. Re: R: Analytics Digest, Vol 85, Issue 3 (Nuria Ruiz)
----------------------------------------------------------------------
Message: 1
Date: Mon, 11 Mar 2019 09:53:32 -0700
From: Nuria Ruiz <nuria(a)wikimedia.org>
To: "A mailing list for the Analytics Team at WMF and everybody who
has an interest in Wikipedia and analytics."
<analytics(a)lists.wikimedia.org>
Subject: Re: [Analytics] R: Analytics Digest, Vol 85, Issue 3
>but I'd like to follow the behaviour flow when a user accesses some
Wikipedia page following a link from my website.
>I don't know if that is possible somehow and if it makes sense for you.
I see. It does make sense, but that is not data we have.
Thanks,
Nuria
On Sat, Mar 9, 2019 at 5:19 AM viviana paga <viviana.paga(a)hotmail.it> wrote:
> Hi Nuria,
> thanks for your reply and tips!
> As you propose, I use Matomo to get data from my client, but I'd like to
> follow the behaviour flow when a user accesses some Wikipedia page
> following a link from my website.
> I don't know if that is possible somehow and if it makes sense for you.
> Many thanks,
> Viviana
>
>
> ------------------------------
> *From:* Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of
> analytics-request(a)lists.wikimedia.org <
> analytics-request(a)lists.wikimedia.org>
> *Sent:* Friday, 8 March 2019 17:00
> *To:* analytics(a)lists.wikimedia.org
> *Subject:* Analytics Digest, Vol 85, Issue 3
>
>
>
> Today's Topics:
>
> 1. R: Analytics Digest, Vol 85, Issue 2 (viviana paga)
> 2. Re: R: Analytics Digest, Vol 85, Issue 2 (Nuria Ruiz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 8 Mar 2019 13:21:34 +0000
> From: viviana paga <viviana.paga(a)hotmail.it>
> To: "analytics(a)lists.wikimedia.org" <analytics(a)lists.wikimedia.org>
> Subject: [Analytics] R: Analytics Digest, Vol 85, Issue 2
>
> Hi Dan,
> thanks for your reply!
> I agree with you, and in fact I do that in my front-end, but I think it
> would be interesting to have some general stats from Wikimedia too; in
> particular, to understand what impact my project could have on general
> Wikimedia stats and what the behaviour of users arriving at Wikimedia
> from my site will be (whether it is the expected one or not).
> I thought that having some backend stats by api-user-agent could help me
> understand these points and improve my project in the best way in the
> future. What do you think? Is there a procedure that I can follow to get
> these stats?
> Many thanks,
> Viviana
>
> ________________________________
> From: Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of
> analytics-request(a)lists.wikimedia.org <
> analytics-request(a)lists.wikimedia.org>
> Sent: Friday, 8 March 2019 13:00
> To: analytics(a)lists.wikimedia.org
> Subject: Analytics Digest, Vol 85, Issue 2
>
>
>
> Today's Topics:
>
> 1. Stats of mediawiki API / Access to non-public data (viviana paga)
> 2. Re: Stats of mediawiki API / Access to non-public data
> (Dan Andreescu)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 7 Mar 2019 14:15:27 +0000
> From: viviana paga <viviana.paga(a)hotmail.it>
> To: "analytics(a)lists.wikimedia.org" <analytics(a)lists.wikimedia.org>
> Subject: [Analytics] Stats of mediawiki API / Access to non-public
> data
>
> Hi all,
>
> I’m working on a project about the sharing of cultural heritage and,
> more generally, about the sharing of open knowledge.
> In particular, I'm developing a web service that uses the MediaWiki API,
> and I'd like to have some stats about the traffic of my API calls to the
> commons.wikimedia.org domain.
>
> More specifically, I'd like to have:
> - the number of GET requests by Api-User-Agent
> - the number of views/edits by Api-User-Agent
> - the stats of Wikipedia traffic from inbound links from a specific domain
> or URL
>
> Is it possible to somehow access these limited non-public data?
> Is there a procedure that I can follow?
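Whatever the answer on the data-access side, per-client counts like these depend on the client sending a distinctive Api-User-Agent header with every request. A minimal sketch; the agent string and the particular endpoint below are illustrative placeholders, not anything Wikimedia-specific to this project:

```python
# Sketch: attaching a distinctive Api-User-Agent header so a client's
# API traffic is identifiable in aggregate request statistics.
import urllib.request

API_URL = ("https://commons.wikimedia.org/w/api.php"
           "?action=query&meta=siteinfo&format=json")
AGENT = "OpenHeritageBeta/0.1 (contact@example.org)"  # placeholder

req = urllib.request.Request(API_URL, headers={"Api-User-Agent": AGENT})
# urllib normalizes stored header keys to "Api-user-agent":
print(req.get_header("Api-user-agent"))  # OpenHeritageBeta/0.1 (contact@example.org)
```

Passing `req` to `urllib.request.urlopen` would then send the header with the request.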
>
> The project is still in development, but next April we will release a beta
> version for a limited range of user-testers.
> The project is completely non-profit, and it would provide maximum freedom,
> independence and privacy for its users.
> That’s why I’d like to have some backend stats by api-user-agent: that
> would guarantee the total privacy of the users while, at the same time,
> giving the project some general stats about its traffic, its utilisation
> and its impact on the general Wikimedia stats.
>
> If someone among you is interested in these issues (open shared cultural
> heritage, open linked data), I’d like to keep in touch and even to propose
> that you participate as a tester in April.
>
> Thank you in advance,
> Kind regards,
> Viviana Paga
> https://www.linkedin.com/in/viviana-paga-42bb8b44/
>
>