Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used both
by community members to make decisions on their community direction and by
Wikimedia Foundation staff to point anti-harassment tool development in the
right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, to
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points that will inform
community decisions that will benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi friends!
Spark 1.x is pretty old. We only keep it around because it is a standard
part of the Cloudera distribution we use in the analytics Hadoop cluster.
The Analytics Engineering team uses Spark 2 for all of our jobs, and you
should too!
Spark 2 has been available in our cluster for over a year now. If you
don't yet use it, see
https://wikitech.wikimedia.org/w/index.php?title=Analytics/Systems/Cluster/…
for more info on how to.
We'd like to remove Spark 1 during the week of February 11. Please migrate
any Spark 1 jobs to Spark 2 by then (if there are any left!). (If this
timeline doesn't work for you just let us know and we'll adjust.)
Thanks!
- Andrew Otto & Analytics Engineering
https://phabricator.wikimedia.org/T212134
Hi everyone,
I am wondering whether data on RSS/Atom feed subscribers to a given
Wikipedia page is publicly available. If not, has Wikimedia published any
related statistics? Many thanks for any input!
Best,
Chenqi
Hi everybody,
analytics-store/dbstore1002 is currently experiencing issues, more info in
https://phabricator.wikimedia.org/T213670. The mysql daemon on the host
will likely experience downtime while we attempt to fix the issue,
apologies in advance for the trouble.
For any questions, feel free to reach out to me on the analytics IRC channel
(#wikimedia-analytics).
Luca (on behalf of the Analytics team and the Data persistence team)
Hi everybody,
As an FYI, the EventLogging master database (on db1107) is currently down to
ease rack maintenance. More info in
https://phabricator.wikimedia.org/T213748
Recent data on the db1108/analytics-slave's log database will be delayed.
Let me know if this is an issue for you on IRC or via email :)
Luca (on behalf of the Analytics team)
Hello, everyone,
The next Research Showcase, *Understanding participation in Wikipedia*,
will be live-streamed next Wednesday, January 16, at 11:30 AM PST/19:30
UTC. This presentation is about new editors.
YouTube stream: https://www.youtube.com/watch?v=Fc51jE_KNTc
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentation:
*Understanding participation in Wikipedia: Studies on the relationship
between new editors’ motivations and activity*
By Martina Balestra, New York University
Peer production communities like Wikipedia often struggle to retain
contributors beyond their initial engagement. Theory suggests this may be
related to their levels of motivation, though prior studies either center
on contributors’ activity or use cross-sectional survey methods, and
overlook accompanying changes in motivation. In this talk, I will present a
series of studies aimed at filling this gap. We begin by looking at how
Wikipedia editors’ early motivations influence the activities that they
come to engage in, and how these motivations change over the first three
months of participation in Wikipedia. We then look at the relationship
between editing activity and intrinsic motivation specifically over time.
We find that new editors’ early motivations are predictive of their future
activity, but that these motivations tend to change with time. Moreover,
newcomers’ intrinsic motivation is reinforced by the amount of activity
they engage in over time: editors who had a high level of intrinsic
motivation entered a virtuous cycle where the more they edited the more
motivated they became, whereas those who initially had low intrinsic
motivation entered a vicious cycle. Our findings shed new light on the
importance of early experiences and reveal that the relationship between
motivation and activity is more complex than previously understood.
--
Janna Layton
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
A bug in the code that imports EventLogging data into Hive caused three
top-level EventCapsule <https://meta.wikimedia.org/wiki/Schema:EventCapsule>
fields to be set to NULL in all Hive EventLogging tables
since 2018-11-29T17:00:00. The affected fields were recvFrom, seqId, and
(more importantly) userAgent.
We've fixed the bug, and are backfilling the data now.
https://phabricator.wikimedia.org/T211833 has more info.
Sorry for the inconvenience! Follow the phabricator ticket to get updates
on when backfilling has completed.
-Andrew Otto
Systems Engineer, WMF
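For anyone checking their own extracts against the announcement above, here is a minimal sketch of how affected records could be flagged until backfilling completes. It is not the Analytics team's tooling: the record layout (a plain dict per event), the `needs_backfill` helper, and the assumption that the announced timestamp is in UTC are all hypothetical and only illustrate the window and the three fields named in the message.

```python
from datetime import datetime, timezone

# Start of the affected window, taken from the announcement.
# Assumption: the timestamp is in UTC (the message does not say).
BUG_START = datetime(2018, 11, 29, 17, 0, 0, tzinfo=timezone.utc)

# Top-level EventCapsule fields that the import bug set to NULL.
AFFECTED_FIELDS = ("recvFrom", "seqId", "userAgent")


def needs_backfill(event, received_at):
    """Return True if the event was received in the affected window and
    at least one of the affected fields is still missing (None).

    `event` is a hypothetical dict of top-level capsule fields and
    `received_at` is a timezone-aware datetime.
    """
    if received_at < BUG_START:
        return False
    return any(event.get(field) is None for field in AFFECTED_FIELDS)


# Example: an event received after the bug started, with userAgent lost.
sample = {"recvFrom": "host-a", "seqId": 42, "userAgent": None}
sample_time = datetime(2018, 12, 1, 9, 30, tzinfo=timezone.utc)
print(needs_backfill(sample, sample_time))
```

Events received before the window are never flagged, even if a field happens to be empty for unrelated reasons, so the check stays specific to this bug.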
>
> Thanks for your replies! However, I have a few more questions to ask.
> Sorry!
No problem! :]
> You have mentioned that the analytics team is about to abandon the
> original WikiStats website. Does that mean that even if you successfully
> update the database (or pipelines), the data will still not be shown in
> WikiStats 1?
Correct. From now on, we are giving WikiStats1 only critical maintenance.
So, even if we update the Analytics pipeline, no new data will be available
in WikiStats1.
> Secondly, what I (and the community) need is just basic statistics (of
> course, EVERY category included in WikiStats 1 would be much better).
Understood. I would suggest that, in mid-February, you check WikiStats2 (and
Analytics Query Service) for Chinese Wikiversity and determine what stats
are missing for you. Then you could let us know, and we would take that
into account when developing new features for WikiStats2.
> Finally, if you finish the improvements, will the data dated before the
> improvements (e.g. 2018-08) also be visible?
Yes, we should be able to calculate editing metrics since the beginning of
wiki-time.
Cheers!
On Fri, Jan 11, 2019 at 5:36 AM Eric Liu <ericliu.roc(a)gmail.com> wrote:
> Thanks for your replies! However, I have a few more questions to ask.
> Sorry!
>
> You have mentioned that the analytics team is about to abandon the
> original WikiStats website. Does that mean that even if you successfully
> update the database (or pipelines), the data will still not be shown in
> WikiStats 1?
>
> Secondly, what I (and the community) need is just basic statistics (of
> course, EVERY category included in WikiStats 1 would be much better).
>
> Finally, if you finish the improvements, will the data dated before the
> improvements (e.g. 2018-08) also be visible?
>
> Thanks for your help!
--
*Marcel Ruiz Forns** (he/him)*
Analytics Developer @ Wikimedia Foundation
>
> If the analytics team adds the data of Chinese Wikiversity into the
> database (base source), will WikiStats and WikiStats 2 both get updated? If
> not, then what can I do to fix it?
The data is already present in the initial wiki databases, but was not
being pulled by the Analytics pipeline that generates stats for WikiStats2.
Thanks to your heads-up, we already fixed that, see:
https://phabricator.wikimedia.org/T213290. However, it will only be reflected
in WikiStats2 after the next round of data loading, which will take place
between the 5th and 10th of next month (Feb 2019).
> (Although WikiStats 2 has a more advanced interface, it’s still in
> development (not stable enough), and the original WikiStats website has a
> much simpler interface to navigate and collect raw data. So I prefer to
> use WikiStats 1 rather than WikiStats 2 as the reference for statistics.)
Yes, WikiStats2 is under development and will be for a while. We're adding
WikiStats1 functionalities to it as time allows. Unfortunately, we're not
actively working on new WikiStats1 features any more, only on fixing
critical errors. Now, if you're looking for raw data (as opposed to data
visualization), the Analytics API that I mentioned in my first reply
(Analytics Query Service) might have what you want (next month). Also, if
you want to tell us exactly what data you are looking for, we might be able
to help you get it; or in case we don't have it available yet, it will aid
us in determining which features we should add next to WikiStats2 in the
upcoming months.
Cheers!
On Thu, Jan 10, 2019 at 4:28 PM Eric Liu <ericliu.roc(a)gmail.com> wrote:
> Thanks for your initial answer. One more question please. (Orz)
>
> If the analytics team adds the data of Chinese Wikiversity into the
> database (base source), will WikiStats and WikiStats 2 both get updated? If
> not, then what can I do to fix it?
>
> (Although WikiStats 2 has a more advanced interface, it’s still in
> development (not stable enough), and the original WikiStats website has a
> much simpler interface to navigate and collect raw data. So I prefer to
> use WikiStats 1 rather than WikiStats 2 as the reference for statistics.)
>
> Again, thanks for your precious answer! It’s really helpful for both me
> and the Chinese Wikiversity community.
--
*Marcel Ruiz Forns** (he/him)*
Analytics Developer @ Wikimedia Foundation
[adding back analytics list to recipients]
Hi Eric!
> Are WikiStats 1 and WikiStats 2’s database the same?
Although the initial source of data is the same for both WikiStats1 and
WikiStats2 (the wiki databases), WikiStats1 and WikiStats2 pull data from
different pipelines. WikiStats1 independently computes metrics monthly and
stores them in static HTML files, which are then served at
stats.wikimedia.org. WikiStats2 is a serving layer on top of the Analytics
data pipeline. It pulls data from Analytics Query Service[1], the stats API
maintained by us (Analytics team). It's a public service, so you can query
it freely. See manuals[2][3][4]. Note that in most cases data from
WikiStats1 matches data from WikiStats2, but some metrics can differ
slightly for technical reasons.
> And, is WikiScan a part of WikiStats?
No, I think WikiScan is a completely separate tool, though it probably
shares the same initial source of data as the WikiStats siblings.
[1] https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS
[2] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
[3] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[4] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Unique_Devices
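Since the Analytics Query Service is public, a query can be scripted directly. The following is a minimal sketch rather than an official client: the base URL and the aggregate pageviews path layout follow the public REST API as I understand it from the Pageviews manual[2], while the helper names, the `zh.wikiversity.org` project string, the date format, and the User-Agent text are assumptions for illustration. Check the manuals above for the exact parameters before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed public base URL of the Analytics Query Service REST API.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def monthly_pageviews_url(project, start, end,
                          access="all-access", agent="all-agents"):
    """Build the URL for monthly aggregate pageviews of one project.

    `start` and `end` are timestamps written as YYYYMMDDHH strings
    (an assumption based on the examples in the Pageviews manual).
    """
    parts = ["pageviews", "aggregate", project, access, agent,
             "monthly", start, end]
    # Percent-encode each path segment so unusual characters stay valid.
    return AQS_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_json(url, user_agent):
    """Perform the HTTP GET and decode the JSON response body."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)


# Building the URL needs no network access:
url = monthly_pageviews_url("zh.wikiversity.org", "2018120100", "2019010100")
print(url)

# To actually run the query, identify your script in the User-Agent:
# data = fetch_json(url, "example-script/0.1 (your-contact@example.org)")
```

Keeping URL construction separate from the network call makes the request easy to inspect and adapt to the other endpoints described in the manuals.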
On Thu, Jan 10, 2019 at 10:34 AM Eric Liu <ericliu.roc(a)gmail.com> wrote:
> Are WikiStats 1 and WikiStats 2’s database the same? And, is WikiScan a
> part of WikiStats?
>
> Thanks for your help!
>
> On Wed, Jan 9, 2019 at 23:34, Marcel Ruiz Forns <mforns(a)wikimedia.org> wrote:
>
>> [Adding Eric Liu to the recipient list, because he is not yet subscribed
>> to the list]
>>
>> Hi Eric!
>>
>> Thank you for the heads up. We will work on fixing that.
>> You can follow the progress of this task here:
>> https://phabricator.wikimedia.org/T213290
>>
>> BTW, please subscribe to the list here, so that your messages do not get
>> blocked for moderation.
>> Also, you will be able to receive all replies to your message. Thanks!
>>
>> Cheers
>>
>> On Wed, Jan 9, 2019 at 4:26 PM Eric Liu <ericliu.roc(a)gmail.com> wrote:
>>
>>> The Chinese Wikiversity project was launched several months ago,
>>> and it already has over 700 learning resources, surpassing Swedish
>>> Wikiversity and Korean Wikiversity, which shows that the project has a
>>> stable community.
>>>
>>> However, the WikiStats website hasn’t been updated yet, which makes it
>>> difficult for the community to track the data.
>>>
>>> Please add Chinese Wikiversity into the WikiStats database as soon as
>>> possible. We need, and will appreciate your help.
>>>
>>> Sincerely,
>>> Eric Liu (User:Ericliu1912) from Chinese Wikiversity
>>>
>> --
>>> 劉洺辰 敬上
>>> Sincerely, Eric Liu
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>>
>> --
>> *Marcel Ruiz Forns** (he/him)*
>> Analytics Developer @ Wikimedia Foundation
>>
> --
> 劉洺辰 敬上
> Sincerely, Eric Liu
>
--
*Marcel Ruiz Forns** (he/him)*
Analytics Developer @ Wikimedia Foundation