On May 10, 2013, at 3:37 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

I put together a proof of concept showing how to use the UserMetrics API to generate project-level engagement metrics dashboards. Think of it as the equivalent of the TAE dashboard from the WMF reportcard, but: 

(1) obtained in (virtual) real time via the UserMetrics API
(2) using an arbitrary metric among those available in the API.

http://toolserver.org/~dartar/dashboards/metrics/threshold/
(Note: the toolserver has the hiccups again; I've attached a screenshot below in case you can't load this URL)

How do I read these graphs?

The first graph represents a project's overall new user activation rate, or more precisely the proportion of daily registered users completing at least 1 article-namespace edit within 24 hours of registration. The second graph represents the absolute number of new users hitting the 1-edit-in-24-hours threshold each day.

We extensively use the 1-edit/ns0/24h threshold metric in E3 as an indicator of new editor engagement at the experiment level (typically for cohorts of a few thousand users), but the same metric can also be calculated for the whole population of new users registered in a Wikimedia project in a given timespan.
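
To make the metric concrete, here's a minimal sketch in plain Python (not the actual UserMetrics implementation, and the timestamps are made up) of how such an activation rate can be computed from registration times and first article-namespace edit times:

    from datetime import datetime, timedelta

    # Made-up per-user data: registration time and the time of the user's
    # first article-namespace (ns0) edit, or None if they never edited.
    new_users = [
        {"registered": datetime(2013, 5, 1, 9, 30), "first_ns0_edit": datetime(2013, 5, 1, 10, 5)},
        {"registered": datetime(2013, 5, 1, 11, 0), "first_ns0_edit": None},
        {"registered": datetime(2013, 5, 1, 14, 20), "first_ns0_edit": datetime(2013, 5, 3, 8, 0)},
    ]

    def hit_threshold(user, hours=24):
        # True if the user made at least one ns0 edit within `hours` of registering.
        edit = user["first_ns0_edit"]
        return edit is not None and edit - user["registered"] <= timedelta(hours=hours)

    hitters = sum(hit_threshold(u) for u in new_users)
    activation_rate = hitters / float(len(new_users))  # the "proportion" aggregation
    print("%d of %d new users hit the threshold (%.1f%%)"
          % (hitters, len(new_users), 100 * activation_rate))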

What story do these graphs tell?

New users on the German or Dutch Wikipedia have a much higher activation rate (28.7-28.9%) than the English Wikipedia (22.7%, i.e. a pretty large 6-point average difference) in this 8-day period. Conversely, other projects (like zhwiki) have a much lower new user activation rate (11.4%).

In absolute terms (second graph), the Spanish Wikipedia outperforms the German Wikipedia, adding twice as many 1-edit threshold hitters every day as the German Wikipedia, despite having only half its new user activation rate.

How were these graphs generated?

You heard that UserMetrics can compute cohort metrics. In fact, there's a magic cohort named "all" which computes a metric for all new registered users in a specific period. 
The data for these dashboards was generated for each of the top Wikipedias using requests like the following:

https://metrics.wikimedia.org/cohorts/all/threshold?time_series&aggregator=proportion&project=enwiki&start=20130501000000&end=20130509000000&slice=24&group=REGISTRATION

The query took less than a minute to compute the threshold metric for about 31,500 users registered on enwiki in 8 days.
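
If you'd like to script a request like this yourself, here's a rough sketch using Python and the requests library; I'm not showing the structure of the JSON response here, so inspect it before building anything on top of it:

    import json
    import requests  # third-party HTTP client

    # The same request as above, as a single URL string.
    url = (
        "https://metrics.wikimedia.org/cohorts/all/threshold"
        "?time_series&aggregator=proportion&project=enwiki"
        "&start=20130501000000&end=20130509000000"
        "&slice=24&group=REGISTRATION"
    )

    resp = requests.get(url)
    resp.raise_for_status()
    data = resp.json()

    # Pretty-print the raw response; how the daily slices and their proportions
    # are keyed is best checked by looking at the output itself.
    print(json.dumps(data, indent=2))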

Are these dashboards refreshed daily?

No, this is a static proof-of-concept, but it's trivial to set up scripts to refresh this data.
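
For instance, a script run daily from cron could re-fetch the last 8 days for a handful of projects and dump the results to a file the dashboard reads. A rough sketch, where the project list and output filename are placeholders I made up:

    import json
    from datetime import datetime, timedelta

    import requests

    PROJECTS = ["enwiki", "dewiki", "nlwiki", "eswiki", "zhwiki"]
    OUTPUT = "threshold_timeseries.json"  # hypothetical file read by the dashboard

    def fetch_series(project, days=8):
        # Fetch daily activation rates for the last `days` days for one project.
        end = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
        start = end - timedelta(days=days)
        url = (
            "https://metrics.wikimedia.org/cohorts/all/threshold"
            "?time_series&aggregator=proportion"
            "&project=%s&start=%s&end=%s&slice=24&group=REGISTRATION"
            % (project, start.strftime("%Y%m%d%H%M%S"), end.strftime("%Y%m%d%H%M%S"))
        )
        resp = requests.get(url)
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        series = dict((p, fetch_series(p)) for p in PROJECTS)
        with open(OUTPUT, "w") as f:
            json.dump(series, f)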

What other project-level dashboards can I generate?

Try new metrics:
instead of the activation rate, you can visualize the 24-hour revert rate for all new registered users, day by day

Change time slices:
maybe you want to extract activation rates by hour, by week, or by month. You can do this by changing the time slice size (see the example requests after this list)

Change metric-specific parameters:
instead of the 1-edit threshold, you can visualize the proportion of users hitting a 5+ edit threshold within 24 hours
you can limit the metric to the Article and User namespaces, or any arbitrary namespace
you can measure threshold hitters within 48 hours or 7 days of registration

Change the aggregator:
these daily aggregates are built with a method that returns the proportion of threshold hitters, but you can use any of the available aggregator functions, for example to visualize the mean or the median 24-hour revert rate

Change the group-by method:
this example groups users by registration date and calculates the proportion of 24-hour threshold hitters for these daily groups. You can also group by activity and look at how many users, regardless of when they registered, reach the threshold (or any other metric) within each 24-hour slice
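
To give a feel for how a couple of these variations translate into requests, here's a small sketch that builds variant URLs. Only the parameters that appear in the enwiki request above (project, start, end, slice, aggregator, group, time_series) come from that example; the "n" and "t" names for the metric-specific threshold parameters are my guesses, so check the API documentation before relying on them:

    from urllib.parse import urlencode

    BASE = "https://metrics.wikimedia.org/cohorts/all/threshold"

    # Parameters copied from the enwiki request above.
    common = {
        "aggregator": "proportion",
        "project": "enwiki",
        "start": "20130501000000",
        "end": "20130509000000",
        "group": "REGISTRATION",
    }

    # Weekly instead of daily slices: 7 days * 24 hours = 168-hour slices.
    weekly = dict(common, slice=168)

    # 5+ edits within 48 hours of registration; "n" and "t" are assumed names.
    five_in_48h = dict(common, slice=24, n=5, t=48)

    for name, params in [("weekly slices", weekly), ("5 edits in 48h", five_in_48h)]:
        print(name + ": " + BASE + "?time_series&" + urlencode(params))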

Are you suggesting we should replace TAE as a key metric?

No, I am suggesting that we can now easily complement that single, monolithic metric with more granular metrics that can help us gauge how different projects perform at engaging and retaining new users.

Known issues

Computing project-level metrics consumes a lot of resources; usage of the "all" magic keyword will need to be restricted (so that regular API users cannot trigger it). Stability issues with these computationally intensive requests also need to be addressed.

Hope this gives you a sense of the data that can be generated by the API; let me know if you have any questions.

Dario


<activation_rates.png>