Steven, SJ, and Petr: I’ve provided responses to the questions about the quantitative findings below. Please let me know if any additional clarification would be helpful.
“The report says "On mobile, edit completion rate decreased by -24.3% (-13.5pp)" -- what's the difference between the first and second percentage figures?”
Great question, SJ. Please let me know if the explanation below helps clarify this.
The first figure (-24.3%) indicates the relative change between the control and test groups. In other words, by what percentage did the edit completion rate observed in the test group differ from the rate observed in the control group? We observed an edit completion rate of 55.6% in the control group and 42.1% in the test group. That is a 24.3% decrease, calculated as the ratio of the absolute change between the two groups (42.1% minus 55.6%) to the reference value (55.6%).
The second figure (-13.5pp) represents the absolute change between the control and test groups: the test edit completion rate (42.1%) minus the control edit completion rate (55.6%), which equals -13.5 percentage points.
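If it helps to see the arithmetic spelled out, here is a small Python sketch that reproduces both figures from the two observed rates (the variable names are ours, not from the report):

    # Observed edit completion rates on mobile
    control_rate = 0.556  # 55.6% in the control group
    test_rate = 0.421     # 42.1% in the test group

    # Absolute change, in percentage points (pp)
    absolute_change_pp = (test_rate - control_rate) * 100
    print(f"{absolute_change_pp:+.1f}pp")  # -13.5pp

    # Relative change: the absolute change as a share of the control rate
    relative_change_pct = (test_rate - control_rate) / control_rate * 100
    print(f"{relative_change_pct:+.1f}%")  # -24.3%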
Both values are provided in the report because each expresses the size of the change in a different way. But by either measure, they indicate how much the edit completion rate changed between the test and control groups.
“In other words we lose 24% of saved edits in order to decrease the revert rate by 8.6%. This tradeoff does not seem good.”
The interaction between these two metrics is worth clarifying, so thank you for drawing our collective attention to it, Steven.
Below is an attempt to offer some additional clarity. We'd value knowing if this brings any new questions to mind…
The 24% decrease observed on mobile represents the relative change in edit completion rate between the control and test groups, as explained above. It does *not* reflect a 24% decrease in the total number of saved edits.
If we look at the impact on saved edits, the total number of saved new content edits on mobile decreased from 3,924 in the control group to 3,468 in the test group (a decrease of 456 saved new content edits, or a 12% relative decrease). However, Reference Check increased the number of saved new content edits on mobile that included a reference, from 60 in the control group to 1,012 in the test group (an increase of 952, or roughly 16 times more saved new content edits with a reference). See Figure 18 of the analysis report for more details [1].
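The same kind of quick Python check (again with our own variable names) confirms the saved-edit arithmetic above:

    # Saved new content edits on mobile (Figure 18 of the report [1])
    control_saved, test_saved = 3924, 3468
    print(test_saved - control_saved)                          # -456 edits
    print((test_saved - control_saved) / control_saved * 100)  # ~ -11.6%, i.e. a ~12% relative decrease

    # Saved new content edits on mobile that included a reference
    control_with_ref, test_with_ref = 60, 1012
    print(test_with_ref - control_with_ref)  # 952 more edits
    print(test_with_ref / control_with_ref)  # ~ 16.9x as many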
The edit completion rates in this analysis were based on a specific subset of all edits attempted during the A/B test: the proportion of edits where a person indicated intent to save that were then successfully published. We focused only on edits where a person indicated intent to save because this is the point in the workflow where Reference Check would be shown, and we wanted to exclude edits abandoned for other reasons before that point.
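To make that denominator concrete, here is an illustrative sketch of how the rate is defined. The counts below are hypothetical, chosen by us only so the result echoes the 55.6% control figure:

    # Hypothetical counts, purely for illustration -- not from the report
    edits_with_save_intent = 1000  # edits where a person indicated intent to save
    edits_published = 556          # of those, the edits successfully published
    completion_rate = edits_published / edits_with_save_intent
    print(f"{completion_rate:.1%}")  # 55.6%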
If we instead look at all edits that were started and then successfully published, there was no significant change in edit completion rate on mobile or desktop, because Reference Check was shown for only a small fraction of all edits that were started.
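One way to see why the effect washes out over that larger denominator: if Reference Check fires on only a small share of started edits, even a 13.5pp drop within that subset barely moves the overall rate. The baseline rate and share below are made-up numbers for illustration only:

    # Made-up inputs for illustration -- not measurements from the test
    baseline_rate = 0.70        # hypothetical completion rate over all started edits
    affected_share = 0.05       # hypothetical share of started edits shown Reference Check
    drop_within_subset = 0.135  # the observed -13.5pp drop, applied only to that subset

    overall_rate = baseline_rate - affected_share * drop_within_subset
    print(f"{overall_rate:.1%}")  # 69.3%, an overall shift of well under 1pp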
Zooming out, we seem aligned in thinking it will be important to actively monitor changes in edit completion rate to ensure future Edit Checks do not significantly disrupt the editor experience. In fact, we'd value knowing if there are other metrics you think we should monitor: the Editing Team is actively defining the requirements for a dashboard (https://phabricator.wikimedia.org/T367130) that will help us track how edit session health evolves over time as more Checks are introduced.
[1] https://mneisler.quarto.pub/reference-check-ab-test-report-2024/#number-of-n...
Thanks for the follow-up explanation on this, Megan and other folks from the team. Your explanation makes a lot of sense, and it's much less concerning now.