The Community Tech team is trying to find statistics on edit conflicts. It looks like a patch was merged back in January to collect stats on this (https://gerrit.wikimedia.org/r/#/c/266760/2/includes/EditPage.php), but I can't figure out where the stats are actually being collected. It looks like it's using the BufferingStatsdDataFactory class, but I couldn't find any documentation on what this is or where it actually collects the stats, and I don't have time at the moment to investigate more deeply.
Question: Does anyone know where those stats are actually collected?
Request: Could someone create some documentation on MediaWiki.org for stats collection and retrieval via BufferingStatsdDataFactory?
Hi Ryan,
I don't own this, nor does the Analytics team collect this data, but since it's being collected by statsd, it goes into Graphite. Our Graphite is at https://graphite.wikimedia.org/ (log in with your LDAP credentials), and you can find this metric in the tree as "MediaWiki.edit.failures.conflict". Hope that helps.
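To make the statsd-to-Graphite path concrete, here is a minimal sketch. It assumes (not confirmed in this thread) that BufferingStatsdDataFactory ultimately emits ordinary statsd counter lines of the form `<name>:<value>|c` over UDP, and that the statsd aggregator then writes the series into Graphite. The host in `send_counter` is a placeholder, and the exact Graphite path may carry extra prefixes or suffixes added by the aggregator, so browse the tree to confirm it. The `render` endpoint with `target`, `from` and `format` parameters is Graphite's standard render API.

```python
import socket
from urllib.parse import urlencode


def statsd_counter_line(metric: str, value: int = 1) -> bytes:
    """Format a counter increment in the plain-text statsd line protocol."""
    # statsd counters are written as "<name>:<value>|c"
    return f"{metric}:{value}|c".encode("ascii")


def send_counter(metric: str, host: str, port: int = 8125, value: int = 1) -> None:
    """Fire-and-forget one counter increment over UDP (placeholder host/port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_counter_line(metric, value), (host, port))


def graphite_render_url(base: str, target: str, since: str = "-30d") -> str:
    """Build a Graphite render API URL that returns the series as JSON."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"
```

For example, `graphite_render_url("https://graphite.wikimedia.org/", "MediaWiki.edit.failures.conflict")` builds a URL for the last 30 days of that series; you would still need to be logged in for the request to succeed.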
Best,
On Wed, Feb 10, 2016 at 10:00 AM, Ryan Kaldari rkaldari@wikimedia.org wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Part of the VisualEditor / wikitext comparison dashboard analyzes failure rates and types. This is instrumented as part of Schema:Edit and is available in the EventLogging data under Edit_<<schema-revision>>. To see it visually (the dashboard is a bit clunky right now without bookmarks), here's what you do:
* go to https://edit-analysis.wmflabs.org/compare/
* scroll down to *Failure Rates by Type*
* click on the BOTH button in the upper right, or just the WIKITEXT button if that's all you care about
* de-select everything in the graph except the "conflict" reason
What you're looking at now is the percentage of edits since last April that ended in an edit conflict. (The huge amount of data is why the dashboard takes forever to load; I need to check on that with Neil.)
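A rough sketch of how a "failure rates by type" series could be derived from raw Edit events: count save attempts per editor, count save failures per (editor, failure type), and divide. The flat field names `editor`, `action` and `failure_type`, and the values `saveAttempt`, `saveFailure` and `editConflict`, are simplified stand-ins assumed for illustration; the real Schema:Edit fields and the dashboard's actual query may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def failure_rates_by_type(events: Iterable[Mapping[str, str]]) -> dict[tuple[str, str], float]:
    """Return {(editor, failure_type): failures / save attempts} for each editor."""
    attempts: Counter[str] = Counter()
    failures: Counter[tuple[str, str]] = Counter()
    for ev in events:
        editor = ev["editor"]  # hypothetical field, e.g. "wikitext"
        if ev["action"] == "saveAttempt":
            attempts[editor] += 1
        elif ev["action"] == "saveFailure":
            failures[(editor, ev["failure_type"])] += 1
    # Skip editors with no recorded attempts to avoid dividing by zero.
    return {
        key: count / attempts[key[0]]
        for key, count in failures.items()
        if attempts[key[0]] > 0
    }
```

Selecting only the `(editor, "editConflict")` entries corresponds to de-selecting every reason except "conflict" in the graph.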
I'm not sure about this BufferingStatsdDataFactory thing; I'd love it if someone more familiar with it could explain.
On Wed, Feb 10, 2016 at 12:08 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
What you're looking at now is the percent of edits that ended in an edit conflict since last April.
So when it says the average edit conflict rate for VisualEditor on 2015-10-07 was "0.01", does that mean 1% or 0.01%? I'm guessing 1%, but just want to clarify since the labels are ambiguous.
The rate is the number of times there was an edit conflict divided by the number of times a save was attempted. So yes, the rate was 0.0104, meaning an edit conflict occurred in 1.04% of save attempts.
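The arithmetic behind the answer, as a small sketch: the rate is a fraction of save attempts, and multiplying by 100 gives the percentage. The counts 104 and 10,000 are made-up numbers chosen only to reproduce 0.0104. The two-decimal label is an assumption about how the dashboard displays the raw fraction, which would explain why 0.0104 appears as "0.01".

```python
def conflict_rate(conflicts: int, save_attempts: int) -> float:
    """Edit-conflict rate: conflicts divided by save attempts (a fraction in [0, 1])."""
    if save_attempts == 0:
        raise ValueError("no save attempts recorded")
    return conflicts / save_attempts


def as_percent(rate: float, digits: int = 2) -> str:
    """Express a fractional rate as a percentage string."""
    return f"{rate * 100:.{digits}f}%"


def two_decimal_label(rate: float) -> str:
    """The raw fraction rounded to two decimal places (assumed dashboard display)."""
    return f"{rate:.2f}"


rate = conflict_rate(104, 10_000)  # hypothetical counts
print(as_percent(rate), two_decimal_label(rate))
```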
The query for this, including the line you're interested in, is here:
https://github.com/wikimedia/analytics-limn-edit-data/blob/master/edit/failu...
Also, the raw data behind that specific graph is here:
http://datasets.wikimedia.org/limn-public-data/metrics/failure_rates_by_type...
http://datasets.wikimedia.org/limn-public-data/metrics/failure_rates_by_type...
Dan,
When you say the huge amount of data is slowing the load, are you referring to the size of the Edit event logs? That wouldn't make sense, since I thought the data was all pre-computed and loaded from the TSVs every time the page loads.
Anyway, as a heads-up to everyone: new data generation for that dashboard has been disabled because those logs have gotten too big. We've already reduced the volume of new Edit events to 20% of its previous level by increasing the sampling, and the DBAs are going to randomly purge existing events (https://phabricator.wikimedia.org/T124676). Once that's done, we can turn the computations back on (https://phabricator.wikimedia.org/T126058).
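The thread doesn't say how the Edit sampling is implemented. As an illustration only, assuming hash-based 1-in-N sampling keyed on a per-session token (the token and the rates 1-in-10 and 1-in-50 are hypothetical), moving from 1-in-N to 1-in-5N keeps about 20% of the previous volume. Because 5N is a multiple of N, every session kept under the new rate was also kept under the old one.

```python
import hashlib


def in_sample(token: str, one_in: int) -> bool:
    """Deterministically keep roughly 1 of every `one_in` tokens (e.g. editing sessions)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer and keep it if divisible by one_in.
    return int.from_bytes(digest[:8], "big") % one_in == 0


sessions = [f"session-{i}" for i in range(20_000)]
old = [s for s in sessions if in_sample(s, 10)]  # hypothetical previous rate
new = [s for s in sessions if in_sample(s, 50)]  # five times sparser
print(len(new) / len(old))  # close to 0.2
```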
On Wed, Feb 10, 2016 at 12:06 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
Neil, yeah, it is pre-computed, but for the sessions sunburst it's still a HUGE amount of data. Take a look at the TSV. It could be transformed on the server into something much easier to read. But I remember you saying nobody really uses that graph anyway, so we could also just remove it. Let's chat offline.
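One way the server-side transformation could look, as a sketch: fold flat rows into a nested tree and prune rarely visited branches so the browser receives far less data. The TSV layout assumed here (a hyphen-separated step path, a tab, then a session count) and the step names in the example are hypothetical, not the dashboard's actual format.

```python
import csv
import io


def tsv_to_tree(tsv_text: str, sep: str = "-", min_size: int = 0) -> dict:
    """Fold rows of "<step1-step2-...>\\t<count>" into a nested sunburst tree."""
    root = {"name": "root", "size": 0, "children": {}}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        path, count = row[0], int(row[1])
        root["size"] += count
        node = root
        for step in path.split(sep):
            node = node["children"].setdefault(step, {"name": step, "size": 0, "children": {}})
            node["size"] += count  # sessions passing through this step
    return _finalize(root, min_size)


def _finalize(node: dict, min_size: int) -> dict:
    """Convert child dicts to lists, dropping children smaller than min_size."""
    return {
        "name": node["name"],
        "size": node["size"],
        "children": [
            _finalize(child, min_size)
            for child in node["children"].values()
            if child["size"] >= min_size
        ],
    }
```

The result can be serialized with `json.dumps` and served in place of the raw TSV; raising `min_size` trades detail for a smaller payload.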
On Thu, Feb 11, 2016 at 2:17 PM, Neil P. Quinn nquinn@wikimedia.org wrote:
-- Neil P. Quinn https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF, product analyst Wikimedia Foundation