Hi all,
tl;dr: we'd like to remove the rev_is_revert field from the mediawiki.revision-create stream to solve a missing event problem.
For years now, we've known that the mediawiki.revision-create stream https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_revision_create has been missing many real revision create events https://phabricator.wikimedia.org/T215001 compared with MediaWiki's MySQL databases. This makes the stream almost useless for anyone who wants to use it as a notification mechanism for all MediaWiki page changes.
The large number of missing events occurs because the code that emits the event subscribes to the wrong MediaWiki hook. This patch https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/679353/ will fix this; however, the correct hook does not give us the information we need to set the rev_is_revert and rev_revert_details fields. These fields are relatively new (they were only added in August 2020 https://github.com/wikimedia/schemas-event-primary/commit/53b6480cb1045316ce7bf16987e6169fa386450f#diff-70a054c62940bbabcef7a38e58eb4bf4d9001ed46dd6277473509e5775ec5d34R53-R94). We think that including the missing revisions is more important than capturing the revert information, which really only records whether or not a user used the MediaWiki UI to issue a revert.
We plan to move forward with this, but would like feedback before we do. If you have objections, or other ideas on how we could provide this data (maybe by including it in mediawiki/revision-tags-change https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/revision/tags-change/current.yaml and making that public?), let us know by replying to this email or on this ticket: https://phabricator.wikimedia.org/T215001
Thanks!
-Andrew Otto
SRE, Data Engineering, WMF
Hello again,
Due to a lack of feedback so far, we are going to assume that the revert information in the mediawiki.revision-create stream is not widely used. We will move forward with removing it without first blocking on exposing the same information elsewhere.
Do let us know if there are objections.
Thank you!
-Andrew Otto
SRE, Data Engineering, WMF
On Mon, Apr 19, 2021 at 9:37 AM Andrew Otto otto@wikimedia.org wrote: