The Wikimedia Data Engineering team is pleased to announce that a new event stream, mediawiki.page_change.v1, is now publicly available at stream.wikimedia.org (here https://stream.wikimedia.org/v2/ui/#/?streams=mediawiki.page_change.v1).
The new event stream models page changes using a consolidated changelog data model, whereas existing streams, like page-create and revision-create, model each type of page change as a separate stream. With the existing streams, you have to consume several of them to understand how a MediaWiki page changed. With the new stream, when a page is created, edited, or deleted, a single event captures the new state as well as the state prior to the change.
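For anyone who wants to try the stream from a script, here is a minimal sketch in Python using only the standard library. It assumes the stream is served as Server-Sent Events at /v2/stream/mediawiki.page_change.v1 (inferred from the UI link above, not confirmed in this announcement) and that events carry page_change_kind and page.page_title fields; please check the schema for the actual field names. The User-Agent string is a placeholder.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: the stream is exposed as SSE under /v2/stream/<stream name>.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/mediawiki.page_change.v1"


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from Server-Sent Events text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined with "\n".
            payload = "\n".join(data_parts)
            data_parts = []
            try:
                yield json.loads(payload)
            except json.JSONDecodeError:
                continue  # skip payloads that are not valid JSON
        # Comment lines (":...") and other fields (event:, id:) are ignored.


if __name__ == "__main__":
    request = urllib.request.Request(
        STREAM_URL,
        headers={
            "Accept": "text/event-stream",
            # Placeholder User-Agent; replace with your own contact details.
            "User-Agent": "page-change-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        text_lines = (chunk.decode("utf-8") for chunk in response)
        for event in iter_sse_events(text_lines):
            # Field names below are assumptions; see the schema definition.
            kind = event.get("page_change_kind")
            title = event.get("page", {}).get("page_title")
            print(kind, title)
```

The parsing is kept in its own function so it can be exercised on canned lines without opening a network connection.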
For more information on what fields you can expect to see in the events, see the schema definition here https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/page/change/latest.yaml .
Starting now, new streams will be suffixed with a major version and will not use hyphens in stream names. For more information, see here https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning .
Benefits:
- Only one stream to consume (instead of having to consume page-create, page-delete, and revision-create streams to have a full picture of page changes)
- Events are ordered for a given page_id (delete will not come before create)
- Latest event has current and prior state
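
Because events for a given page_id arrive in order, a consumer can fold them into a per-page view of current state. The sketch below illustrates that idea only; it assumes each event has a page object with a page_id, a page_change_kind whose value is "delete" for deletions, and an optional prior_state object. These names are assumptions to verify against the schema, and apply_page_change is a hypothetical helper, not part of any Wikimedia library.

```python
from typing import Dict, Optional


def apply_page_change(state: Dict[int, dict], event: dict) -> Optional[dict]:
    """Apply one page change event to `state` in place.

    Returns the prior state carried by the event, or None if absent.
    Field names ("page", "page_id", "page_change_kind", "prior_state")
    are assumptions; check them against the published schema.
    """
    page = event["page"]
    page_id = page["page_id"]
    if event.get("page_change_kind") == "delete":
        # A deleted page is dropped from the current-state view.
        state.pop(page_id, None)
    else:
        # Any other kind of change replaces the stored page state.
        state[page_id] = page
    return event.get("prior_state")
```

Since per-page ordering means a delete will not arrive before its create, the helper does not need to buffer or reorder events for a single page.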
The existing event streams, such as page-create, revision-create, and page-delete, will remain available for now. We encourage people to migrate to the new consolidated stream when they can, as we plan to mark the existing streams as deprecated within the next year. We will send another communication about this once the plan is decided.
If you have any questions or issues, please file a Phabricator ticket here https://phabricator.wikimedia.org/project/board/6628/
--
Luke Bowmaker
Data Engineering - Product Manager