Hello Analytics,
The Data Engineering team will start the deployment[1] of the changes that support the Temp Accounts initiative (https://www.mediawiki.org/wiki/Trust_and_Safety_Product/Temporary_Accounts) in the Data Lake (https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake) today, Wednesday, January 22nd, 2025. These changes do not activate the Temp Accounts feature on any wiki; they only enable support for Temp Accounts in the Hadoop Data Lake. Some MediaWiki-related Data Lake tables[2] might be temporarily unavailable over the next couple of days. By the end of this process, the MediaWikiHistory tables and other derivative tables will fully support the new Temp Accounts semantics and data.
As part of the deployment process we plan to re-run the jobs for the 2024-12 snapshot. This means the data model for that snapshot will be updated. The changes are mostly backwards compatible, except for:
- The mediawiki_user_history table's `anonymous` field will be renamed to `is_anonymous`.
- The geoeditors_edits_monthly table's `editors_are_anonymous` field will be renamed to `users_are_anonymous`.
- The MediaWikiHistory dumps will have some new fields inserted, and the order of the existing fields will change.
We haven't found any existing code (within the WMF) that could break due to these backwards-incompatible changes, but if you find any, please let us know.
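For anyone checking their own code against the changes listed above, here is a minimal Python sketch of two defensive patterns: resolving a renamed field by whichever name is present, and reading dump rows by field name instead of by position. The helper names (`resolve_field`, `dump_row_to_dict`) are hypothetical, the dump rows are assumed to be tab-separated, and the field list passed to `dump_row_to_dict` must come from the published schema for the dump version you are reading; none of this is official tooling.

```python
# Field renames announced for the re-run 2024-12 snapshot:
# (table, old field name) -> new field name.
RENAMED_FIELDS = {
    ("mediawiki_user_history", "anonymous"): "is_anonymous",
    ("geoeditors_edits_monthly", "editors_are_anonymous"): "users_are_anonymous",
}


def resolve_field(table, old_name, available_columns):
    """Return the field name to use, preferring the new name when it exists.

    `available_columns` is whatever column list your client reports for
    the table; how you obtain it is up to your tooling.
    """
    new_name = RENAMED_FIELDS.get((table, old_name), old_name)
    for candidate in (new_name, old_name):
        if candidate in available_columns:
            return candidate
    raise KeyError(f"{table}: neither {new_name!r} nor {old_name!r} is present")


def dump_row_to_dict(line, field_names):
    """Map one dump row to {field name: value}.

    Assumes tab-separated rows. `field_names` is supplied by the caller,
    so inserted or reordered fields only require updating that list.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}"
        )
    return dict(zip(field_names, values))
```

Code that indexes dump rows by position (for example `values[7]`) is the kind most likely to break silently once fields are inserted and reordered, which is why the second helper fails loudly on a field-count mismatch.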
[1] Deployment plan https://docs.google.com/document/d/1-GhyLepEL7rqJlY1a2RKQ_1YI2QYgVpFSmzpq9nXIag/edit?tab=t.0
[2] List of affected tables
- wmf.mediawiki_history
- wmf.mediawiki_user_history
- wmf.mediawiki_page_history
- wmf.mediawiki_history_reduced
- wmf.edit_hourly
- wmf.editors_daily
- wmf.unique_editors_by_country
- wmf.geoeditors_edits_monthly
- wmf.geoeditors_monthly
- wmf.geoeditors_public_monthly
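If you want a quick way to see which of your saved queries touch the tables listed above, something like the following Python sketch may help. The function name `affected_tables_in` is hypothetical, and the check is a plain text match on fully qualified table names, so it will miss queries that reference the tables without the `wmf.` prefix.

```python
import re

# Tables listed in footnote [2] of the announcement.
AFFECTED_TABLES = [
    "wmf.mediawiki_history",
    "wmf.mediawiki_user_history",
    "wmf.mediawiki_page_history",
    "wmf.mediawiki_history_reduced",
    "wmf.edit_hourly",
    "wmf.editors_daily",
    "wmf.unique_editors_by_country",
    "wmf.geoeditors_edits_monthly",
    "wmf.geoeditors_monthly",
    "wmf.geoeditors_public_monthly",
]


def affected_tables_in(sql_text):
    """Return the affected tables referenced in `sql_text`, in list order.

    Word boundaries keep `wmf.mediawiki_history` from matching inside
    `wmf.mediawiki_history_reduced`, because `_` counts as a word character.
    """
    found = []
    for table in AFFECTED_TABLES:
        pattern = r"\b" + re.escape(table) + r"\b"
        if re.search(pattern, sql_text, flags=re.IGNORECASE):
            found.append(table)
    return found
```

Running this over a directory of query files gives a short list of candidates to re-check once the tables are available again.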
The deployment is now finished!
If you observe any issues that you think might be related, please file a Phabricator ticket using this link https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projectPHIDs=data-engineering .
Thank you!
(In reply to the announcement above, sent by Marcel Ruiz Forns, mforns@wikimedia.org, on Wed, Jan 22, 2025 at 7:15 PM.)
-- Marcel Ruiz Forns (he/him), Senior Software Engineer