Thank you so much for this work. There are additional benefits as we all have a clearer understanding of how data moves around our various systems. My team (Data Engineering) is standing up a data catalog and we'll see about ingesting these schemas directly to centralize the information.
On Thu, Jul 14, 2022 at 17:00 Der Umherirrende umherirrender_de.wp@web.de wrote:
Hello,
read previous update news about the abstract schema on the mediawiki-l list under < https://lists.wikimedia.org/hyperkitty/list/mediawiki-l@lists.wikimedia.org/...
The migration of mediawiki-core from sql schema file to the abstract schema finished in May 2021, while the migration of WMF Deployed Extensions is still ongoing on the goal task T261912[1] where 38 extensions with own database tables are listed.
But I am happy to write: The migration of all WMF Deployed Extensions is now finished!
At the end there are 35 extensions affected and all now shipes a json file with the schema defintion and the sql file is generated from the json file. The most patch sets were merged this week and will be released with REL1_39.
To be able to provide the abstract schema file some extensions needs updates beforehand, like standardise of timestamp columns or remove of foreign keys. The new json files makes it easier to analyse the schemas now and find more optimisation with automation, like duplicate indexes or missing primary keys.
A benefit of the abstract schema is also to provide a postgres and sqlite schema for free for all these extensions. It also easier for schema changes in the future to keep the schema file in sync and hopefully gives a better support for third party users with postgres or sqlite.
[Disclaimer: Do not expect that the extension will work directly with postgres or sqlite. There are some RDBMS differences needs to handle in the code, like different timestamp formats on postgres, but feel free to patch the extension or fill bugs if you find issues, where the differences are not addressed in the code.]
There are still tasks to fix for better integration of the abstract schema, like testing in CI if the schema file and the generated sql files are in sync to avoid issues on partial updates[2]. This is only done for core at the moment, please take a deeper look, when reviewing such patches on extensions, thanks. Knowing that all extensions have one of the three supported rdbms schema makes it easier to enable voting CI jobs on merge for postgres and sqlite in the future, but that needs fixing of tests first. Help is welcome here.
Looking ahead: In the future the abstract schema of the extension could also be used to detect schema drifts on extension tables on wmf wikis, this happens at the moment only for core[3].
Hopefully some of the long standing requests to bundle popular extensions with the tarball can be addressed[4], because the extensions are now fulfill the requirement to support all rdbms. Testing is welcome for this as well!
It is possible that there are some areas needs updates, like documentation or guidelines, feel free to improve if you find something outdated.
I would like to thanks Sam Reed to start with the convert of extensions, as well as various other developer converting extensions in the last years/months! Great work I could benefit from while creating my own patch sets. Also thanks to James Forrester for the review of many of my patch sets.
Greetings, Der Umherirrende
[1] https://phabricator.wikimedia.org/T261912 [2] https://phabricator.wikimedia.org/T261057 [3] https://drift-tracker.toolforge.org/ [4] https://phabricator.wikimedia.org/T305072 _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/