Thank you so much for this work. There are additional
benefits as we all
have a clearer understanding of how data moves around our various systems.
My team (Data Engineering) is standing up a data catalog and we'll see
about ingesting these schemas directly to centralize the information.
On Thu, Jul 14, 2022 at 17:00 Der Umherirrende <umherirrender_de.wp(a)web.de>
wrote:
Hello,
read previous update news about the abstract schema on the mediawiki-l
list under <
https://lists.wikimedia.org/hyperkitty/list/mediawiki-l@lists.wikimedia.org…
The migration of mediawiki-core from sql schema file to the abstract
schema finished in May 2021, while the migration of WMF Deployed Extensions
is still ongoing on the goal task T261912[1] where 38 extensions with own
database tables are listed.
But I am happy to write:
The migration of all WMF Deployed Extensions is now finished!
At the end there are 35 extensions affected and all now shipes a json
file with the schema defintion and the sql file is generated from the json
file.
The most patch sets were merged this week and will be released with
REL1_39.
To be able to provide the abstract schema file some extensions needs
updates beforehand, like standardise of timestamp columns or remove of
foreign keys.
The new json files makes it easier to analyse the schemas now and find
more optimisation with automation, like duplicate indexes or missing
primary keys.
A benefit of the abstract schema is also to provide a postgres and sqlite
schema for free for all these extensions.
It also easier for schema changes in the future to keep the schema file
in sync and hopefully gives a better support for third party users with
postgres or sqlite.
[Disclaimer: Do not expect that the extension will work directly with
postgres or sqlite. There are some RDBMS differences needs to handle in the
code, like different timestamp formats on postgres, but feel free to patch
the extension or fill bugs if you find issues, where the differences are
not addressed in the code.]
There are still tasks to fix for better integration of the abstract
schema, like testing in CI if the schema file and the generated sql files
are in sync to avoid issues on partial updates[2]. This is only done for
core at the moment, please take a deeper look, when reviewing such patches
on extensions, thanks.
Knowing that all extensions have one of the three supported rdbms schema
makes it easier to enable voting CI jobs on merge for postgres and sqlite
in the future, but that needs fixing of tests first. Help is welcome here.
Looking ahead:
In the future the abstract schema of the extension could also be used to
detect schema drifts on extension tables on wmf wikis, this happens at the
moment only for core[3].
Hopefully some of the long standing requests to bundle popular extensions
with the tarball can be addressed[4], because the extensions are now
fulfill the requirement to support all rdbms. Testing is welcome for this
as well!
It is possible that there are some areas needs updates, like
documentation or guidelines, feel free to improve if you find something
outdated.
I would like to thanks Sam Reed to start with the convert of extensions,
as well as various other developer converting extensions in the last
years/months! Great work I could benefit from while creating my own patch
sets.
Also thanks to James Forrester for the review of many of my patch sets.
Greetings,
Der Umherirrende
[1]
https://phabricator.wikimedia.org/T261912
[2]
https://phabricator.wikimedia.org/T261057
[3]
https://drift-tracker.toolforge.org/
[4]
https://phabricator.wikimedia.org/T305072
_______________________________________________
Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
_______________________________________________
Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org