(Adding engineering-all@ for Wikimedia engineers' awareness.)

Good work on this Amir, James, Gergo, Reedy, DBAs, and all the others not mentioned.

Cleaning up tech debt to provide a stable base for future code hygiene is important, hard, and sometimes thankless work.


On Thu, May 27, 2021 at 4:33 PM Amir Sarabadani <ladsgroup@gmail.com> wrote:
If you missed the previous updates, there's the first and the second one. This is the last one I'm sending but in a good way.

We now finished migrating all 57 core tables to abstract schema. Now tables.sql is empty for MySQL. We will soon remove the tables.sql files and links to them.

We have also cleaned more than hundreds of old schema change files. That enabled us to actually look for unused sql files and drop tens of unused ones that have not been used since 2002, 2004, or 2005 (and much more) and they got lost in the sheer number of our sql patch files.

The next update is exciting for me. With abstraction in place, we can now have a proper tracking of drifts between schema in paper and our production (This is a follow up from a major incident in 2018). Now we have https://drift-tracker.toolforge.org/ that keeps track of these drifts. Our schema has been around for more than twenty years and we have hundreds of database hosts, making sure everything is using the right database schema (and stays correct) is impossible manually and we have been finding and fixing these drifts since 2018, see this comment onwards.). Of course more work in improving the tracker is welcome (here's the list)

If you want to enjoy the benefits of abstract schema [1] in extension(s) you or your team maintains, Please abstract the schema of your extension. There's a long list of WMF deployed extensions that are not using abstract schema and some even already have a patch that only needs reviewing. Once that's done, we can add that to drift tracking and have a more comprehensive list of potential issues. If you need help with the abstraction work, just ping me.

This also helped us resolve several long-standing tickets like T104459 (5.5 years old), T62962 (7 years old), and T42626 (soon reaching its ninth birthday) and will help us to address even more tech debt in future.

There is more to be done, improving the abstract schema *change* system, finding a home of schema documentation, improving the drift tracker and making it more automated, so much more. But the biggest chunk of work is now finally done.

I really would like to thank Ammarpad for great work on abstracting the tables and handling all sorts of edge cases, James Forrester and Tgr for their reviews which without them this wouldn't be possible and Sam Reed who wrote a script to speed up migration. This was a team work to its core.

[1] In more details, by abstracting you will have automated checks for dirfits of the schema of extension(s) you maintain and production. You will have Postgres support for free. Also, you can have automated documentation generation, ability to test the schema itself, and have better consistency of your data types (like one datatype for timestamps).

Until the next adventure.
Amir (he/him)

Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org

| Greg Grossmeier              GPG: B2FA 27B1 F7EB D327 6B8E |
| Dir. Engineering Productivity     A18D 1138 8E47 FAC8 1C7D |