Hey,
As a pet project of mine, I have been trying to find and fix discrepancies
between the database schema in the codebase (like tables.sql in mediawiki
core) and production. I wrote a script to do a basic analysis and got the
results but it was so big that I needed to cluster them so now we have this
reports:
https://phabricator.wikimedia.org/T104459#4314828
Some are hot fixes (like tmp_2 indexes) that probably need to be added to
the codebase (with proper name) or dropped from production which needs
investigation.
Some are unimplemented schema changes, like user_options column in user
table that was dropped in the code nine years ago but fell into the cracks
and never got fixed in production.
Some are changes that partially implemented like page_no_title_convert
field on page table that only appear in 32 hosts (around two third of the
hosts) and there is mention of adding it in HISTORY but I can't find any
mention of removing it neither its existence so I have no idea how to
proceed here.
The problem is that these cases need investigation before we can hand it
off to our DBAs. For example, rc_cur_time field in recentchanges table has
been dropped from the mediawiki core in 2014 but it exists in production
but it also still exists in oracle, potgres and mssql schemas but not in
mssql, sqlite or mysql schemas. In this case, we need to fix both
production and several schemas in mediawiki core.
Or text table on 21 hosts (around half of them) has an extra field called
inverse_timestamp which I can't find any mention of it in the code but
there used to be a field with this name in revision table that got dropped
in 2005 and I have no idea how to proceed here.
It would be great if you take a look the clustered reports [1] and make a
phabricator ticket for them and investigate how to proceed here. Thank you
[1]:
https://phabricator.wikimedia.org/T104459#4310482
Best