Hey, As a pet project of mine, I have been trying to find and fix discrepancies between the database schema in the codebase (like tables.sql in mediawiki core) and production. I wrote a script to do a basic analysis and got the results but it was so big that I needed to cluster them so now we have this reports: https://phabricator.wikimedia.org/T104459#4314828
Some are hot fixes (like tmp_2 indexes) that probably need to be added to the codebase (with proper name) or dropped from production which needs investigation.
Some are unimplemented schema changes, like user_options column in user table that was dropped in the code nine years ago but fell into the cracks and never got fixed in production.
Some are changes that partially implemented like page_no_title_convert field on page table that only appear in 32 hosts (around two third of the hosts) and there is mention of adding it in HISTORY but I can't find any mention of removing it neither its existence so I have no idea how to proceed here.
The problem is that these cases need investigation before we can hand it off to our DBAs. For example, rc_cur_time field in recentchanges table has been dropped from the mediawiki core in 2014 but it exists in production but it also still exists in oracle, potgres and mssql schemas but not in mssql, sqlite or mysql schemas. In this case, we need to fix both production and several schemas in mediawiki core.
Or text table on 21 hosts (around half of them) has an extra field called inverse_timestamp which I can't find any mention of it in the code but there used to be a field with this name in revision table that got dropped in 2005 and I have no idea how to proceed here.
It would be great if you take a look the clustered reports [1] and make a phabricator ticket for them and investigate how to proceed here. Thank you
[1]: https://phabricator.wikimedia.org/T104459#4310482
Best
On Mon, Jul 9, 2018 at 1:10 PM, Amir Ladsgroup ladsgroup@gmail.com wrote:
Some are changes that partially implemented like page_no_title_convert field on page table that only appear in 32 hosts (around two third of the hosts) and there is mention of adding it in HISTORY but I can't find any mention of removing it neither its existence so I have no idea how to proceed here.
Looks like it was added in r16524 http://mediawiki.org/wiki/Special:Code/MediaWiki/16524 and reverted (except for the release note entry) in r16526 http://mediawiki.org/wiki/Special:Code/MediaWiki/16526.
Or text table on 21 hosts (around half of them) has an extra field called inverse_timestamp which I can't find any mention of it in the code but there used to be a field with this name in revision table that got dropped in 2005 and I have no idea how to proceed here.
Revisions used to be stored much like how images still are: there was the "cur" table that had data about the current revision, including its actual content, and the "old" table for previous revisions.
When all this was redone to have "page" and "revision" and "text" (r6710 http://mediawiki.org/wiki/Special:Code/MediaWiki/6710), the "old" table was just renamed to "text" since most of the text was already in that table. Which is why all the fields in text use "old_" as a prefix. It seems that the extraneous columns weren't dropped on all wikis.
wikitech-l@lists.wikimedia.org