While I understand the motivation for the change, I think the label reversal used here is more complex than needed. It would have been simpler (and thus likely more efficient and less bug-prone) to reverse the whole hostname, storing moc.elpmaxe.www instead of com.example.www.
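
To make the comparison concrete, here is a minimal Python sketch of the two schemes. The function names are hypothetical (they are not MediaWiki's), and the sketch only handles a bare hostname, ignoring scheme, port, and internationalized domains.

```python
def reverse_labels(host: str) -> str:
    """Reverse the order of the dot-separated labels.

    www.example.com -> com.example.www
    """
    return ".".join(reversed(host.split(".")))


def reverse_chars(host: str) -> str:
    """Reverse the hostname character by character.

    www.example.com -> moc.elpmaxe.www
    """
    return host[::-1]


host = "www.example.com"
print(reverse_labels(host))  # com.example.www
print(reverse_chars(host))   # moc.elpmaxe.www
```

Both functions are their own inverse, and both turn "all subdomains of example.com" into a prefix match (com.example. versus moc.elpmaxe.); the second just needs no splitting and joining.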

On Mon, 11 Dec 2023 at 15:34, Amir Sarabadani <asarabadani@wikimedia.org> wrote:
Hi Tilman,
Sorry for the late reply.

Regarding finding the actual link from the row: the recommended way is to do the processing in code afterwards. That's what MediaWiki does (in https://gerrit.wikimedia.org/g/mediawiki/core/+/80790ffc21a49fbe7709eaf5ce634b645798cf47/includes/ExternalLinks/LinkFilter.php#264), and you can easily replicate the logic of LinkFilter::reverseIndexes() in your programming language of choice. Doing all of the data processing in SQL is not recommended.
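
As a rough illustration, such post-processing could look like the Python sketch below. It is not a port of LinkFilter::reverseIndexes(). It assumes the stored domain index has the form scheme://reversed.labels. (with a trailing dot, optionally followed by :port) and that the path is stored in a separate column. The function names are hypothetical, and the sketch does not handle IPv6 literals or schemes without "://" such as mailto:. Please check the linked PHP source for the exact rules.

```python
def reverse_domain_index(domain_index: str) -> str:
    """Turn e.g. 'https://org.wikimedia.www.' back into 'https://www.wikimedia.org'.

    Assumed input format: scheme://reversed.labels.[:port]
    """
    scheme, sep, rest = domain_index.partition("://")
    if not sep:
        # Not a scheme://host style index; return it unchanged.
        return domain_index
    host, colon, port = rest.partition(":")
    # Drop the empty label produced by the trailing dot, then restore label order.
    labels = [label for label in host.split(".") if label]
    return f"{scheme}://{'.'.join(reversed(labels))}{colon}{port}"


def rebuild_url(domain_index: str, path: str) -> str:
    """Join the restored scheme and host with the separately stored path."""
    return reverse_domain_index(domain_index) + (path or "")


print(rebuild_url("https://org.wikimedia.www.", "/wiki/Main_Page"))
```

The SQL query then only needs to select the two raw columns, and the reversal happens once per row on the client side.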

Indeed, these changes are really necessary. For example, with the current growth of Wikimedia Commons, we will have to resort to more drastic actions if its database growth doesn't slow down (see https://phabricator.wikimedia.org/T343131 and https://phabricator.wikimedia.org/F37157040). Note that database growth is not always about the wiki's growth; often it's just heavy use of some MediaWiki features (here, templates and external links).

We will keep in mind to update the documentation for further work (and thank you for the feedback!). The next one will be pagelinks.

Best
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/