Hi Tilman,
Sorry for the late reply.
Regarding finding the actual link from the row: the recommended way is to do the processing in code afterwards. That's what MediaWiki does (in https://gerrit.wikimedia.org/g/mediawiki/core/+/80790ffc21a49fbe7709eaf5ce634b645798cf47/includes/ExternalLinks/LinkFilter.php#264), and you can easily replicate the logic of LinkFilter::reverseIndexes() in your programming language of choice. Doing all of the data processing in SQL is not recommended.
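As a rough illustration of that post-processing, here is a minimal Python sketch. It is not a port of LinkFilter::reverseIndexes(); it assumes the row carries a domain index of the form "https://org.wikimedia.www." (host labels reversed, with a trailing dot) plus a separate path value, which I refer to below as el_to_domain_index and el_to_path. The function names are made up for the example, and IP addresses, IPv6 hosts and other special cases are not handled, so please check the linked PHP code for the exact behaviour.

```python
def reverse_domain_index(domain_index: str) -> str:
    """Turn a reversed-host index such as 'https://org.wikimedia.www.'
    back into 'https://www.wikimedia.org'.

    Assumption: the index has the shape scheme://reversed.host.[:port].
    Anything without '://' (e.g. mailto: links) is returned unchanged.
    """
    scheme, sep, rest = domain_index.partition("://")
    if not sep:
        return domain_index
    # Split off an optional ':port' suffix (IPv6 hosts are not handled).
    host, colon, port = rest.partition(":")
    # Drop the empty label produced by the trailing dot, then reverse.
    labels = [label for label in host.split(".") if label]
    return scheme + sep + ".".join(reversed(labels)) + colon + port


def rebuild_url(domain_index: str, path: str) -> str:
    """Join the restored scheme and host with the stored path
    (assumed here to come from el_to_path)."""
    return reverse_domain_index(domain_index) + path


if __name__ == "__main__":
    print(rebuild_url("https://org.wikimedia.www.", "/wiki/Main_Page"))
```

You would run something like this over the rows your query returns, rather than trying to rebuild the URL inside the SQL itself.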
Indeed, these changes are really necessary. For example, with the current growth of Wikimedia Commons, we will have to resort to more drastic actions if its database growth doesn't slow down (see https://phabricator.wikimedia.org/T343131 and https://phabricator.wikimedia.org/F37157040). Note that database growth is not always about the wiki's growth; often it's just heavy use of some MediaWiki features (in this case, templates and external links).
We will keep in mind to update the documentation for further work (and thank you for the feedback!). The next table to be changed will be pagelinks.
Best
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/