On Sun, Apr 12, 2020 at 7:48 AM Huji Lee <huji.huji@gmail.com> wrote:
One possible solution is to create a script which is scheduled to run once a month; the script would download the latest dump of the wiki database,[3] load it into MySQL/MariaDB, create some additional indexes that would make our desired queries run faster, and generate the reports using this database. A separate script can then purge the data a few days later.
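For reference, the monthly job described in the quoted proposal could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing tool: the dump URL layout, the wiki name "examplewiki", the index name, and the three-day retention window are all placeholders, and the sketch only builds an ordered plan of steps without downloading or loading anything.

```python
from datetime import date, timedelta

# Assumption: public SQL dumps follow this URL layout; verify before relying on it.
DUMP_URL = "https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-{table}.sql.gz"


def dump_urls(wiki, tables):
    """Return the dump URL for each table the reports need."""
    return [DUMP_URL.format(wiki=wiki, table=t) for t in tables]


def index_statements(indexes):
    """Build CREATE INDEX statements from (name, table, columns) tuples."""
    return [
        "CREATE INDEX {} ON {} ({});".format(name, table, ", ".join(cols))
        for name, table, cols in indexes
    ]


def purge_date(loaded_on, keep_days=3):
    """Date on which the separate purge script may drop the scratch database.

    keep_days=3 is a hypothetical stand-in for "a few days later".
    """
    return loaded_on + timedelta(days=keep_days)


def monthly_plan(wiki, tables, indexes, loaded_on):
    """Ordered list of steps for one monthly run; nothing is executed here."""
    steps = [("download", url) for url in dump_urls(wiki, tables)]
    steps += [("load", t) for t in tables]
    steps += [("index", sql) for sql in index_statements(indexes)]
    steps.append(("report", wiki))
    steps.append(("purge_after", purge_date(loaded_on).isoformat()))
    return steps


if __name__ == "__main__":
    # Hypothetical wiki and index, used only to show the shape of the plan.
    plan = monthly_plan(
        "examplewiki",
        ["page"],
        [("idx_page_len", "page", ["page_len"])],
        date(2020, 4, 12),
    )
    for step in plan:
        print(step)
```

Keeping the plan separate from its execution makes it easy to review how much data would be moved and stored before anything is actually run.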
If I understand your proposal correctly, the main difference from the current Wiki Replicas would be the "create some additional indexes that would make our desired queries run faster" step. The Wiki Replicas already have some indexes and views that are specifically designed to make common queries faster. If possible, adding to these would be nicer than building a one-off process that moves lots of data around for your tool.
I say this not because what you are proposing is a ridiculous solution, but because it is a unique solution for your current problem that will not help others who are having similar problems. Having 1 tool use ToolsDB or a custom Cloud VPS project like this is possible, but having 100 tools try to follow that pattern themselves is not.
Out of an abundance of caution, I thought I should ask for permission now, rather than forgiveness later. Do we have a process for getting approval for projects that require gigabytes of storage and hours of computation, or is what I proposed not even remotely considered a "large" project, meaning I am being overly cautious?
https://phabricator.wikimedia.org/project/view/2875/
Bryan