On Sun, Apr 12, 2020 at 7:48 AM Huji Lee <huji.huji@gmail.com> wrote:
>
> One possible solution is to create a script which is scheduled to run once a month; the script would download the latest dump of the wiki database,[3] load it into MySQL/MariaDB, create some additional indexes that would make our desired queries run faster, and generate the reports using this database. A separate script can then purge the data a few days later.
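To restate the proposal concretely, the monthly job would look roughly
like the sketch below; the dump URL, paths, database name, index, and
report query are all placeholders, not anything that exists today.

#!/usr/bin/env python3
"""Rough sketch of the proposed monthly job; every name below is a
placeholder, not an existing dump, path, or database."""
import subprocess
from pathlib import Path

# Hypothetical dump; real dumps live under https://dumps.wikimedia.org/
DUMP_URL = ("https://dumps.wikimedia.org/fawiki/latest/"
            "fawiki-latest-page.sql.gz")
WORKDIR = Path("/data/scratch/report-job")  # placeholder scratch space
DB_NAME = "report_staging"                  # placeholder local database


def run(cmd):
    """Run a shell command, raising if it fails."""
    subprocess.run(cmd, shell=True, check=True)


def main():
    WORKDIR.mkdir(parents=True, exist_ok=True)
    dump = WORKDIR / "page.sql.gz"

    # 1. Download the latest dump.
    run(f"wget -q -O {dump} {DUMP_URL}")

    # 2. Load it into a local MariaDB (credentials from ~/.my.cnf).
    run(f"mysql -e 'CREATE DATABASE IF NOT EXISTS {DB_NAME}'")
    run(f"zcat {dump} | mysql {DB_NAME}")

    # 3. Add the extra index the report queries need.
    run(f"mysql {DB_NAME} -e "
        "'CREATE INDEX page_len_idx ON page (page_len)'")

    # 4. Generate the report (query is illustrative only).
    query = ("SELECT page_title, page_len FROM page "
             "WHERE page_namespace = 0 "
             "ORDER BY page_len DESC LIMIT 100")
    report = WORKDIR / "report.tsv"
    run(f'mysql {DB_NAME} --batch -e "{query}" > {report}')

    # 5. A separate job would DROP DATABASE a few days later.


if __name__ == "__main__":
    main()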
If I am understanding your proposal here, I think the main difference
from the current Wiki Replicas would be "create some additional
indexes that would make our desired queries run faster". We do have
some indexes and views in the Wiki Replicas which are specifically
designed to make common things faster today. If possible, adding to
these rather than building a one-off process of moving lots of data
around for your tool would be nice.
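For example, a report like the one sketched above can usually be
generated straight from the replicas with an ordinary query, and when
such a query is too slow the thing to ask for is an extra index or
view on our side rather than a private copy of the data. A minimal
sketch, assuming a Toolforge account with the standard
~/replica.my.cnf credentials (the hostname and query are only
illustrative):

#!/usr/bin/env python3
"""Sketch: run the same report straight against the Wiki Replicas.

Assumes a Toolforge account with the usual ~/replica.my.cnf
credentials; check wikitech for the current replica hostnames.
"""
import os

import pymysql

conn = pymysql.connect(
    # Analytics replica service name as of this writing (verify).
    host="fawiki.analytics.db.svc.eqiad.wmflabs",
    database="fawiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)

with conn.cursor() as cur:
    # Illustrative report query; page_title comes back as bytes.
    cur.execute(
        "SELECT page_title, page_len FROM page "
        "WHERE page_namespace = 0 "
        "ORDER BY page_len DESC LIMIT 100"
    )
    for title, length in cur.fetchall():
        print(title.decode("utf-8"), length, sep="\t")

conn.close()

If a query like that does end up needing a new index or view, filing a
request for it helps every tool with the same access pattern, not just
yours.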
I say this not because what you are proposing is a ridiculous
solution, but because it is a one-off solution to your current problem
that will not help others facing similar problems. Having one tool use
ToolsDB or a custom Cloud VPS project like this is possible, but
having 100 tools each try to follow that pattern themselves is not.
> Out of an abundance of caution, I thought I should ask for permission now, rather than forgiveness later. Do we have a process for getting approval for projects that require gigabytes of storage and hours of computation, or is what I proposed not even remotely considered a "large" project, meaning I am being overly cautious?
<https://phabricator.wikimedia.org/project/view/2875/>
Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud