With several weeks of delay, I finally had a chance to submit three Phab tasks around this topic. I have at least 10 more queries that I could add to the list, but I don't want to overwhelm people, so I will wait for those three to be resolved first.
Thanks!
Huji

On Tue, Apr 14, 2020 at 8:53 AM Huji Lee <huji.huji@gmail.com> wrote:
I completely appreciate the points you are making, Bryan and Jaime. And I would very much enjoy "dealing with you" if we end up going to "Cloud VPS project" route! If anything, I keep learning new things from you all.

Let's start where you suggested. I will create Phab tickets on which I will seek advice about how to optimize those super-slow queries (if at all possible).

Thank you for your attention!

On Tue, Apr 14, 2020 at 8:49 AM Huji Lee <huji.huji@gmail.com> wrote:
Yes. If you go to the source of all those pages, there is a hidden HTML element (<!-- --> kind) that has the SQL code for that report.

Here is one example: [1]


On Mon, Apr 13, 2020 at 7:19 PM MusikAnimal <musikanimal@gmail.com> wrote:
Is the source code public? Maybe the queries could be improved. I ran into many such issues too after the actor migration, but after taking advantage of specialized views[0] and join decomposition (get just the actor IDs, i.e. rev_actor, then the actor_names in a separate query), my tools are seemingly as fast as they were before.

~ MA


On Mon, Apr 13, 2020 at 5:03 PM Huji Lee <huji.huji@gmail.com> wrote:
I understand. However, I think that the use case we are looking at is relatively unique. I also think that indexes we need may not be desirable for all the Wiki Replicas (they would often be multi-column indexes geared towards a specific set of queries) and I honestly don't want to go through the several weeks (months?) of discussion to justify them.

Note that if we open the can of "more indexes on Wiki Replicas" worms, this would all of a sudden become an all-wiki discussion. I'm not sure if there are more than a handful wikis that do this level of page-level and user-level analytics as fawiki does, which means for most wikis (and for most Wiki Replica databases) those additional indexes may not even be justified.

Even if we were to generalize parts of this approach and bring it to Wiki Replicas, I would still argue that doing it at a smaller extent (one wiki DB for now) would be a reasonable starting point, no?

On Mon, Apr 13, 2020 at 4:42 PM Bryan Davis <bd808@wikimedia.org> wrote:
On Sun, Apr 12, 2020 at 7:48 AM Huji Lee <huji.huji@gmail.com> wrote:
>
> One possible solution is to create a script which is scheduled to run once a month; the script would download the latest dump of the wiki database,[3] load it into MySQL/MariaDB, create some additional indexes that would make our desired queries run faster, and generate the reports using this database. A separate script can then purge the data a few days later.

If I am understanding your proposal here, I think the main difference
from the current Wiki Replicas would be "create some additional
indexes that would make our desired queries run faster". We do have
some indexes and views in the Wiki Replicas which are specifically
designed to make common things faster today. If possible, adding to
these rather than building a one-off process of moving lots of data
round for your tool would be nice.

I say this not because what you are proposing is a ridiculous
solution, but because it is a unique solution for your current problem
that will not help others who are having similar problems. Having 1
tool use ToolsDB or a custom Cloud VPS project like this is possible,
but having 100 tools try to follow that pattern themselves is not.

> Out of abundance of caution, I thought I should ask for permission now, rather than forgiveness later. Do we have a process for getting approval for projects that require gigabytes of storage and hours of computation, or is what I proposed not even remotely considered a "large" project, meaning I am being overly cautious?

<https://phabricator.wikimedia.org/project/view/2875/>

Bryan
--
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud