Here is my understanding of the connections situation:

Currently the limit is 10 concurrent connections available to an account on an instance. With the move to a multi-instance architecture, you will have 10 concurrent connections per account per instance (in this case, 1 instance = 1 section).

So that effectively means you have 10 on enwiki (s1), 10 on wikidata (s8), 10 on commons (& testcommonswiki in s4), 10 on smaller wikis (s3 or s5), etc. This likely means more parallelism for some activities and fewer connection issues for many applications if you happen to be querying DBs on separate instances.
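To make that concrete, something like this (just a sketch; the hostnames and credentials are placeholders, not the real service names):

    import pymysql

    # Each section is a separate instance under the new architecture, so each of
    # these connections counts against a separate 10-connection budget.
    # "s1.replicas.example" / "s8.replicas.example" are placeholder hostnames.
    enwiki = pymysql.connect(
        host="s1.replicas.example", user="u1234",
        password="...", database="enwiki_p")
    wikidata = pymysql.connect(
        host="s8.replicas.example", user="u1234",
        password="...", database="wikidatawiki_p")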

In the case of GUC and XTools Global Contribs, since they hit every DB (and hence every instance), I think the effective limit will still be 10, so that is something to keep in mind. If you hit issues with the number of connections in the future, we can think of ways to avoid them.
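For example, one way to stay under the cap while still fanning out to every wiki would be to throttle per section (purely a sketch; run_query is an async stand-in for whatever connection code the tool already has):

    import asyncio

    MAX_PER_SECTION = 10  # the per-account, per-instance connection limit

    async def query_everywhere(wikis_by_section, sql, run_query):
        # wikis_by_section: {"s1": ["enwiki", ...], ..., "s8": [...]}
        # run_query(section, dbname, sql) is hypothetical: it should open a
        # connection to that section, run the query on that wiki, and return rows.
        semaphores = {s: asyncio.Semaphore(MAX_PER_SECTION) for s in wikis_by_section}

        async def one(section, dbname):
            async with semaphores[section]:  # never more than 10 open per section
                return await run_query(section, dbname, sql)

        tasks = [one(s, db) for s, dbs in wikis_by_section.items() for db in dbs]
        return await asyncio.gather(*tasks)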

On Thu, Nov 19, 2020 at 7:00 AM MusikAnimal <musikanimal@gmail.com> wrote:
Unrelated but important question (sorry to fragment this thread): What about the max number of connections imposed on a db user? That applies to each open connection, right? For example, we still occasionally hit it in XTools, meaning there are 30 open connections and no new connections can be opened. If this is true, I will surely need a major increase in the number of allowed connections for my db user. Let's assume I'm able to do with just 6 to access all dbs. That means only 5 people need to be running Global Contribs queries before the next user gets an error (give or take, depending on how fast the queries are and which connections are tied up, also taking into account use by other XTools features such as the Edit Counter). Surely you catch my drift. I suppose I'll find out when I get there; just sharing this thought ahead of time in case it hasn't been considered yet.

~ MA

On Thu, Nov 19, 2020 at 12:44 AM MusikAnimal <musikanimal@gmail.com> wrote:
Hello Joaquin!

Hey MA, I've checked, and while not explicitly disallowed, the fact that this could work is more of an implementation detail that shouldn't really be relied on.

The sections, and how the instances are laid out across them, are organized to keep the service running, and shouldn't be depended on since they could change.

Even if the mappings are public and fairly stable, there could be a point where a change in the implementation/organization is made (like with this new architecture) and those in-section cross-db joins would stop working.

I'm not saying I will blindly construct cross-wiki queries. Rather, I will only do it after fetching the db-lists to confirm which wikis can be queried cross-wiki. In the case of GUC and XTools Global Contribs, this could mean massive performance improvements. Allow me to paint a picture: we have an account attached to 100 wikis, and I want a list of all global edits ordered chronologically. For day-to-day steward life this is essential, so I'd like to find the most efficient route possible, even if it's a little hacky :) So, going off of what we're doing now, my high-level vision would be (rough code sketch after the list):

1) Check the db-lists (or use a cached result).
2) Check CentralAuth to see which wikis the user has edits on. Here we find there are 100 wikis.
3) Cross-referencing the db-lists, I now know that 75 of the wikis I want to query are on s1, and 25 on s2.
4) For each wiki, I have a subquery to grab all edits by that user on that specific wiki within that slice (maybe also adding WHERE clauses for rev_timestamp, etc.).
5) Take those subqueries and wrap them like: SELECT * FROM ( ( [subquery1] ) UNION ( [subquery2] ) … ) a ORDER BY rev_timestamp DESC LIMIT 50
6) Do the same for each of the other slices.
7) Combine the results from each slice and resequence the edits chronologically, stopping at 50 (the first page of edits to show to the user).
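Roughly, in code (just a sketch of the idea, not what I'd ship: run_query and wikis_by_slice stand in for plumbing that doesn't exist yet, and the revision_userindex/actor column names are my reading of the replica views, so they'd need double-checking):

    import heapq

    PAGE_SIZE = 50

    def slice_query(dbnames, limit=PAGE_SIZE):
        # One subquery per wiki on this slice. Each wiki's views live in its own
        # <dbname>_p schema, so one connection to the slice can reach all of them.
        subqueries = [
            f"(SELECT '{db}' AS wiki, rev_id, rev_timestamp"
            f" FROM {db}_p.revision_userindex"
            f" JOIN {db}_p.actor ON actor_id = rev_actor"
            f" WHERE actor_name = %(name)s)"
            for db in dbnames
        ]
        # UNION ALL: rows from different wikis can't be duplicates, so skip the dedup.
        return (
            "SELECT * FROM (" + " UNION ALL ".join(subqueries) + ") a"
            f" ORDER BY rev_timestamp DESC LIMIT {limit}"
        )

    def global_contribs(wikis_by_slice, username, run_query):
        # wikis_by_slice: {"s1": ["enwiki", ...], "s2": [...]} built from the db-lists.
        # run_query(slice, sql, params) -> list of dict rows; hypothetical stand-in
        # for whatever connection handling the tool already has.
        per_slice = [
            run_query(slc, slice_query(dbs), {"name": username})
            for slc, dbs in wikis_by_slice.items()
        ]
        # Each per-slice list is already sorted newest-first, so merge and stop at 50.
        merged = heapq.merge(*per_slice, key=lambda row: row["rev_timestamp"], reverse=True)
        return list(merged)[:PAGE_SIZE]

The per-slice queries could also run in parallel, which is where the per-section connection limits come back into play.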

That doesn't sound like the most fun, but I think it would work. With the current 8 slices, it shouldn't slow things down too terribly (some slices will be faster than others).

Are you discouraging this approach? If I *have* to open and use a separate connection to each of those 100 databases, regardless of the slice, the processing may become much slower. Let's move on to IPs, where we have to check *every* wiki: 900+ separate connections. Again, I'm not sure how I'd even get this set up on my local machine, as presumably I'd need 900+ open SSH tunnels. Maybe a bash script (or something like the sketch below)?
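Something like this is what I have in mind if the per-slice grouping holds up: one tunnel per slice instead of one per wiki (Python rather than bash just to sketch it; the hostnames are placeholders):

    import subprocess

    # Placeholder hostnames: one entry per slice, not per wiki. If the per-slice
    # grouping works, ~8 tunnels should cover everything locally.
    SLICES = {f"s{i}": f"s{i}.replicas.example" for i in range(1, 9)}
    BASTION = "login.example.org"  # placeholder for the actual bastion host

    tunnels = []
    for offset, (name, host) in enumerate(sorted(SLICES.items())):
        local_port = 3311 + offset
        # ssh -N: no remote command; -L: forward local_port to host:3306 via the bastion
        tunnels.append(subprocess.Popen(
            ["ssh", "-N", "-L", f"{local_port}:{host}:3306", BASTION]
        ))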

I just want to make sure I've got this right before I start coding. In the end, hopefully I'll have a working strategy that I can share with others.

Thanks,

~ MA
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud


--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation