Hi MA,
You could still accomplish the local environment you are describing by using 8 SSH
tunnels. All of the database-name DNS aliases ultimately resolve to the section names (s1,
s2, s3, s4, and so on, in the form of s1.analytics.db.svc.eqiad.wmflabs, etc.). An app could be
written to connect to the correct section instead of the database if you are doing that
kind of thing, but you’ll either need to make requests to
https://noc.wikimedia.org/conf/dblists/s<correct-number>.dblist
(for example https://noc.wikimedia.org/conf/dblists/s4.dblist) and map things out from there,
or perhaps check DNS for the database name and look up the “s#” record from there (which is
currently possible in Lua, and I can provide an example of how I did it in that
language).
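The dblist approach described above could be sketched roughly as follows. This is a hypothetical Python sketch, not an official client: the section list and the dblist parsing (one database name per line, `#` starting a comment) are assumptions about the published dblist format.

```python
from urllib.request import urlopen

# Assumed list of Wiki Replica sections; adjust to match the actual deployment.
SECTIONS = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]


def parse_dblist(text):
    """Parse a dblist file: one database name per line, '#' starts a comment."""
    dbs = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            dbs.add(line)
    return dbs


def build_section_map(fetch=lambda s: urlopen(
        "https://noc.wikimedia.org/conf/dblists/%s.dblist" % s).read().decode()):
    """Map each database name to its section by fetching every s<N>.dblist.

    `fetch` is injectable so the mapping logic can be exercised without
    network access.
    """
    mapping = {}
    for section in SECTIONS:
        for db in parse_dblist(fetch(section)):
            mapping[db] = section
    return mapping
```

A tool would build this map once (or cache it) and then pick the `sN.analytics.db.svc.eqiad.wmflabs` host from the section the database name maps to.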
A MediaWiki config checkout would also work, besides what can be gleaned from
noc.wikimedia.org.
We can try to document some examples of how you might do it either way. I’m sure it is
non-trivial, but 8 tunnels is more workable than 900, at least.
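As a rough illustration of the "8 tunnels" setup, a tool could keep one local forwarded port per section in a small table. The port numbers and the bastion host below are hypothetical; the actual `ssh -L` commands are up to the developer.

```python
# Hypothetical local port assignments: assume each tunnel was opened with
# something like
#   ssh -N -L 3311:s1.analytics.db.svc.eqiad.wmflabs:3306 <bastion>
# so that section sN listens on local port 3310 + N.
SECTION_PORTS = {"s%d" % n: 3310 + n for n in range(1, 9)}


def local_endpoint(section):
    """Return the (host, port) a local client should connect to for a section."""
    return ("127.0.0.1", SECTION_PORTS[section])
```

Combined with a database-name-to-section map, every one of the 900+ wikis is reachable through one of these eight local endpoints.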
Routing by reading the queries on the fly is quite tricky. The closest I’ve seen
ready-made tools come to that is ProxySQL, and that focuses on sharding, which is not
exactly the same thing.
Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm
On Nov 10, 2020, at 2:13 PM, MusikAnimal <musikanimal@gmail.com> wrote:
Hi! Most tools query just a single db at a time, so I don't think this will be a
massive problem. However some such as Global Contribs[0] and GUC[1] can theoretically
query all of them from a single request. Creating new connections on-the-fly seems doable
in production, the issue is how to work on these tools in a local environment. Currently
the recommendation is to use an SSH tunnel to the desired host,[2] such as
enwiki.analytics.db.svc.eqiad.wmflabs. Surely we can't do this same port forwarding
for 900+ connections.
Any ideas? Perhaps there's some way to make a host that automatically forwards to the
correct one, solely for developer use? Or will development of such global tools need to
happen in the Cloud Services environment?
~ MA
[0] https://xtools.wmflabs.org/globalcontribs
[1] https://guc.toolforge.org/
[2] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#SSH_tunneling_for_local_testing_which_makes_use_of_Wiki_Replica_databases
On Tue, Nov 10, 2020 at 3:26 PM Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote:
TLDR: Wiki Replicas' architecture is being redesigned for stability and performance.
Cross-database JOINs will not be available, and a host connection will only allow querying
its associated DB. See [1] for more details.
Hi!
In the interest of making and keeping Wiki Replicas a stable and performant service, a
new backend architecture is needed. There is some impact on features and usage
patterns.
What should I do? To avoid breaking changes, you can start making the following changes
*now*:
- Update existing tools to ensure queries are executed against the proper database
connection
  - E.g.: if you want to query the `eswiki_p` DB, you must connect to the
`eswiki.analytics.db.svc.eqiad.wmflabs` host and the `eswiki_p` DB, not to enwiki or other
hosts
- Check your existing tools' and services' queries for cross-database JOINs, and rewrite the
joins in application code
  - E.g.: if you are doing a join across databases, for example joining `enwiki_p` and
`eswiki_p`, you will need to query them separately and filter the results of the separate
queries in the code
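The JOIN rewrite in the second bullet can be sketched in application code. The helper below is a hypothetical Python example (the table, column, and query details are assumptions for illustration): fetch rows from each wiki through its own connection, then emulate the inner join in memory.

```python
def rows_in_both(enwiki_rows, eswiki_rows, key="user_name"):
    """Emulate an inner join on `key` across result sets from two databases.

    Each argument is a list of dict rows as returned by a typical DB-API
    cursor with a dict row factory.
    """
    eswiki_keys = {row[key] for row in eswiki_rows}
    return [row for row in enwiki_rows if row[key] in eswiki_keys]


# With one connection per section host, the two halves of the former
# cross-database JOIN become two independent queries (sketch only):
#   enwiki_rows = query(enwiki_conn, "SELECT user_name, ... FROM user WHERE ...")
#   eswiki_rows = query(eswiki_conn, "SELECT user_name FROM user WHERE ...")
#   shared = rows_in_both(enwiki_rows, eswiki_rows)
```

For large result sets you would filter each query as tightly as possible server-side first, since the intersection now happens in the tool's memory rather than in MySQL.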
Timeline:
- November - December: Early adopter testing
- January 2021: Existing and new systems online, transition period starts
- February 2021: Old hardware is decommissioned
We need your help
- If you would like to beta test the new architecture, please let us know and we will
reach out to you soon
- Share examples / descriptions of how a tool or service was updated, write a common
solution or some example code others can utilize and reference, or help others on IRC and
the mailing lists
If you have questions or need help adapting your code or queries, please contact us [2],
or write on the talk page [3].
We will be sending reminders and more specific examples of the changes via email and on
the wiki page. For more information, see [1].
[1]: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign
[2]: https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication
[3]: https://wikitech.wikimedia.org/wiki/Talk:News/Wiki_Replicas_2020_Redesign
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud