On a bit of a side note, for forwarding many ips/ports through ssh might be interesting a tool like sshuttle [1], I'v used it in the past with success.
It's a bit complex, it uses an ssh tunnel + iptables/pf/... rules to move the traffic through that tunnel, and a process running on the other side copied and started through that same ssh tunnel, so if you have a complex network setup it might break some things (ex. it does not work yet with systemd-resolved, default on fedora 33).
[1] https://github.com/sshuttle/sshuttle
Cheers,
On 11/10 17:44, MusikAnimal wrote:
Got it. The https://noc.wikimedia.org/conf/dblists/ lists are plenty fast and easy enough to parse. I'll just cache that. It would be neat if we could rely on the slice specified in meta_p in the future, as in my case we have to query meta_p.wiki regardless, but not a big deal :)
Thank you! I think I have enough information to move forward.
~ MA
On Tue, Nov 10, 2020 at 5:13 PM Brooke Storm bstorm@wikimedia.org wrote:
Yes, you might be able to use meta_p.wiki table. However, when wikis are moved between sections, nothing updates the meta_p.wiki table at this time. Requests to noc.wikimedia.org are accurate and up to date, as far as I know. We only update meta_p when we add the wiki (at least that’s how it is now). Also, the DNS gets synced and updated every time we run the script, so it is usually up-to-date. You could try meta_p.wiki and fall back to DNS or noc.wikimedia.org if that fails, perhaps? Meta_p is expected to be on s7 in the new design.
Brooke Storm Staff SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm
On Nov 10, 2020, at 3:01 PM, MusikAnimal musikanimal@gmail.com wrote:
Ah yes, 8 tunnels is more than manageable. The `slice` column in the meta_p.wiki table is the one we need to connect to for said wiki, right? So in theory, I always have SSH tunnels open for every slice, and the first thing I do is check meta_p.wiki for the given wiki, then I know which of those s1-s8 connections to use? So I really only need 8 connections (even in production). Maybe not what you would recommend for every tool, rather just the "global" ones facing this specific issue.
Can't you just tunnel to the login server and connect by hostname from
there?
Hmm I'm not sure I follow. Right now I SHH to login.toolforge.org, but with "-L 4711:enwiki.analytics.db.svc.eqiad.wmflabs:3306" for port forwarding from my local mysql to the remote. It sounds like instead I need to tunnel to s1-s8, and use the correct one based on the desired database.
~ MA
On Tue, Nov 10, 2020 at 4:32 PM Brooke Storm bstorm@wikimedia.org wrote:
Hi MA, You could still accomplish the local environment you are describing by using 8 ssh tunnels. All the database name DNS aliases go reference the section names, eventually (s1, s2, s3, s4 in the form of s1.analytics.db.svc.eqiad.wmflabs, etc.). An app could be written to connect to the correct section instead of the database if you are doing that kind of thing, but you’ll either need to make requests to https://noc.wikimedia.org/conf/dblists/s<correct-number>.dblist like https://noc.wikimedia.org/conf/dblists/s4.dblist and map things out from there or perhaps check DNS for the database name and look up the “s#” record from there (which is currently possible in Lua, and I can provide an example of how I did it in that language).
A mediawiki config checkout would also work besides what can be gleaned from noc.wikimedia.org.
We can try to document some examples of how you might do it either way. I’m sure it is non-trivial, but 8 tunnels is more workable than 900, at least.
Routing by reading the queries on the fly is quite tricky. The closest I’ve seen ready-made tools come to that is ProxySQL, and that focuses on sharding, which is not exactly the same thing.
Brooke Storm Staff SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm
On Nov 10, 2020, at 2:13 PM, MusikAnimal musikanimal@gmail.com wrote:
Hi! Most tools query just a single db at a time, so I don't think this will be a massive problem. However some such as Global Contribs[0] and GUC[1] can theoretically query all of them from a single request. Creating new connections on-the-fly seems doable in production, the issue is how to work on these tools in a local environment. Currently the recommendation is to use a SSH tunnel to the desired host,[2] such as enwiki.analytics.db.svc.eqiad.wmflabs. Surely we can't do this same port forwarding for 900+ connections.
Any ideas? Perhaps there's some way to make a host that automatically forwards to the correct one, solely for developer use? Or will development of such global tools need to happen in the Cloud Services environment?
~ MA
[0] https://xtools.wmflabs.org/globalcontribs [1] https://guc.toolforge.org/ [2] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#SSH_tunneling_fo...
On Tue, Nov 10, 2020 at 3:26 PM Joaquin Oltra Hernandez < jhernandez@wikimedia.org> wrote:
TLDR: Wiki Replicas' architecture is being redesigned for stability and performance. Cross database JOINs will not be available and a host connection will only allow querying its associated DB. See [1] https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign for more details.
Hi!
In the interest of making and keeping Wiki Replicas a stable and performant service, a new backend architecture is needed. There is some impact in the features and usage patterns.
What should I do? To avoid breaking changes, you can start making the following changes *now*:
- Update existing tools to ensure queries are executed against the
proper database connection
- Eg: If you want to query the `eswiki_p` DB, you must connect to the
`eswiki.analytics.db.svc.eqiad.wmflabs` host and `eswiki_p` DB, and not to enwiki or other hosts
- Check your existing tools and services queries for cross database
JOINs, rewrite the joins in application code
- Eg: If you are doing a join across databases, for example joining
`enwiki_p` and `eswiki_p`, you will need to query them separately, and filter the results of the separate queries in the code
Timeline:
- November - December: Early adopter testing
- January 2021: Existing and new systems online, transition period starts
- February 2021: Old hardware is decommissioned
We need your help
- If you would like to beta test the new architecture, please let us
know and we will reach out to you soon
- Sharing examples / descriptions of how a tool or service was updated,
writing a common solution or some example code others can utilize and reference, helping others on IRC and the mailing lists
If you have questions or need help adapting your code or queries, please contact us [2] https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication, or write on the talk page [3] https://wikitech.wikimedia.org/wiki/Talk:News/Wiki_Replicas_2020_Redesign .
We will be sending reminders, and more specific examples of the changes via email and on the wiki page. For more information see [1] https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign.
-- Joaquin Oltra Hernandez Developer Advocate - Wikimedia Foundation _______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud