On a bit of a side note: for forwarding many IPs/ports through SSH, a tool
like sshuttle [1] might be interesting; I've used it in the past with success.
It's a bit complex: it uses an SSH tunnel plus iptables/pf/... rules to move the
traffic through that tunnel, along with a process on the other side that is
copied over and started through that same SSH tunnel. So if you have a complex
network setup it might break some things (e.g. it does not yet work with
systemd-resolved, the default on Fedora 33).
[1]
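For illustration, here's a rough sketch of the kind of invocation I mean, expressed as the command it would build; the bastion host and subnet below are placeholders, not real WMCS values:

```python
import shlex

def sshuttle_cmd(remote: str, subnets: list[str]) -> list[str]:
    # sshuttle routes whole subnets through one SSH connection,
    # instead of forwarding one local port per remote host/port
    # the way `ssh -L` does.
    return ["sshuttle", "-r", remote, *subnets]

# Hypothetical bastion and subnet, for illustration only.
print(shlex.join(sshuttle_cmd("user@bastion.example.org", ["10.0.0.0/8"])))
```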
Got it. The https://noc.wikimedia.org/conf/dblists/ lists are plenty fast
and easy enough to parse. I'll just cache that. It would be neat if we
could rely on the slice specified in meta_p in the future, as in my case we
have to query meta_p.wiki regardless, but not a big deal :)
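For anyone following along, the parse-and-cache step I have in mind is roughly this sketch; it assumes each s*.dblist is one database name per line, possibly with comment lines:

```python
def parse_dblist(text: str) -> set[str]:
    # One database name per non-empty, non-comment line.
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}

def build_section_map(dblists: dict[str, str]) -> dict[str, str]:
    # Invert section -> dblist text into wiki db -> section,
    # e.g. "enwiki" -> "s1"; this is the structure worth caching.
    return {db: section
            for section, text in dblists.items()
            for db in parse_dblist(text)}

# Inline stand-ins for the fetched s*.dblist files.
section_of = build_section_map({
    "s1": "enwiki\n",
    "s4": "commonswiki\ntestcommonswiki\n",
})
print(section_of["commonswiki"])  # -> s4
```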
Thank you! I think I have enough information to move forward.
~ MA
On Tue, Nov 10, 2020 at 5:13 PM Brooke Storm <bstorm(a)wikimedia.org> wrote:
> Yes, you might be able to use the meta_p.wiki table. However, when wikis are
> moved between sections, nothing updates the meta_p.wiki table at this time.
> Requests to noc.wikimedia.org are accurate and up to date, as far as I
> know. We only update meta_p when we add the wiki (at least that’s how it is
> now). Also, the DNS gets synced and updated every time we run the script,
> so it is usually up-to-date. You could try meta_p.wiki and fall back to
> DNS or noc.wikimedia.org if that fails, perhaps? Meta_p is expected to
> be on s7 in the new design.
>
> Brooke Storm
> Staff SRE
> Wikimedia Cloud Services
> bstorm(a)wikimedia.org
> IRC: bstorm
>
> On Nov 10, 2020, at 3:01 PM, MusikAnimal <musikanimal(a)gmail.com> wrote:
>
> Ah yes, 8 tunnels is more than manageable. The `slice` column in the
> meta_p.wiki table tells us which section to connect to for said wiki, right? So
> in theory, I always have SSH tunnels open for every slice, and the first
> thing I do is check meta_p.wiki for the given wiki, then I know which of
> those s1-s8 connections to use? So I really only need 8 connections (even
> in production). Maybe not what you would recommend for every tool, rather
> just the "global" ones facing this specific issue.
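In code, that connection-picking step might look something like this sketch; the local port numbers are made up (whatever ports the eight tunnels were opened with would go here), and the slice value is whatever the meta_p.wiki lookup returns:

```python
# One local forwarded port per section; the numbers are arbitrary,
# fixed at the time the eight SSH tunnels are opened.
SLICE_PORTS = {f"s{n}": 4710 + n for n in range(1, 9)}

def port_for_slice(slice_name: str) -> int:
    # `slice_name` comes from a meta_p.wiki lookup for the target wiki.
    # Tolerate a suffixed value (e.g. "s1.something") by keeping only
    # the leading section label.
    return SLICE_PORTS[slice_name.split(".", 1)[0]]

print(port_for_slice("s1"))  # -> 4711
```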
>
> > Can't you just tunnel to the login server and connect by hostname from
> there?
>
> Hmm, I'm not sure I follow. Right now I SSH to login.toolforge.org, but
> with "-L 4711:enwiki.analytics.db.svc.eqiad.wmflabs:3306" for port
> forwarding from my local mysql to the remote. It sounds like instead I need
> to tunnel to s1-s8, and use the correct one based on the desired database.
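Scripted, opening those eight tunnels could look roughly like the following sketch; the local base port and the exact section-alias hostnames are my assumptions, extrapolated from the single-host enwiki example above:

```python
def tunnel_args(base_port: int = 4710) -> list[str]:
    # One -L forward per section s1..s8, mirroring the single-host
    # enwiki example but pointed at the section aliases instead.
    args: list[str] = []
    for n in range(1, 9):
        host = f"s{n}.analytics.db.svc.eqiad.wmflabs"
        args += ["-L", f"{base_port + n}:{host}:3306"]
    return args

print(" ".join(["ssh", *tunnel_args(), "login.toolforge.org"]))
```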
>
> ~ MA
>
> On Tue, Nov 10, 2020 at 4:32 PM Brooke Storm <bstorm(a)wikimedia.org> wrote:
>
>> Hi MA,
>> You could still accomplish the local environment you are describing by
>> using 8 SSH tunnels. All the database-name DNS aliases eventually reference
>> the section names (s1, s2, s3, s4, etc., in the form of
>> s1.analytics.db.svc.eqiad.wmflabs). An app could be written to
>> connect to the correct section instead of the database if you are doing
>> that kind of thing, but you’ll either need to make requests to
>> https://noc.wikimedia.org/conf/dblists/s<correct-number>.dblist like
>> https://noc.wikimedia.org/conf/dblists/s4.dblist and map things out from
>> there or perhaps check DNS for the database name and look up the “s#”
>> record from there (which is currently possible in Lua, and I can provide an
>> example of how I did it in that language).
>>
>> A MediaWiki config checkout would also work, besides what can be gleaned
>> from noc.wikimedia.org.
>>
>> We can try to document some examples of how you might do it either way.
>> I’m sure it is non-trivial, but 8 tunnels is more workable than 900, at
>> least.
>>
>> Routing by reading the queries on the fly is quite tricky. The closest
>> I’ve seen ready-made tools come to that is ProxySQL, and that focuses on
>> sharding, which is not exactly the same thing.
>>
>> Brooke Storm
>> Staff SRE
>> Wikimedia Cloud Services
>> bstorm(a)wikimedia.org
>> IRC: bstorm
>>
>> On Nov 10, 2020, at 2:13 PM, MusikAnimal <musikanimal(a)gmail.com> wrote:
>>
>> Hi! Most tools query just a single db at a time, so I don't think this
>> will be a massive problem. However, some, such as Global
>> Contribs[0] and GUC[1], can theoretically query all of them from a single
>> request. Creating new connections on the fly seems doable in production;
>> the issue is how to work on these tools in a local environment. Currently
>> the recommendation is to use an SSH tunnel to the desired host,[2] such
>> as enwiki.analytics.db.svc.eqiad.wmflabs. Surely we can't do this same port
>> forwarding for 900+ connections.
>>
>> Any ideas? Perhaps there's some way to make a host that automatically
>> forwards to the correct one, solely for developer use? Or will development
>> of such global tools need to happen in the Cloud Services environment?
>>
>> ~ MA
>>
>> [0] https://xtools.wmflabs.org/globalcontribs
>> [1] https://guc.toolforge.org/
>> [2]
>> https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#SSH_tunneling_f…
>>
>> On Tue, Nov 10, 2020 at 3:26 PM Joaquin Oltra Hernandez <
>> jhernandez(a)wikimedia.org> wrote:
>>
>>> TLDR: Wiki Replicas' architecture is being redesigned for stability and
>>> performance. Cross database JOINs will not be available and a host
>>> connection will only allow querying its associated DB. See [1]
>>> <https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>
>>> for more details.
>>>
>>> Hi!
>>>
>>> In the interest of making and keeping Wiki Replicas a stable and
>>> performant service, a new backend architecture is needed. There is some
>>> impact on the features and usage patterns.
>>>
>>> What should I do? To avoid breaking changes, you can start making the
>>> following changes *now*:
>>> - Update existing tools to ensure queries are executed against the
>>> proper database connection
>>> - Eg: If you want to query the `eswiki_p` DB, you must connect to the
>>> `eswiki.analytics.db.svc.eqiad.wmflabs` host and `eswiki_p` DB, and not to
>>> enwiki or other hosts
>>> - Check your existing tools' and services' queries for cross-database
>>> JOINs, and rewrite the joins in application code
>>> - Eg: If you are doing a join across databases, for example joining
>>> `enwiki_p` and `eswiki_p`, you will need to query them separately, and
>>> filter the results of the separate queries in the code
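A minimal sketch of that rewrite, with plain lists standing in for the per-wiki result sets (the column names here are invented for illustration):

```python
def join_in_app(rows_a, rows_b, key="user_name"):
    # Replaces a cross-database SQL JOIN: query each wiki separately,
    # then match rows on the shared key in application code.
    by_key = {row[key]: row for row in rows_b}
    return [(row, by_key[row[key]]) for row in rows_a if row[key] in by_key]

# Stand-ins for rows fetched from enwiki_p and eswiki_p respectively.
enwiki_rows = [{"user_name": "Alice", "edits": 10},
               {"user_name": "Bob", "edits": 3}]
eswiki_rows = [{"user_name": "Alice", "edits": 7}]
print(join_in_app(enwiki_rows, eswiki_rows))  # only Alice appears in both
```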
>>>
>>> Timeline:
>>> - November - December: Early adopter testing
>>> - January 2021: Existing and new systems online, transition period starts
>>> - February 2021: Old hardware is decommissioned
>>>
>>> We need your help
>>> - If you would like to beta test the new architecture, please let us
>>> know and we will reach out to you soon
>>> - Sharing examples / descriptions of how a tool or service was updated,
>>> writing a common solution or some example code others can utilize and
>>> reference, helping others on IRC and the mailing lists
>>>
>>> If you have questions or need help adapting your code or queries, please
>>> contact us [2]
>>> <https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication>,
>>> or write on the talk page [3]
>>> <https://wikitech.wikimedia.org/wiki/Talk:News/Wiki_Replicas_2020_Redesign>.
>>>
>>> We will be sending reminders, and more specific examples of the changes
>>> via email and on the wiki page. For more information see [1]
>>> <https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>.
>>>
>>> [1]: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign
>>> [2]: https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_communication
>>> [3]: https://wikitech.wikimedia.org/wiki/Talk:News/Wiki_Replicas_2020_Redesign
>>>
>>> --
>>> Joaquin Oltra Hernandez
>>> Developer Advocate - Wikimedia Foundation
>>> _______________________________________________
>>> Wikimedia Cloud Services announce mailing list
>>> Cloud-announce(a)lists.wikimedia.org (formerly
>>> labs-announce(a)lists.wikimedia.org)
>>> https://lists.wikimedia.org/mailman/listinfo/cloud-announce
>>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
>
>
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3