I am not being critical of people (namely, the amazing Cloud team) here. I am being critical of decisions. That could even include much higher-level decisions, e.g., should WMF have spent more money and hired more resources for this? It could very well be that I am uninformed, and that these decisions were made very carefully and are the best option for WMF, the Cloud team, and the users; but I did not get that impression from our previous conversations on this topic.

If you were to just look at this internally as the Cloud team (and I realize it is not my place to impersonate them), the tradeoff could also be viewed as: do we delay turning off cross-joins and make our employer mad (because we are delaying something we promised), or do we move forward and make some of our users mad (because we are breaking their tools and not offering a solid alternative)? This is a classic middle-manager dilemma, and I feel sorry for anyone who is stuck between a rock and a hard place like that.

By the way, I cannot help but highlight how funny the oxymoron is in this sentence: "The real source of pain here is the success of Wikimedia projects." I might start quoting that going forward!


On Wed, Mar 31, 2021 at 10:59 AM Aaron Halfaker <aaron.halfaker@gmail.com> wrote:
over a path of effort for the Clouds team

It seems to me that the Cloud team is putting in all of the effort they can.  I'm not sure where they would find more time and energy to implement a better solution.  I imagine any better solution wouldn't be a matter of a few extra hours, but rather finding thousands and thousands of hours to build something bespoke. 

This is painful.  I think you raised some really good points about cross-joins with Central Auth and Commons, as those are *designed* to be cross-referenced from other wikis.  But ultimately, if there's no reasonable way to do it in the software (MariaDB) we have available, implementing our own solution would take several orders of magnitude more time, and then we'd need even more time to maintain it.

The real source of pain here is the success of Wikimedia projects.  Once things get so big that you can't really use one bigass server to solve the problem anymore, scaling involves taking on new complexities in downstream code.  In my experience, everyone is struggling with the limits that the "big data" age has put on our ability to query and analyze.  No matter what, this type of transition, and likely the ones that will follow, will place a big burden on tool developers.  I just don't see a good way around that, even though it is a very bad situation and many volunteers won't be able to handle the burden.

I guess all I'm trying to say is, don't lay this on the Cloud team being lazy.  They aren't.  They are one of the most volunteer-focused teams at the Wikimedia Foundation, and they do quite a lot to support us with the few resources they have.  Your frustrations are perfectly valid, though.  This is a very frustrating situation.

On Wed, Mar 31, 2021 at 7:01 AM Huji Lee <huji.huji@gmail.com> wrote:
I said it before, and I say it again: some databases should be available for cross-wiki JOIN everywhere. This would at least include commons_p and centralauth_p, but perhaps also enwiki_p and meta_p.

I know that we discussed it before and better long-term solutions can be imagined (such as a data lake, etc.) but we need a solution now.

Sorry for coming at it a bit passionately. It just feels like we are choosing a path of pain for end users (let their tools break, then let us offer alternatives that they could adopt) over a path of effort for the Cloud team (let them create robust solutions for the end users and give them ample time to transition to the new method before turning off cross-wiki joins).

On Wed, Mar 31, 2021 at 4:57 AM Fastily <fastilywp@gmail.com> wrote:
A little late to the party, I just learned about this change today.

I maintain a number of bot tasks and database reports on enwp that rely on cross-wiki joins (mostly page title joins between enwp and Commons) to function properly.  I didn't find the migration instructions very helpful; I run FastilyBot on a Raspberry Pi, and needless to say it would be grossly impractical for me to perform a "join" in the bot's code.

Is there going to be a replacement for this functionality?

Fastily
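
[Editorial illustration, not part of the original message.] For readers unfamiliar with what a "join" in the bot's code would look like, here is a minimal sketch under stated assumptions. It uses two in-memory SQLite databases as stand-ins for two separate wiki replicas; the single-column `page` table, the helper names `make_db` and `titles_on_both`, and the sample titles are all hypothetical and are not taken from the migration instructions. The idea is to stream titles from one database in small batches and look each batch up in the other, so memory use stays bounded by the batch size plus the matches found.

```python
import sqlite3


def make_db(titles):
    """Build an in-memory stand-in for one wiki replica (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_title TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO page VALUES (?)", [(t,) for t in titles])
    return conn


def titles_on_both(local_conn, remote_conn, batch_size=500):
    """Return the titles present in both databases without a cross-database JOIN."""
    shared = []
    cursor = local_conn.execute("SELECT page_title FROM page")
    while True:
        # Pull only one batch of local titles into memory at a time.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        batch = [row[0] for row in rows]
        # One parameterized IN (...) lookup per batch against the other database.
        placeholders = ",".join("?" * len(batch))
        query = f"SELECT page_title FROM page WHERE page_title IN ({placeholders})"
        shared.extend(row[0] for row in remote_conn.execute(query, batch))
    return sorted(shared)


# Sample data only: two tiny "wikis" with partially overlapping titles.
enwiki = make_db(["A.jpg", "B.png", "C.svg"])
commons = make_db(["B.png", "C.svg", "D.gif"])
result = titles_on_both(enwiki, commons, batch_size=2)
print(result)
```

The trade-off is one extra round trip per batch instead of a single server-side JOIN, which is the cost this thread is concerned about for large reports on small hardware.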

On Mon, Mar 15, 2021 at 3:09 PM Dan Andreescu <dandreescu@wikimedia.org> wrote:
[4] was made to figure out common use cases and possibilities to enable them again.
...

I just want to highlight this ^ thing Joaquin said and mention that our team (Data Engineering) is also participating in brainstorming ways to bring back not just cross-wiki joins but better datasets to run these queries.  We have some good ideas, so please do participate in the task and give us more input so we can pick the best solution quickly.
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud