over a path of effort for the Clouds team
It seems to me that the Cloud team is putting in all of the effort they
can. I'm not sure where they would find more time and energy to implement
a better solution. I imagine any better solution wouldn't be a matter of a
few extra hours, but rather finding thousands and thousands of hours to
build something bespoke.
This is painful. I think you raised some really good points about
cross-joins with Central Auth and Commons as those are *designed* to be
cross-referenced from other wikis. But ultimately, if there's no
reasonable way to do it in the software (MariaDB) we have available,
implementing our own solution would take several orders of magnitude more
time, and then we'd need even more time to maintain it.
The real source of pain here is the success of Wikimedia projects. Once
things get so big that you can't really use *one bigass server* to solve
the problem anymore, scaling involves taking on new complexities in
downstream code. In my experience, everyone is struggling with the limits
that the "big data" age has put on our ability to query and analyze. No
matter what, this type of transition, and likely the ones that will follow,
will place a big burden on tool developers. I just don't see a good way
around that even though it is a very bad situation and many volunteers
won't be able to handle the burden.
I guess all I'm trying to say is, don't lay this on the Cloud team being
lazy. They aren't. They are one of the most volunteer-focused teams at
the Wikimedia Foundation and they do quite a lot to support us with the
few resources they have. Your frustrations are perfectly valid though.
This is a very frustrating situation.
On Wed, Mar 31, 2021 at 7:01 AM Huji Lee <huji.huji(a)gmail.com> wrote:
I said it before, and I say it again: *some* databases should be
available for cross-wiki JOIN everywhere. This would at least include
commons_p and centralauth_p but perhaps also enwiki_p and meta_p.
I know that we discussed it before and better long-term solutions can be
imagined (such as a data lake, etc.) but we need a solution *now*.
Sorry for coming at it a bit passionately. It just feels like we are
choosing a path of pain for end users (let their tools break, then let us
offer alternatives that they could adopt) over a path of effort for the
Clouds team (let them create robust solutions for the end users and give
them ample time to transition to the new method before turning off
cross-wiki joins).
On Wed, Mar 31, 2021 at 4:57 AM Fastily <fastilywp(a)gmail.com> wrote:
A little late to the party, I just learned about this change today.
I maintain a number of bot tasks
<https://en.wikipedia.org/wiki/User:FastilyBot> and database
<https://fastilybot-reports.toolforge.org> reports
<https://en.wikipedia.org/wiki/Wikipedia:Database_reports> on enwp that
rely on cross-wiki joins (mostly page title joins between enwp and Commons)
to function properly. I didn't find the migration instructions
<https://wikitech.wikimedia.org/w/index.php?title=News/Wiki_Replicas_2020_Redesign&oldid=1905818#How_do_I_cross_reference_data_between_wikis_like_I_do_with_cross_joins_today?>
very helpful; I run FastilyBot on a Raspberry Pi, and needless to say it
would be grossly impractical for me to perform a "join" in the bot's code.
Is there going to be a replacement for this functionality?
Fastily
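
For context, the application-side "join" that the migration instructions
point to could look roughly like the following. This is a minimal sketch,
not FastilyBot's code: `chunked`, `titles_on_both`, and the
`fetch_commons_titles` callable are hypothetical names, and the callable
stands in for whatever second query a tool would run against the Commons
replica. The batch size of 500 is likewise an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator, List, Set


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches so each follow-up query stays small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def titles_on_both(
    enwiki_titles: Iterable[str],
    fetch_commons_titles: Callable[[List[str]], Iterable[str]],
    batch_size: int = 500,
) -> Set[str]:
    """Emulate a page-title join between two wikis in application code.

    enwiki_titles: titles already fetched from the first wiki.
    fetch_commons_titles: hypothetical callable that takes a batch of
        titles and returns the ones that also exist on the second wiki
        (for example by running a title lookup with an IN (...) list).
    """
    shared: Set[str] = set()
    # De-duplicate and sort so batches are deterministic.
    unique_titles = sorted(set(enwiki_titles))
    for batch in chunked(unique_titles, batch_size):
        shared.update(fetch_commons_titles(batch))
    return shared


if __name__ == "__main__":
    # In-memory stand-in for the second database, for illustration only.
    commons = {"A.jpg", "B.png", "Z.svg"}

    def fake_fetch(batch: List[str]) -> List[str]:
        return [title for title in batch if title in commons]

    print(sorted(titles_on_both(["A.jpg", "C.gif", "B.png"], fake_fetch)))
```

Only one batch of titles is held per lookup, but the full title list from
the first wiki still has to fit in memory and every batch costs a round
trip, which is the overhead being objected to above for small hosts.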
On Mon, Mar 15, 2021 at 3:09 PM Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
[4] was made to figure out common use cases and possibilities to enable
them again.
...
I just want to highlight this ^ thing Joaquin said and mention that our
team (Data Engineering) is also participating in brainstorming ways to
bring back not just cross-wiki joins but better datasets to run these
queries. We have some good ideas, so please do participate in the task and
give us more input so we can pick the best solution quickly.
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud