Moving the joins to the application layer definitely makes things more
complex than a single SQL query.
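To give a rough idea of what that extra complexity looks like, here is a
minimal sketch of an inner join done in application code. Everything in it
is an assumption for illustration only: the two in-memory SQLite
connections stand in for two databases that can no longer be joined in one
statement, and the `page` and `link` tables, their columns, and the
`join_in_app` helper are hypothetical names, not the real replica schema.

```python
import sqlite3


def make_db(schema, insert_sql, rows):
    """Create an in-memory database holding one small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


# Two separate connections: a single SQL JOIN across them is not possible.
# Schemas and data are hypothetical.
pages_db = make_db(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT)",
    "INSERT INTO page VALUES (?, ?)",
    [(1, "Alpha"), (2, "Beta"), (3, "Gamma")],
)
links_db = make_db(
    "CREATE TABLE link (page_id INTEGER, target TEXT)",
    "INSERT INTO link VALUES (?, ?)",
    [(1, "Q1"), (1, "Q2"), (3, "Q3")],
)


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def join_in_app(pages_db, links_db, batch_size=2):
    """Emulate `page JOIN link USING (page_id)` across two connections."""
    # Step 1: fetch the left side and index it by the join key.
    pages = dict(pages_db.execute("SELECT page_id, title FROM page"))

    # Step 2: look up the right side in batches, since the list of keys
    # can be too long to send in a single IN (...) clause.
    result = []
    for batch in chunks(sorted(pages), batch_size):
        placeholders = ",".join("?" * len(batch))
        query = (
            "SELECT page_id, target FROM link "
            f"WHERE page_id IN ({placeholders}) "
            "ORDER BY page_id, target"
        )
        # Step 3: merge the two sides by hand.
        for page_id, target in links_db.execute(query, batch):
            result.append((pages[page_id], target))
    return result


rows = join_in_app(pages_db, links_db)
print(rows)
```

The batching, the manual merge, and holding the left side in memory are
all things the SQL engine would otherwise take care of in a single query.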
Having a data lake or another solution like the ones you mention makes
these kinds of joins more feasible with big data, but for the queries to
be performant it usually also requires careful schema and index design
when loading the data into it. In these cases you would also lose the
flexibility of arbitrarily querying the DB that the replicas currently
provide, so in the end there would be a different set of tradeoffs. It is
important to understand what things are truly not doable with existing
tools and services, so that something like this can be considered for
filling the gaps if necessary.
Currently the focus is keeping the replicas stable, maintainable and
performant, so this work must happen soon.
On Wed, Nov 11, 2020 at 7:59 AM <xover(a)pobox.com> wrote:
On Wed, Nov 11, 2020 at 5:26 AM AntiCompositeNumber
<anticompositenumber(a)gmail.com> wrote:
I understand the system engineering reasons for this change, but I
think it's worth underscoring exactly how disruptive it will be for
the queries that depended on this functionality.
The use cases seem to be relatively few and relatively limited. Could
this perhaps be a good case for a data mart (ETL) or meta index style
approach? I'm thinking of things like CloverDX and Jaspersoft ETL, or
even Apache Solr or another non-SQL solution.
Moving JOINs up the stack from the SQL layer to the application layer
does not sound like an architecturally sound approach.
Cheers,
Xover
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation