Moving the joins to the application layer definitely makes things more complex than a single SQL query.
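To make the extra complexity concrete, here is a minimal sketch of what an application-layer join can look like. It is only an illustration under stated assumptions: two in-memory SQLite databases stand in for two separately hosted databases, and the table names (`files`, `file_usage`), the column names and the batch size are all hypothetical, not the real replica schema.

```python
import sqlite3


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def app_layer_join(conn_a, conn_b, batch_size=500):
    """Emulate, in application code, what used to be a single query:

        SELECT u.file_name, u.page
        FROM files f JOIN file_usage u ON u.file_name = f.name

    `files` lives behind conn_a and `file_usage` behind conn_b
    (both table names are hypothetical).
    """
    # Step 1: fetch the join keys from the first database.
    names = [row[0] for row in conn_a.execute("SELECT name FROM files ORDER BY name")]

    # Step 2: look the keys up in the second database, in batches, so the
    # IN (...) list stays below the driver's bound-parameter limit.
    results = []
    for batch in chunks(names, batch_size):
        placeholders = ",".join("?" * len(batch))
        query = (
            "SELECT file_name, page FROM file_usage "
            f"WHERE file_name IN ({placeholders}) "
            "ORDER BY file_name, page"
        )
        results.extend(conn_b.execute(query, batch).fetchall())
    return results


def demo():
    """Build two tiny stand-in databases and join them in the application."""
    commons = sqlite3.connect(":memory:")
    commons.execute("CREATE TABLE files (name TEXT PRIMARY KEY)")
    commons.executemany(
        "INSERT INTO files VALUES (?)", [("A.jpg",), ("B.png",), ("C.svg",)]
    )

    wiki = sqlite3.connect(":memory:")
    wiki.execute("CREATE TABLE file_usage (file_name TEXT, page TEXT)")
    wiki.executemany(
        "INSERT INTO file_usage VALUES (?, ?)",
        [("A.jpg", "Page1"), ("C.svg", "Page2"), ("Z.gif", "Page3")],
    )
    return app_layer_join(commons, wiki, batch_size=2)
```

Even in this small form, the batching, the two round trips and the handling of partial results are all things the SQL layer previously did for you.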

A data lake, or one of the other solutions you mention, makes these kinds of joins more feasible with big data, but it usually also requires careful schema and index design when loading the data, so that the queries are performant. You would also lose the flexibility of running arbitrary queries against the DB that the replicas currently provide, so in the end it would be a different set of tradeoffs. It is important to understand which things are truly not doable with the existing tools and services, so that something like this can be considered to fill the gaps if necessary.

Currently the focus is on keeping the replicas stable, maintainable and performant, so this work must happen soon.

On Wed, Nov 11, 2020 at 7:59 AM <> wrote:
On Wed, Nov 11, 2020 at 5:26 AM AntiCompositeNumber
<> wrote:
> I understand the system engineering reasons for this change, but I
> think it's worth underscoring exactly how disruptive it will be for
> the queries that depended on this functionality.

The use cases seem to be relatively few and relatively limited. Could
this perhaps be a good case for a data mart (ETL) or meta index style
approach? I'm thinking of things like CloverDX and Jaspersoft ETL, or
even Apache Solr or another non-SQL solution.

Moving JOINs up the stack from the SQL layer to the application layer
does not sound like an architecturally sound approach.



Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation