Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

17 Nov 2020

Moving the joins to the application layer definitely makes things quite
complex compared to an SQL query.
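
To make the added complexity concrete, here is a minimal sketch of what a join done in application code might look like, assuming the two tables live on separate database servers and cannot be joined in a single query. The table names (`local_pages`, `shared_files`), the `app_layer_join` helper, and the in-memory SQLite databases are all hypothetical stand-ins chosen for illustration; they are not the replicas' actual schema or tooling.

```python
import sqlite3


def app_layer_join(conn_a, conn_b, batch_size=500):
    """Join rows from two separate databases in application code."""
    # Step 1: fetch the join keys from the first database.
    titles = [row[0] for row in conn_a.execute(
        "SELECT title FROM local_pages ORDER BY title")]

    results = []
    # Step 2: look the keys up in the second database in batches, so the
    # IN (...) list stays below the driver's bound-parameter limit.
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        placeholders = ",".join("?" for _ in batch)
        query = ("SELECT title, uploader FROM shared_files "
                 f"WHERE title IN ({placeholders}) ORDER BY title")
        results.extend(conn_b.execute(query, batch).fetchall())
    return results


# Two in-memory databases stand in for two separate database servers
# (hypothetical schema, for illustration only).
wiki = sqlite3.connect(":memory:")
wiki.execute("CREATE TABLE local_pages (title TEXT PRIMARY KEY)")
wiki.executemany("INSERT INTO local_pages VALUES (?)",
                 [("A.jpg",), ("B.jpg",), ("C.jpg",)])

shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE shared_files (title TEXT PRIMARY KEY, uploader TEXT)")
shared.executemany("INSERT INTO shared_files VALUES (?, ?)",
                   [("A.jpg", "alice"), ("C.jpg", "carol"), ("D.jpg", "dave")])

matches = app_layer_join(wiki, shared, batch_size=2)
```

What would be a single `JOIN` in SQL becomes two queries plus batching logic, and the application also has to take care of memory use, ordering, and consistency between the two reads.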

Having a data lake or other solutions like you mention makes it more
feasible to do these kinds of joins with big data, but it also usually
requires careful schema and index design when moving the data to it for the
queries to be performant. In these cases you would also lose the
flexibility of arbitrarily querying the DB like the replicas provide
currently, so in the end there would be a different set of tradeoffs. It is
important to understand what things are truly not doable with existing
tools and services, so that something like this can be considered for
filling the gaps if necessary.

Currently the focus is keeping the replicas stable, maintainable and
performant, so this work must happen soon.

On Wed, Nov 11, 2020 at 7:59 AM <xover(a)pobox.com> wrote:

...
  On Wed, Nov 11, 2020 at 5:26 AM AntiCompositeNumber
 <anticompositenumber(a)gmail.com> wrote:
  I understand the system engineering reasons for this change, but I
 think it's worth underscoring exactly how disruptive it will be for
 the queries that depended on this functionality.
 The use cases seem to be relatively few and relatively limited. Could
 this perhaps be a good case for a data mart (ETL) or meta index style
 approach? I'm thinking of things like CloverDX and Jaspersoft ETL, or
 even Apache Solr or another non-SQL solution.

 Moving JOINs up the stack from the SQL layer to the application layer
 does not sound like an architecturally sound approach.

 Cheers,
 Xover

 _______________________________________________
 Wikimedia Cloud Services mailing list
 Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
 https://lists.wikimedia.org/mailman/listinfo/cloud

-- 
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation
