On Dec 5, 2012, at 12:43 PM, Patrick Reilly <preilly(a)wikimedia.org> wrote:
Fellow Wikimedia Developers,
Matthias Mullie has been working hard to refactor the backend of
mediawiki/extensions/ArticleFeedbackv5 to add proper sharding support.
The original approach that he took was to rely on RDBStore that was
first introduced in Change-Id:
Ic1e38db3d325d52ded6d2596af2b6bd3e9b870fe
https://gerrit.wikimedia.org/r/#/c/16696 by Aaron Schulz.
Asher Feldman, Tim Starling and I reviewed the new class RDBStore
and determined that it wasn't really the best approach for our current
technical architecture and database environment. Aaron Schulz had a
lot of really good ideas included in RDBStore, but it just seemed like
it wasn't a great fit right now. We decided collectively to abandon
the RDBStore work at this time.
:-( I'm going through all the stages of grief right now. In a few moments, I'll
hit "acceptance"
So, we're now left with the need to provide
Matthias Mullie with some
direction on what is the best solution for the ArticleFeedbackv5
refactor.
One possible solution would be to create a new database cluster for
this type of data. This cluster would be solely for data that is
similar to Article Feedback's and that has the potential of being
spammy in nature. The MediaWiki database abstraction layer could be
used directly via a call to the wfGetDB() function to retrieve a
Database object. One read limitation of this approach will be
particularly evident when we require a complex join: we will need to
eliminate any cross-shard joins.
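Eliminating a cross-shard join in practice means issuing one query per cluster and joining the results in application code. Below is a minimal sketch in Python; the table and column names (aft_feedback, af_page_id, page_title) and the select() helper are illustrative stand-ins, not the actual AFTv5 schema or MediaWiki Database API:

```python
def select(rows, key, wanted):
    """Minimal stand-in for a per-cluster SELECT ... WHERE key IN (...)."""
    return [r for r in rows if r[key] in wanted]

def fetch_feedback_with_titles(feedback_table, page_table, page_ids):
    # Query 1: hit the dedicated "spammy data" cluster for feedback rows.
    feedback = select(feedback_table, "af_page_id", page_ids)
    # Query 2: hit the core cluster for page titles, keyed by page_id.
    titles = {p["page_id"]: p["page_title"]
              for p in select(page_table, "page_id", page_ids)}
    # The join happens in application code, never across clusters in SQL.
    return [dict(row, page_title=titles[row["af_page_id"]])
            for row in feedback]
```

Two round trips instead of one JOIN, but neither query ever needs both clusters at once.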
This seems like the only reasonable solution that can be done in a timely manner at the
moment.
I caution against making this sort of vertical partitioning a long-term solution.
Knowledge of which data lives on which machine is the sort of human knowledge, created by
heterogeneous systems, that is prone to failure. All "social"-style data tends to
proliferate fast, and ArticleFeedback is just one such piece. Tying ourselves to a vertical
partition that grows at Moore's-law rates is going to bite us when we hit things like Flow
(messages), LQT3, etc.
Cross-shard JOINs are already eliminated (or should be) in the AFTv5 patch, since it assumes
RDBStore, so there shouldn't be a call in the code that requires wfGetDB() to return the
same database object for AFT-related tables and non-AFT-related ones.
The reality is that database sharding is a very useful
technology, but,
as with other approaches, there are many factors to consider to ensure
a successful implementation. Further, it has limitations, and
database sharding will not work well for every type of application.
Most of this is alleviated with increased dependence on memcached for caching intermediate
values and rollups. Since this isn't handled at the object level in MediaWiki, I
assume this is a problem for the AFTv5 patch and not RDBStore.
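The caching of intermediate values and rollups described here is essentially the cache-aside pattern: compute an aggregate once, serve repeat reads from memcached, and invalidate on write. A minimal sketch in Python, with a plain dict standing in for memcached and a hypothetical compute callback standing in for the expensive cross-shard aggregation:

```python
class RollupCache:
    """Cache-aside wrapper around an expensive rollup computation."""

    def __init__(self, compute):
        self._cache = {}         # stand-in for memcached
        self._compute = compute  # expensive aggregation over the shards
        self.misses = 0

    def get(self, key):
        if key not in self._cache:       # cache miss: do the real work
            self.misses += 1
            self._cache[key] = self._compute(key)
        return self._cache[key]          # cache hit: no shard access

    def invalidate(self, key):
        # On write, drop the rollup so the next read recomputes it.
        self._cache.pop(key, None)
```

A real memcached deployment adds TTLs and eviction, but the read/invalidate flow is the same.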
So, at the point when we truly implement sharding in
the future, it
will more than likely be beneficial to focus on places in core
MediaWiki where it will have the greatest impact, such as the
pagelinks and revision tables.
Yes, it'd have the greatest impact there, but these are the tables with the heaviest use of
indexes, rollups, and joins.
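For context, sharding a table like revision would mean routing each row to one of N servers by a stable hash of a shard key; choosing page id as that key is an assumption for illustration only:

```python
import zlib

def shard_for(page_id, num_shards=4):
    # A stable (non-randomized) hash keeps routing consistent across
    # processes and restarts, unlike Python's built-in hash().
    return zlib.crc32(str(page_id).encode()) % num_shards

# All revisions of one page land on one shard, so per-page history is a
# single-shard query. Any query spanning many pages (or any JOIN against
# an unsharded table) must fan out to every shard and merge in the
# application -- which is exactly why join-heavy tables are hard to shard.
```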
To do this properly we'd need a core object base class that would allow us to use
BagOStuff/memcached as a passthrough on anything going to the shard, so that joins are
done in PHP and reads come only from memcached (unless the data requested has been LRU'd).
That's a much larger undertaking.
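As a rough sketch of that passthrough idea, assuming a BagOStuff-like key-value interface: writes go through to the shard and populate the cache, while reads hit the cache first and fall back to the shard only when an entry has been evicted (LRU'd). The ShardedStore class and its dict-backed storage are hypothetical stand-ins, not an actual MediaWiki class:

```python
class ShardedStore:
    """Write-through store: cache in front, sharded storage behind."""

    def __init__(self, shard_db, cache):
        self._db = shard_db   # authoritative sharded storage
        self._cache = cache   # memcached stand-in (entries may be evicted)

    def set(self, key, value):
        self._db[key] = value      # write through to the shard...
        self._cache[key] = value   # ...and keep the cache warm.

    def get(self, key):
        try:
            return self._cache[key]   # normal read path: cache only
        except KeyError:
            value = self._db[key]     # entry was LRU'd: refill from shard
            self._cache[key] = value
            return value
```

Joins over such stores happen in application code, as in the earlier point about AFTv5 -- hence the "much larger undertaking."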
Take care,
terry