On Dec 5, 2012, at 12:43 PM, Patrick Reilly preilly@wikimedia.org wrote:
Fellow Wikimedia Developers,
Matthias Mullie has been working hard to refactor the backend of mediawiki/extensions/ArticleFeedbackv5 to add proper sharding support.
The original approach that he took was to rely on RDBStore that was first introduced in Change-Id: Ic1e38db3d325d52ded6d2596af2b6bd3e9b870fe https://gerrit.wikimedia.org/r/#/c/16696 by Aaron Schulz.
Asher Feldman, Tim Starling, and I reviewed the new RDBStore class and determined that it isn't really the best approach for our current technical architecture and database environment. Aaron Schulz put a lot of really good ideas into RDBStore, but it just isn't a great fit right now. We decided collectively to abandon the RDBStore work permanently.
:-( I'm going through all the stages of grief right now. In a few moments, I'll hit "acceptance"
So, we're now left with the need to provide Matthias Mullie with some direction on what is the best solution for the ArticleFeedbackv5 refactor.
One possible solution would be to create a new database cluster for this type of data. This cluster would be reserved for data that is similar to Article Feedback's and that has the potential to be spammy in nature. The MediaWiki database abstraction layer could be used directly, via a call to the wfGetDB() function, to retrieve a Database object. The main read limitation of this approach becomes evident whenever a complex join is required: we will need to eliminate any cross-shard joins.
This seems like the only reasonable solution that can be done in a timely manner at the moment.
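To make the trade-off concrete, here is a minimal sketch (in Python, with hypothetical names and example data) of what eliminating a cross-shard JOIN looks like in practice: the feedback rows and the page rows are fetched from their separate clusters independently, and the join happens in application code rather than in SQL.

```python
# Hypothetical sketch: replacing a cross-shard SQL JOIN with an
# application-side join. Rows from the feedback cluster and the core
# page table come from two separate result sets and are matched in code.

def app_side_join(feedback_rows, page_rows):
    """Join feedback records to page records on page_id in application code."""
    pages_by_id = {p["page_id"]: p for p in page_rows}
    joined = []
    for f in feedback_rows:
        page = pages_by_id.get(f["page_id"])
        if page is not None:  # skip feedback whose page wasn't fetched
            joined.append({**f, "page_title": page["title"]})
    return joined

# Example data standing in for two separate database result sets.
feedback = [{"af_id": 1, "page_id": 10, "comment": "helpful"},
            {"af_id": 2, "page_id": 99, "comment": "unclear"}]
pages = [{"page_id": 10, "title": "Main_Page"}]

print(app_side_join(feedback, pages))
```

The cost is an extra round trip and some memory for the lookup table; the benefit is that neither query cares which cluster the other table lives on.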
I caution against making this sort of vertical partitioning a long-term solution. Knowledge of which data lives on which machine is the sort of human knowledge created by heterogeneous systems that is prone to failure. All "social"-style data tends to proliferate fast, and ArticleFeedback is just one such piece. Tying ourselves to a vertical partition whose capacity grows only at Moore's-law pace is going to bite us when we hit things like Flow (messages), LQT3, etc.
Cross-shard JOINs are already eliminated (or should be) in the AFTv5 patch, since it assumes RDBStore; so there shouldn't be a call in the code that requires wfGetDB() to return the same database object for AFT-related tables and non-AFT-related ones.
The reality is that database sharding is a very useful technology, but like any approach, there are many factors to consider to ensure a successful implementation. Further, there are limitations: database sharding will not work well for every type of application.
Most of this is alleviated by an increased dependence on memcache for caching intermediate values and rollups. Since this isn't handled at the object level in MediaWiki, I assume this is a problem for the AFTv5 patch and not for RDBStore.
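The caching pattern being described is essentially cache-aside: serve a rollup (e.g. a per-page feedback count) from a memcached-like store and recompute it from the database only on a miss. A minimal Python sketch, with hypothetical names and a plain dict standing in for memcached:

```python
# Cache-aside sketch (hypothetical names): the rollup is computed from
# the database once, then served from the cache on subsequent requests.

cache = {}  # stand-in for memcached

def get_rollup(page_id, compute_from_db):
    """Return the cached rollup for page_id, recomputing it on a miss."""
    key = "aft:rollup:%d" % page_id
    value = cache.get(key)
    if value is None:                      # miss: run the expensive query
        value = compute_from_db(page_id)
        cache[key] = value                 # repopulate for later readers
    return value
```

After the first call for a given page, repeated reads never touch the database, which is what makes the heavy rollup/intermediate-value queries tolerable on sharded storage.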
So, to this point: when we truly implement sharding in the future, it will more than likely be beneficial to focus on the places in core MediaWiki where it will have the greatest impact, such as the pagelinks and revision tables.
Yes, it'd have the greatest impact there, but these are also the tables with the most indexes/rollups/joins.
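For context, horizontal sharding of a table like pagelinks or revision usually starts with a stable routing function: a sketch in Python (shard count and key names are assumptions for illustration) that maps a shard key to one of N servers so every query for that key lands on the same shard.

```python
# Hypothetical routing sketch: hash a shard key (e.g. a page_id for
# pagelinks, or rev_page for revision rows) to a shard index. A stable
# hash ensures the same key always routes to the same server.
import hashlib

SHARD_COUNT = 4  # assumed number of shards

def shard_for(key):
    """Return the shard index responsible for this key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT
```

This is exactly what makes those tables hard to shard: any query that today uses an index or JOIN spanning rows with different shard keys has to be decomposed into per-shard queries plus application-side merging.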
To do this properly we'd need a core object base class that would allow us to use BagOStuff/memcached as a passthrough on anything going to the shard, so that joins are done in PHP and reads come only from memcache (unless the requested data has been LRU'd).
That's a much larger undertaking.
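The passthrough behavior described above amounts to a read-through cache with LRU eviction. A minimal Python sketch (class and parameter names are hypothetical, and a bounded OrderedDict stands in for memcached's LRU):

```python
# Read-through LRU sketch: reads are served from the cache; only keys
# that have been evicted (LRU'd) fall through to the shard loader.
from collections import OrderedDict

class ReadThroughLRU:
    def __init__(self, capacity, load_from_shard):
        self.capacity = capacity
        self.load = load_from_shard  # fallback fetch, e.g. a shard query
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)      # mark as recently used
            return self.data[key]
        value = self.load(key)              # miss: go to the shard
        self.data[key] = value
        if len(self.data) > self.capacity:  # evict least-recently-used
            self.data.popitem(last=False)
        return value
```

The base class Terry describes would sit between every object read and the shards in just this way, which is why it touches so much of core and is a much larger undertaking.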
Take care,
terry