Fellow Wikimedia Developers,
Matthias Mullie has been working hard to refactor the backend of
mediawiki/extensions/ArticleFeedbackv5 to add proper sharding support.
His original approach was to rely on RDBStore, which was first
introduced by Aaron Schulz in Change-Id:
Ic1e38db3d325d52ded6d2596af2b6bd3e9b870fe
(https://gerrit.wikimedia.org/r/#/c/16696).
Asher Feldman, Tim Starling, and I reviewed the new class RDBStore
and determined that it wasn't really the best approach for our current
technical architecture and database environment. Aaron Schulz had a
lot of really good ideas included in RDBStore, but it just seemed like
it wasn't a great fit right now. We decided collectively to abandon
the RDBStore work.
So, we now need to give Matthias Mullie some direction on the best
solution for the ArticleFeedbackv5 refactor.
One possible solution would be to create a new database cluster for
this type of data. This cluster would be solely for data that is
similar to Article Feedback's and that has the potential of being
spammy in nature. The MediaWiki database abstraction layer could be
used directly via a call to the wfGetDB() function to retrieve a
Database object. One read limitation of this approach will be
particularly evident when a query requires a complex join: data on
the new cluster cannot be joined in SQL with tables that live
elsewhere, so we will need to eliminate any cross-shard joins.
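To make the cross-shard join point concrete, here is a minimal sketch
of replacing a SQL join with two separate queries that are merged in
application code. It is written in Python with in-memory SQLite purely
for illustration; it is not MediaWiki code, and the feedback table and
its columns (fb_id, fb_page_id, fb_comment) are hypothetical names
rather than the actual ArticleFeedbackv5 schema.

```python
import sqlite3


def make_core_db():
    # Stands in for the existing core cluster (holds the page table).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    db.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Main_Page"), (2, "Sharding")],
    )
    return db


def make_feedback_db():
    # Stands in for the proposed separate cluster (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE feedback ("
        "fb_id INTEGER PRIMARY KEY, fb_page_id INTEGER, fb_comment TEXT)"
    )
    db.executemany(
        "INSERT INTO feedback VALUES (?, ?, ?)",
        [(1, 2, "Helpful"), (2, 1, "Needs sources"), (3, 2, "spam")],
    )
    return db


def feedback_with_titles(core_db, feedback_db, limit=50):
    # Step 1: read feedback rows from the feedback cluster only.
    rows = feedback_db.execute(
        "SELECT fb_id, fb_page_id, fb_comment FROM feedback "
        "ORDER BY fb_id LIMIT ?",
        (limit,),
    ).fetchall()
    page_ids = sorted({page_id for _, page_id, _ in rows})
    if not page_ids:
        return []

    # Step 2: look up the matching titles on the core cluster by key.
    placeholders = ",".join("?" * len(page_ids))
    titles = dict(
        core_db.execute(
            "SELECT page_id, page_title FROM page "
            f"WHERE page_id IN ({placeholders})",
            page_ids,
        ).fetchall()
    )

    # Step 3: merge in application code instead of a cross-cluster JOIN.
    return [(fb_id, titles.get(page_id), comment) for fb_id, page_id, comment in rows]


if __name__ == "__main__":
    for row in feedback_with_titles(make_core_db(), make_feedback_db()):
        print(row)
```

The trade-off is an extra round trip and some merging logic in the
application, in exchange for each query touching only one cluster.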
The reality is that database sharding is a very useful technology, but
as with other approaches, there are many factors to consider to ensure
a successful implementation. Further, it has some limitations, and
it will not work well for every type of application.
With that in mind, when we do implement sharding in the future, it
will more than likely be beneficial to focus on the places in core
MediaWiki where it will have the greatest impact, such as the
pagelinks and revision tables.
— Patrick