Using composition to improve testability? - Wikitech-l

2 May 2013


      Hi all!
I came across a general design issue when trying to make ApiQueryLangLinks more
flexible, taking into account extensions manipulating language links via the new
LanguageLinks hook. To do this, I want to introduce a LangLinkLoader class with
two implementations, one with the old behavior, and one that takes the hooks
into account (separate because it's less efficient).
The question is now whether it is a good idea to increase the number of
individual classes to improve testability. Essentially, I see two ways of doing
this:
1) The "composition" approach, using:
* LangLinksLoader interface, which defines a loadLangLinks() method.
* DBLangLinksLoader, which implements the current logic using a database query
* HookedLangLinksLoader, which uses a DBLangLinksLoader to get the base set of
links, and then applies the hooks.
* LangLinkConverter, a helper for converting between database rows and the
"$lang:$title" form of language links.
Code: https://gerrit.wikimedia.org/r/#/c/60034/
Advantages:
* All components are testable individually; in particular:
** HookedLangLinksLoader can be tested without DBLangLinksLoader, and without a
database fixture
** LangLinkConverter is testable.
* LangLinksLoader's interface isn't cluttered with the converter methods
* LangLinkConverter is reusable elsewhere
Disadvantages:
* more classes
* ???
2) The "subclassing" approach, using:
* LangLinksLoader base class, implementing the database query and protected
methods for converting language links.
* HookedLangLinksLoader subclasses LangLinksLoader and calls back to the
parent's loadLangLinks() method to get the base set of links.
Advantages:
* fewer classes
* ???
Disadvantages:
* HookedLangLinksLoader depends on the database logic in LangLinksLoader
* HookedLangLinksLoader can not be tested without DB interaction
* converter methods are not testable (or have to be public, cluttering the
interface)
* converter methods are not reusable elsewhere
Currently, MediaWiki core generally follows the subclassing approach; using
composition instead is met with some resistance. The argument seems to be that
more classes make the code less readable, harder to maintain.
I don't think that is necessarily true, though I agree that classes should not
be split up needlessly.
Basically, the question is if we want to aim for true unit tests, where each
component is testable independently of the others, and if we accept an increase
in the number of files/classes to achieve this. Or if we want to stick with the
heavy weight integrations tests we currently have, where we mainly have high
level tests for API modules etc, which require complex fixtures in the database.
I think smaller classes with a clearer interface will help not only with
testing, but also with maintainability, since changes are more isolated, and
there is less hidden interaction using internal object state.
Is there something inherently bad about increasing the number of classes? Isn't
that just a question of the tool (IDE/Editor) used? Or am I missing some other
disadvantage of the composition approach?
Finally: Shall we move the code base towards a more modular design, or should we
stick with the "traditional" approach?
-- daniel