Hi all!
I came across a general design issue when trying to make ApiQueryLangLinks more flexible, taking into account extensions manipulating language links via the new LanguageLinks hook. To do this, I want to introduce a LangLinkLoader class with two implementations, one with the old behavior, and one that takes the hooks into account (separate because it's less efficient).
The question is now whether it is a good idea to increase the number of individual classes to improve testability. Essentially, I see two ways of doing this:
1) The "composition" approach, using:
* LangLinksLoader interface, which defines a loadLangLinks() method. * DBLangLinksLoader, which implements the current logic using a database query * HookedLangLinksLoader, which uses a DBLangLinksLoader to get the base set of links, and then applies the hooks. * LangLinkConverter, a helper for converting between database rows and the "$lang:$title" form of language links.
Code: https://gerrit.wikimedia.org/r/#/c/60034/
Advantages: * All components are testable individually; in particular: ** HookedLangLinksLoader can be tested without DBLangLinksLoader, and without a database fixture ** LangLinkConverter is testable. * LangLinksLoader's interface isn't cluttered with the converter methods * LangLinkConverter is reusable elsewhere
Disadvantages: * more classes * ???
2) The "subclassing" approach, using:
* LangLinksLoader base class, implementing the database query and protected methods for converting language links. * HookedLangLinksLoader subclasses LangLinksLoader and calls back to the parent's loadLangLinks() method to get the base set of links.
Advantages: * fewer classes * ???
Disadvantages: * HookedLangLinksLoader depends on the database logic in LangLinksLoader * HookedLangLinksLoader can not be tested without DB interaction * converter methods are not testable (or have to be public, cluttering the interface) * converter methods are not reusable elsewhere
Currently, MediaWiki core generally follows the subclassing approach; using composition instead is met with some resistance. The argument seems to be that more classes make the code less readable, harder to maintain.
I don't think that is necessarily true, though I agree that classes should not be split up needlessly.
Basically, the question is if we want to aim for true unit tests, where each component is testable independently of the others, and if we accept an increase in the number of files/classes to achieve this. Or if we want to stick with the heavy weight integrations tests we currently have, where we mainly have high level tests for API modules etc, which require complex fixtures in the database.
I think smaller classes with a clearer interface will help not only with testing, but also with maintainability, since changes are more isolated, and there is less hidden interaction using internal object state.
Is there something inherently bad about increasing the number of classes? Isn't that just a question of the tool (IDE/Editor) used? Or am I missing some other disadvantage of the composition approach?
Finally: Shall we move the code base towards a more modular design, or should we stick with the "traditional" approach?
-- daniel