Hi all!
I came across a general design issue when trying to make ApiQueryLangLinks more flexible, taking into account extensions manipulating language links via the new LanguageLinks hook. To do this, I want to introduce a LangLinkLoader class with two implementations, one with the old behavior, and one that takes the hooks into account (separate because it's less efficient).
The question is now whether it is a good idea to increase the number of individual classes to improve testability. Essentially, I see two ways of doing this:
1) The "composition" approach, using:
* LangLinksLoader interface, which defines a loadLangLinks() method. * DBLangLinksLoader, which implements the current logic using a database query * HookedLangLinksLoader, which uses a DBLangLinksLoader to get the base set of links, and then applies the hooks. * LangLinkConverter, a helper for converting between database rows and the "$lang:$title" form of language links.
Code: https://gerrit.wikimedia.org/r/#/c/60034/
Advantages: * All components are testable individually; in particular: ** HookedLangLinksLoader can be tested without DBLangLinksLoader, and without a database fixture ** LangLinkConverter is testable. * LangLinksLoader's interface isn't cluttered with the converter methods * LangLinkConverter is reusable elsewhere
Disadvantages: * more classes * ???
2) The "subclassing" approach, using:
* LangLinksLoader base class, implementing the database query and protected methods for converting language links. * HookedLangLinksLoader subclasses LangLinksLoader and calls back to the parent's loadLangLinks() method to get the base set of links.
Advantages: * fewer classes * ???
Disadvantages: * HookedLangLinksLoader depends on the database logic in LangLinksLoader * HookedLangLinksLoader can not be tested without DB interaction * converter methods are not testable (or have to be public, cluttering the interface) * converter methods are not reusable elsewhere
Currently, MediaWiki core generally follows the subclassing approach; using composition instead is met with some resistance. The argument seems to be that more classes make the code less readable, harder to maintain.
I don't think that is necessarily true, though I agree that classes should not be split up needlessly.
Basically, the question is if we want to aim for true unit tests, where each component is testable independently of the others, and if we accept an increase in the number of files/classes to achieve this. Or if we want to stick with the heavy weight integrations tests we currently have, where we mainly have high level tests for API modules etc, which require complex fixtures in the database.
I think smaller classes with a clearer interface will help not only with testing, but also with maintainability, since changes are more isolated, and there is less hidden interaction using internal object state.
Is there something inherently bad about increasing the number of classes? Isn't that just a question of the tool (IDE/Editor) used? Or am I missing some other disadvantage of the composition approach?
Finally: Shall we move the code base towards a more modular design, or should we stick with the "traditional" approach?
-- daniel
On Thu, May 2, 2013 at 9:36 AM, Daniel Kinzler daniel@brightbyte.de wrote:
- The "composition" approach, using:
[...]
Disadvantages:
- more classes
- ???
* A lot of added complexity
- The "subclassing" approach, using:
[...]
3) Instead of making a bunch of one-public-method classes used only by ApiQueryLangLinks, just put the logic in methods of ApiQueryLangLinks.
Advantages: * Everything is in one file, not lost in a maze of twisty little classes. * These methods could still be written in a "composition" style to be individually unit tested (possibly after using reflection to set them public[1]), if necessary.
Disadvantages: * If someone comes up with someplace to reuse this, it would need to be refactored then.
[1]: http://sebastian-bergmann.de/archives/881-Testing-Your-Privates.html
On 02.05.2013 16:12, Brad Jorsch wrote:
On Thu, May 2, 2013 at 9:36 AM, Daniel Kinzler daniel@brightbyte.de wrote:
- The "composition" approach, using:
[...]
Disadvantages:
- more classes
- ???
- A lot of added complexity
The the number of classes, and the object graph, some. Not in the code though. In my experience, this makes for less complex (and less surprising) code, because it enforces interfaces.
But I agree that we should indeed take care though that we don't end up with a maze of factories, builders, etc.
To be honest, I was (and to some extend, still am) reluctant to fully adopt the composition style, mainly for this reason: more classes and more objects means more complexity. But I have come to see that a) for testability, this is simply necessary and b) the effect on the actual code is rather positive: smaller methods, less internal state, clearer interfaces.
- The "subclassing" approach, using:
[...]
- Instead of making a bunch of one-public-method classes used only by
ApiQueryLangLinks, just put the logic in methods of ApiQueryLangLinks.
Advantages:
- Everything is in one file, not lost in a maze of twisty little classes.
"everything in one file" seems like a disadvantage to me, at least if the things in that file are not very strongly related. Obvious examples of this being a problem are classes like Title or Article.
But you bring the question to a point: does the increased granularity needed for proper unit testing necessarily lead to a "maze" of classes, or to cleaner classes and a cleaner object structure? Of course, this, like everything, *can* be overdone.
But maybe it's a question of tools. When I used a simple editor for MediaWiki development, finding and opening the next file was time consuming an annoying. Since I have moved to a full featured IDE for MediaWiki development, having many files and classes has become a non-issue, because navigation is seamless. I don't care what file needs to be opened, i can just click or enter a class/function name to navigate to the declaration.
- These methods could still be written in a "composition" style to be
individually unit tested (possibly after using reflection to set them public[1]), if necessary.
* ..."could still be written in a "composition" style" - I don't see how I could test the load-with-hooks code without using the load-from-db code, unless the load-with-hooks method takes a callback as an argument. Which to me seems like adding complexity and cluttering the interface. Or we could rely on a big if/then/else, which essentially doubles the number of code paths to test.
* ..."using reflection to set them public". I guess that's an option... is it good practice? Should we make it a general principle?
Disadvantages:
- If someone comes up with someplace to reuse this, it would need to
be refactored then.
Or rewrite them, because it's not easy to find where such utility code might already exist...
-- daniel
wikitech-l@lists.wikimedia.org