Hey,
in order to sanity-check the code we have written in the Wikidata project, we asked an external company to review our code and discuss it with the team. The effort was very instructive for us.
We want to share the results with you. The report looks dauntingly big, but this is mostly due to the appendixes. The first 20 pages are quite worth a read.
Since the Wikidata code is an extension to MediaWiki, a number of the issues raised are also relevant to the MediaWiki code proper. I would hope that this code review can be regarded as a contribution towards the discussion of the status of our shared code-base as well.
I will unfortunately not be at the Hackathon, but a number of Wikidata developers will be; please feel free to chat them up. Daniel Kinzler is also preparing a presentation for the Hackathon to discuss a few lessons learned and ideas.
The review is available through this page: <http://meta.wikimedia.org/wiki/Wikidata/Development/Code_Review>
Cheers, Denny
On 17 May 2013 16:57, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
> Hey,
> in order to sanity-check the code we have written in the Wikidata project, we asked an external company to review our code and discuss it with the team. The effort was very instructive for us.
> We want to share the results with you. The report looks dauntingly big, but this is mostly due to the appendixes. The first 20 pages are quite worth a read.
I read half of the report. Here is what I got out of it:
* We should do dependency injection to make code testable.
* We should adhere to the single responsibility principle to make code testable and less complex.
* MediaWiki in many places (special pages, the API, hooks, to name a few) makes it impossible, or at least very hard, to do the above for most classes.
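To make the first two points concrete, here is a small sketch of constructor-based dependency injection. It is illustrative only: the class and method names are invented, not taken from Wikibase or MediaWiki, and I am using Python for brevity rather than PHP.

```python
class PageRenderer:
    """Renders a page. The database access is injected rather than
    created internally, so a test can pass in a fake."""

    def __init__(self, db):
        # 'db' is any object with a get_page_text(title) method.
        self.db = db

    def render(self, title):
        return "<p>%s</p>" % self.db.get_page_text(title)


class FakeDb:
    """Test double: no real database connection needed."""

    def get_page_text(self, title):
        return "hello"


# In a unit test we inject the fake and never touch a real database.
renderer = PageRenderer(FakeDb())
assert renderer.render("Main_Page") == "<p>hello</p>"
```

If PageRenderer instead created its own database connection inside the constructor, the test could not run without a real database, which is exactly the problem the report describes.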
They proposed a solution (work around in Wikibase) to allow writing better code in Wikibase. In section 4.1.6 it reads "The sketched approach fulfills all presented requirements: The dependency injection mechanism is kept simple and easy to understand". But as a reader I did not understand how it works or what would be the practical gains for the code complexity that the solution itself introduces. I would attribute some part of this to the many typos, mistakes and non-fluent language in the text and examples. One example:
"For example, if the a DependencyManager as configured in Illustration 1: DependencyManager is asked to retrieve the object with the key “mail”, it will ask the registered DatabaseBuilder to create that object and return it."
I am curious whether there were any discussions about if and how these issues could be fixed in MediaWiki core. Or do we need[1] to figure that out on our own?
[1] I see that I have already jumped to the "how to get there" part, but I guess I should first ask: is there a critical mass of people who think there is an issue, and that the issue should be fixed? I think there is definitely a need for core to adopt better coding practices to reduce code complexity and allow easier unit testing. Just like the move from svn to git and gerrit, it will take a lot of effort and may annoy people, but in the end we will be better off than before.
-Niklas
-- Niklas Laxström
Hi,
On Sat, May 18, 2013 at 11:11:19AM +0300, Niklas Laxström wrote:
> They proposed a solution (work around in Wikibase) to allow writing better code in Wikibase. [...] But as a reader I did not understand how it works
I was also surprised to realize that although the report constantly talks about a lack of dependency injection {,mechanism,facility} (in their meaning), their solution is not dependency injection (in the common meaning), but a service locator (in the common meaning) :-)
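For readers unfamiliar with the distinction, here is a minimal side-by-side sketch of the two patterns. All names are invented for illustration, and I am using Python for compactness:

```python
# Service locator: the class reaches OUT to a shared registry
# to fetch its collaborators, so the dependency is hidden inside.
class Locator:
    _services = {}

    @classmethod
    def register(cls, key, service):
        cls._services[key] = service

    @classmethod
    def get(cls, key):
        return cls._services[key]


class MailerViaLocator:
    def send(self, msg):
        transport = Locator.get("transport")  # hidden dependency
        return transport.deliver(msg)


# Dependency injection: the collaborator is handed IN from outside,
# so the dependency is visible in the constructor signature.
class MailerViaInjection:
    def __init__(self, transport):
        self.transport = transport

    def send(self, msg):
        return self.transport.deliver(msg)


class StubTransport:
    def deliver(self, msg):
        return "delivered: " + msg


# Both can be tested, but only the locator version needs global setup:
Locator.register("transport", StubTransport())
assert MailerViaLocator().send("a") == "delivered: a"
assert MailerViaInjection(StubTransport()).send("b") == "delivered: b"
```

With injection you can tell from the constructor alone what MailerViaInjection depends on; with the locator you have to read the method bodies, and every test must prepare the global registry first.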
Seeing a separate service locator in a friendly extension comes as an even greater surprise to me.
Although it's already close to 10 years old, Martin Fowler's article on dependency injection [1] is, to me, still a great primer on "how it works". The article provides nice short code snippets (Java, but they should be easily readable by any PHP developer) and diagrams showing how things get created in different scenarios.
> or what would be the practical gains for the code complexity that the solution itself introduces.
I am probably biased, but to me, implementing only the service locator solution they propose comes with little gain.
Service locators are "only" a tool for decoupling direct dependencies between objects. That's already good, but service locators do not help with the composition of objects, which is most of the work. Nor do they let you read an object's dependencies off its interface (as, for example, constructor injection would).
Additionally, MediaWiki already uses service locators in places (getting a database connection, for example), so there is nothing to gain in those parts.
The additional indirection of a service locator slows code down a bit, but it is a first step towards easier testing.
The real paradigm shift would be to go for dependency injection (in the common meaning) and start using an inversion of control container.
* What would that cost us? Some performance, due to the decoupled dependencies.
* What would that buy us? Testable code.
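As a toy illustration of what an inversion of control container does (hypothetical names, Python for brevity; real containers such as the one gerrit uses are far more capable): you declare how each object is built, and the container composes the object graph for you.

```python
class Container:
    """Minimal IoC container sketch: maps a key to a factory. Each
    factory receives the container itself, so it can ask for its own
    dependencies and the container performs the composition."""

    def __init__(self):
        self._factories = {}

    def bind(self, key, factory):
        self._factories[key] = factory

    def get(self, key):
        return self._factories[key](self)


class Database:
    def query(self, sql):
        return ["row1", "row2"]


class ProjectRenamer:
    def __init__(self, db):
        self.db = db

    def rename(self, old, new):
        # Stand-in for real work: just count the rows touched.
        return len(self.db.query("..."))


c = Container()
c.bind("db", lambda c: Database())
c.bind("renamer", lambda c: ProjectRenamer(c.get("db")))

# The container wires ProjectRenamer to its Database automatically;
# a test would simply bind "db" to a stub instead.
assert c.get("renamer").rename("old", "new") == 2
```

The point is that the composition logic lives in one place (the bindings), so splitting a big class into smaller ones does not scatter "new" calls all over the code, and each small class stays trivially testable.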
Example? Example. When implementing a unit test for a plain project renaming in gerrit, I started stubbing the required objects, and the test reached ~1000 lines of code (a single test! :-( ), although I was only 3/4 of the way through setting up the required stubs.
This insane amount of code for a single test did not come as a surprise. After all, the function under test had to move the project around in the database, move it in the file system, add git commits to child projects, ... and to test in isolation, you'll just have to stub all that out.
And every further test of a different path through the same method would mean adding ~1300 lines of code to the test case.
So, since gerrit already embraces dependency injection, IoC, and coding by contract, I could break the project renaming code into smaller objects and let the IoC container do the composition on its own. That makes testing each of the parts in full isolation easier, and variations in code paths can be tested right where they occur. It is no longer necessary to test the whole code as one blob.
For this concrete example, testing only the top-level method (assuming sub-tasks meet the contract) requires 16 tests. For the unrefactored code, that'd be ~20k unmaintainable lines for the tests alone. After the refactoring for dependency injection and IoC, the whole file of the test case including header, imports, ... (and the tests) is ~0.5k lines.
Of course, we can always refactor code even without dependency injection or IoC containers. Everyone is doing that :-) "Dependency Injection" is just the hip name for a series of patterns that allow you to decouple objects. And IoC containers just make it easy to stitch them back together at run-time.
Best regards, Christian
[1] "Inversion of Control Containers and the Dependency Injection pattern". Available at http://www.martinfowler.com/articles/injection.html
wikitech-l@lists.wikimedia.org