Hi again.
For the Wikibase client, we need unit tests for components that access another wiki's database - e.g. a Wikipedia would need to access Wikidata's DB to find out which data item is associated with which Wikipedia page.
The LoadBalancer class has some support for this, and I have integrated it into DBAccessBase and ORMTable. This makes it easy enough to write classes that access another wiki's database. But. How do I test these?
I would need to set up a second database (or at least, a Database object that uses a different table prefix from the one used for the "normal" temporary testing tables). The schema of that other database may differ from the local wiki's: it may contain some tables that don't exist locally, e.g. wb_changes on the Wikibase repository. And, in case we are not using transient temp tables, this extra database schema needs to be removed again once the tests are done.
Creating a set of tables using a different table prefix from the normal one (which, under test, is "unittest_") seems doable. But this has to behave like a foreign wiki with respect to the load balancer: if my extra DB schema is called "repowiki", emulating a database (and a wiki) called repowiki, but really just using the table prefix "unittest_repowiki_" - how do I make sure I get the appropriate LoadBalancer for wfGetLB( "repowiki" ), and the correct connection from $lb->getConnection( DB_MASTER, array(), "repowiki" )?
It seems like the solution is to implement an LBFactory and LoadBalancer class to take care of this. But I'm unsure about the details. Also... how does the new LB interact with the existing LB? Would it just replace it, or wrap & delegate? Or what?
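To make the wrap & delegate variant concrete, here's roughly what I have in mind -- an untested sketch, with the class name and config keys made up, and the LBFactory method signatures from memory:

    // Delegates to the real factory, except for wikis that have a
    // dedicated test LoadBalancer registered.
    class LBFactory_ForeignTest extends LBFactory {
        protected $realFactory; // the LBFactory we delegate to
        protected $testLBs;     // map: wiki ID => LoadBalancer using "unittest_<wiki>_" tables

        function __construct( $conf ) {
            $this->realFactory = $conf['realFactory'];
            $this->testLBs = $conf['testLBs'];
        }

        function getMainLB( $wiki = false ) {
            if ( $wiki !== false && isset( $this->testLBs[$wiki] ) ) {
                return $this->testLBs[$wiki]; // e.g. 'repowiki'
            }
            return $this->realFactory->getMainLB( $wiki );
        }

        function newMainLB( $wiki = false ) {
            return $this->getMainLB( $wiki );
        }

        function getExternalLB( $cluster, $wiki = false ) {
            return $this->realFactory->getExternalLB( $cluster, $wiki );
        }

        function newExternalLB( $cluster, $wiki = false ) {
            return $this->realFactory->newExternalLB( $cluster, $wiki );
        }

        function forEachLB( $callback, $params = array() ) {
            $this->realFactory->forEachLB( $callback, $params );
        }
    }

Something like LBFactory::setInstance() (or $wgLBFactoryConf) would then have to activate it for the test run.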
Any ideas how to best do this?
-- daniel
Hi Daniel,
sorry for chiming in late. But better a late response than none ;-)
On Tue, Dec 04, 2012 at 10:54:08AM +0100, Daniel Kinzler wrote:
[ LoadBalancer, DBAccessBase, and ORMTable integration help when developing components that access another wiki's database. ] But. How do I test these?
In the rest of your email, you exposed some of the structural problems we face when trying to set up the required databases, ... at run^Wtesttime.
However, we actually do not need those databases and tables for testing. For testing, it would be sufficient to have mock database objects [1] that pretend that there are underlying databases, tables, etc. Those mocks would return the very values that our database objects would return if those databases, tables, etc. existed.
Thereby, we could skip the hassles of setting up a second database or additional tables etc. Yippie! We'd just have to initialize the mock.
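To give a rough idea, here is a minimal, untested sketch using PHPUnit's mock facilities (the table row is made up; DatabaseMysql and FakeResultWrapper are the existing core classes):

    // Build a Database mock -- no server, no tables, no connection.
    $db = $this->getMockBuilder( 'DatabaseMysql' )
        ->disableOriginalConstructor()
        ->getMock();

    // When the code under test calls select(), play back a canned row.
    $db->expects( $this->any() )
        ->method( 'select' )
        ->will( $this->returnValue( new FakeResultWrapper( array(
            array( 'epp_entity_id' => 123 ),
        ) ) ) );

    // Now inject $db into the component under test and assert on its output.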
Best regards, Christian
[1] I am well aware of sounding like a broken record, as I already suggested mocking the database some time ago when discussing performance issues for tests: https://lists.wikimedia.org/pipermail/wikitech-l/2012-June/061469.html This reference is just to show that going down the database mocking road would solve both problems at once.
Hi Christian
On 08.12.2012 22:16, Christian Aistleitner wrote:
However, we actually do not need those databases and tables for testing. For testing, it would be sufficient to have mock database objects [1] that pretend that there are underlying databases, tables, etc.
Hm... so, if that mock DB object is handed to code that then tries to execute an SQL query against it, it will work? Sounds like that should at least be an in-memory sqlite instance... The trouble is, we do *not* have an abstraction layer on top of SQL. We just have one for different SQL implementations. To abstract from SQL, we'd need a full DAO layer. We don't have that.
Anyway: even if that works, one reason not to do it is that we'd lose the ability to test against different database engines. The Postgres people are quite keen on that. But I suppose that could remain as an optional feature.
Also: once I have a mock object, how do I inject it into the load balancer/LBFactory, so wfGetLB( 'foo' )->getConnection( DB_SLAVE, null, 'foo' ) will return the correct mock object (the one for wiki 'foo')? Global state is evil...
If you could help me answer that last question, that would already help me a lot...
thanks daniel
Do you really need SQL access to Wikidata? I would expect your code to go through a WikidataClient class, which could then connect to Wikidata via SQL, HTTP, loading from a local file...
On 09.12.2012 00:50, Platonides wrote:
Do you really need SQL access to Wikidata? I would expect your code to go through a WikidataClient class, which could then connect to Wikidata via SQL, HTTP, loading from a local file...
Sure, but then I can't test the code that does the direct cross-wiki database access :)
-- daniel
Hi Daniel,
On Sat, Dec 08, 2012 at 10:29:23PM +0100, Daniel Kinzler wrote:
On 08.12.2012 22:16, Christian Aistleitner wrote:
[ database mocking ]
Hm... so, if that mock DB object is handed to code that then tries to execute an SQL query against it, it will work?
Yes.
Sounds like that should at least be an in-memory sqlite instance...
Sorry, my fault. It shouldn't have sounded like that :-) We could base a mock on an in-memory SQLite instance. But that would somehow defeat the purpose of using plain, simple mocks. Basically, you want to keep mocks as dumbed down as you can. That way, the mocks are easy to maintain, easy to understand, and somewhat less likely to break.
Just think of the mock as a POPO that you feed some expected input (SQL queries) and the expected response to that input (what the database would return). The mock just plays this output back to the component under test, and additionally does some accounting.
There is no real database backing the mock.
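As a sketch of that idea (FakeDatabase is hypothetical, not a class from core):

    // Canned SQL in, canned rows out -- plus some accounting.
    // No real database anywhere.
    class FakeDatabase {
        protected $cannedResults;      // map: SQL string => array of result rows
        public $queriesSeen = array(); // the accounting part

        function __construct( array $cannedResults ) {
            $this->cannedResults = $cannedResults;
        }

        function query( $sql ) {
            $this->queriesSeen[] = $sql;
            return new FakeResultWrapper( $this->cannedResults[$sql] );
        }
    }

A test feeds it the expected queries and their responses up front, and can afterwards assert that exactly the expected queries were issued.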
Maybe that problem is somewhat buried in what we mean by “unit” testing, and that we do not take advantage of the programming by contract paradigm. Consider a class A that uses a class B, which in turn uses a class C that relies on a D. So something like

  A -> B -> C -> D.

Think of
  A ... some user request
  B ... some database accessing code
  C ... Load Balancer
  D ... MediaWiki Database implementation
Our current approach to “unit” testing B is to create some instance of B and poke it with sticks, to see how it reacts. So when we're “unit” testing B, we effectively test all the classes in the dependency chain from B downwards:
B -> C -> D
But that's what is typically called /integration/ testing B. Plain unit testing B would come down to providing a stub/mock C' for C and then testing B after injecting this C' instead of C. So something along the lines of

  B -> C'.

Here you see that C' does not depend on D -- the MediaWiki Database implementation. Yippie, easier test code :-D
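In code, that could look like the following -- ChangeLookup standing in for B (a made-up class), a mocked LoadBalancer standing in for C' (untested sketch):

    // B: gets its Load Balancer (C) injected instead of calling wfGetLB().
    class ChangeLookup {
        protected $lb;

        function __construct( LoadBalancer $lb ) {
            $this->lb = $lb;
        }

        function countChanges() {
            $dbr = $this->lb->getConnection( DB_SLAVE );
            return $dbr->selectField( 'wb_changes', 'count(*)', '', __METHOD__ );
        }
    }

    // The test hands B a mock C' instead of the real C:
    $lb = $this->getMockBuilder( 'LoadBalancer' )
        ->disableOriginalConstructor()
        ->getMock();
    $lb->expects( $this->any() )
        ->method( 'getConnection' )
        ->will( $this->returnValue( $mockDb ) ); // a Database mock as sketched above

    $lookup = new ChangeLookup( $lb ); // no D (real Database) in sight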
So in the end, when mocking the database, there is no need for an “in-memory sqlite instance”.
The trouble is, we do *not* have an abstraction layer on top of SQL. We just have one for different SQL implementations.
While more abstraction never hurts for testing, the abstraction that we currently have is sufficient.
Anyway: even if that works, one reason not to do it is that we'd lose the ability to test against different database engines. The Postgres people are quite keen on that. But I suppose that could remain as an optional feature.
When talking about A, B, C, D above, the example was just about unit testing B. C (C') was the (mocked) Load Balancer. Of course we also want to unit test all implementations of C (providing mocks for D). We also want to unit test the MySQL Database implementation, just as the PostgreSQL one, the SQLite one, ...
But the programming by contract paradigm (which MediaWiki follows in that part of the code) says that we can rely on the abstraction of the DatabaseType interface / DatabaseBase class.
So let's embrace this change. Test against the contract that the interface provides. Then we know that each of the MySQL, PostgreSQL, ... backends work, as long as we can verify that they stick to the contract specified by the interface.
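In PHPUnit terms, the contract could live in one abstract test case that every backend subclasses -- a sketch with made-up names:

    // The contract is tested once; each backend only supplies an instance.
    abstract class DatabaseContractTestCase extends MediaWikiTestCase {
        /** @return DatabaseBase -- a connected instance of the backend under test */
        abstract protected function newDatabase();

        public function testAddQuotesWrapsStrings() {
            $db = $this->newDatabase();
            $this->assertEquals( "'abc'", $db->addQuotes( 'abc' ) );
        }

        // ... one test per clause of the DatabaseType contract
    }

    class DatabaseMysqlContractTest extends DatabaseContractTestCase {
        protected function newDatabase() {
            // return a MySQL-backed DatabaseBase instance here
        }
    }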
While this decoupling seems weird and artificial at first, it makes our code testable. And the test suite will be more efficient. Let's do a back-of-the-envelope computation of what we'd have to test if we keep following our current integration testing that we label “unit” testing. Say we have 10 (yes, this number is way bigger in real life) parts of the code that rely on the LoadBalancer. And we have 3 Load Balancers. And 6 Database backends.
That'd amount to 10*3*6 = 180 test cases if we want decent code coverage.
If we unit test against the contract provided by the Database interface etc., we get 10 + 3 + 6 = 19 test cases.
With only about 10% of the tests, we achieve the same code coverage.
And for critical Database backends (say MySQL as we're relying on that for production), we can add integration tests where appropriate.
Also: once I have a mock object, how do I inject it into the load balancer/LBFactory, so wfGetLB( 'foo' )->getConnection( DB_SLAVE, null, 'foo' ) will return the correct mock object (the one for wiki 'foo')?
For that part of the code, I'd go for setter-based injection for now.
Looking at the relevant code, it seems that there is already an LBFactory::setInstance. You can use this setter from within PHPUnit's test(case) setup to inject a mock load balancer (that yields mock database connections).
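Something along these lines -- an untested sketch, where newMockDb() is a hypothetical helper that builds the Database mock, and tearDown() should restore the original factory:

    class ForeignDbTest extends MediaWikiTestCase {
        protected function setUp() {
            parent::setUp();

            // a LoadBalancer mock that hands out a Database mock
            $lb = $this->getMockBuilder( 'LoadBalancer' )
                ->disableOriginalConstructor()
                ->getMock();
            $lb->expects( $this->any() )
                ->method( 'getConnection' )
                ->will( $this->returnValue( $this->newMockDb() ) );

            // an LBFactory mock that hands out the LoadBalancer mock
            $lbFactory = $this->getMockBuilder( 'LBFactory' )
                ->disableOriginalConstructor()
                ->getMock();
            $lbFactory->expects( $this->any() )
                ->method( 'getMainLB' )
                ->will( $this->returnValue( $lb ) );

            // from here on, wfGetLB( 'foo' ) yields the mock chain
            LBFactory::setInstance( $lbFactory );
        }
    }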
Kind regards, Christian