Ben Northrop

  Decisions and software development
Essays   |   Popular   |   Cruft4J   |   RSS    

7 Strategies for Unit Testing DAOs and other Database Code

(11 comments)
A light-weight process for inspecting code
  March 21st 2008

I don't care what all the tutorials say, unit testing in the real world is tough. Well, let me rephrase that. Creating a unit test is easy. Creating a repeatably-passing unit test is not.

The problem is, just about all enterprise applications are dependent on an external database, and while a database itself isn't anathema to unit testing, the data inside it is. The reason is that, unavoidably, unit tests of an enterprise application must refer to specific data in order to test functionality, and this specific data is susceptible to change. What typically ends up happening is that earnest developers create unit tests that work at time t, and then at t+1 the underlying data has changed and subsequently a seemingly random assortment of unit tests fail - the ones that depended on the exact data from time t. Ouch. The inevitable fall-out is that developers must then either (a) investigate the failures and fix the unit tests, (b) delete the failing tests, or (c) live with something less than 100% passing unit tests. Not a good choice in the lot.

In my experiences and research, I've found about 7 different strategies for handling data in unit tests, and each one has different costs and constraints associated.

1) Mocks and Stubs

2) Development Database

3) Unit Test Database

4) Local Database

5) In-memory Database

6) Data-independent Tests

7) Non-durable Tests

And, yes, all but solution 1 would technically be considered integration tests, not unit tests. I will try to name them appropriately in this article, but if I slip up, you know what I mean.

Mock and Stubs

Most TDD purists would probably argue for the importance of completely isolating the *unit* to be tested (or "system under test"), and to achieve this, any class or resource that the unit depends on must be mocked, or stubbed, or otherwise faked. For unit testing DAOs, this of course means the database itself. And while mocking or stubbing the database does have its advantages, namely performance and isolation, as mentioned, it comes at a significant cost to complexity and maintainability - writing mock JDBC layers and fake data to be processed by DAOs is not a trivial task.

Even if you did go ahead and create this testing infrastructure, the real meat of the DAO (the SQL, HQL, or mapping) still wouldn't even be exercised by the test, since it's hitting the fake layer and not the real database. In the end, if a unit test of a DAO won't tell you that your SQL/HQL isn't valid, then, really, what good is it?

Essentially, mocks and stubs are a tried and true solution for unit testing in general, but with regards to a database dependence,they don't seem to be a workable alternative.

Development Database

This is the path of least resistence. Rather than spend hours mocking out database layers and creating fake data, why not just hit an actual database, and exercise the real queries, mapping files, etc? And further, since the development database (i.e. the central database the team uses for testing code in-development) already exists and has realistic data, why not just use that? Good plan, right? Well, for creating unit test that work *now*, yes, this is a great, low-cost solution. For creating repeatable unit tests, not-so-much.

The problem, of course, is that development databases are used by everyone, and so data is very volatile (proportional to the number of developers/users and rate of code change). You could attempt to control the volatility by using data population/clean-up scripts (e.g. DbUnit) that are run on the setUp() and tearDown() methods of your unit tests, but this solution too has cracks. First, if multiple developers are running unit tests at the same time, their tests or population scripts could cause conflicts - i.e. my tests delete data that Joe in the next cubicle expects. Second, in larger teams, development environments are often used by other stakeholders (e.g. managers, QA, etc.) for functional tests or otherwise, and so at any given time it's quite possible that data is being manipulated. Essentially, if a test fails, it could be because something is legitimately broken, or it could just mean that some user of the database unknowningly deleted the row your unit test was depending on. You just don't know.

Because of the lack of isolation in a development database, this is not always a viable solution for creating repeatable, data-dependent unit tests.

Unit Test Database

A separate, central database created and managed specifically for the purpose of running unit tests would provide a degree of isolation more for running unit tests than a development database, and so therefore increases the chances that tests would run successfully at future points in time. Again, by using data population/clean-up scripts prior to and after running unit tests, the data conditions that the tests depend on could be assured. Further, by rolling back transactions at the completion of a test and thereby not modifying the data, you can feel confident that your unit tests are not stepping on the toes of other unit tests running at the same time.

There are problems with this solution, however. First, if unit tests have data that is specifically inserted and deleted for *each* test (i.e. in the setUp and tearDown methods of the test) rather than one central set of data for all tests, then multiple unit tests run at the same time could cause conflicts. For instance, if test1 tests that a "findBy" method returns 5 records, but test 2 inserts a new record in its setUp() that gets picked up by the "findBy", then test 1 would fail. The solution, of course, is to use one central data load script for all unit tests, which typically isn't too onerous.

The second problem is that in most enterprise environments, the database isn't owned by the application developers. Creating and maintaining a separate database just for unit testing means convincing (pleading with, cajoling, etc.) the DBAs...and in my experience, unless there's a very strong advocate of unit testing in the management ranks, it just ain't gonna happen.

The bottom line is that this is a good middle-of-the road solution - it has the advantage of being easy to manage (from the application developer's perspective), but still can suffer from data volatility problems which can reduce the repeatability of tests.

Local Database

Maintaining a separate database instance on each developer's local machine is the most optimal solution in terms of ensuring data durability, but the most costly in terms of investment in infrastructure/configuration and the most unrealistic in typical enterprise environments. On the up-side, with a local database, a developer can be extremely confident that the data that his unit test depends on will not change. Further, if one data-population script is checked in to source control and used by all developers, then developers can be sure that data on one developer's machine matches data on another - and one step further, a unit test that runs on one machine should run on another. Most excellent.

What's necessary from the application developers perspective are a few things. First, of course, each developer needs a instance of the database on their machine - which of course entails that each developer take some time to set up and adminster the database, but also could present some problems with licenses depending on your DBMS. Second, a DDL script that populates the structure of the database and DbUnit or SQL scripts that populate the base data. And third, these scripts need to be plugged in to some Ant targets that execute the creation, population, and removal of the database. These are all very achievable tasks.

Unfortunately, if it would be tough to convince a DBA of the "Unit Test Database" option, it'll be down-right impossible to convince him of generating and maintaining a DDL for creating the structure of your local database for the purpose of unit testing. Again, in my experience, DBAs are fairly protective of "their turf", and won't jump at the chance to change their processes and approaches to suit your "touchy-feely", agile practices.

Additionally, having each developer maintain their own instance of the database can provoke problems of data synchronicity and general maintenance chaos. For example, when your unit CustomerDAOTest runs but mine does not, I'm forced to wonder, "do you have different data than me?", "did I configure my database correctly?", "do I need to check out the latest DDL or DbUnit scripts and re-build my database"?

All things being equal, if the DBAs are receptive and developers are competent enough to handle the additional complexity, this is the most optimal approach, in my opinion.

In-Memory Database

If you have the fortune of working with an OR mapping framework, using an in-memory database is a very attractive option. Unfortunately, in many enterprise environments, SQL, stored procedures, triggers, and the like aren't going anywhere.

Data Independent Tests

If you find yourself in an environment where none of the approaches above work, but you still dream of a day with repeatable unit tests, then a very sub-optimal solution is to connect to the development database but merely lower the bar for your unit tests. For instance, instead of asserting that a "findBy" method returns exactly 5 records, just assert that the query does not bomb. Though this does very little to verify whether the actual functionality of the DAO or class still works, it at least informs that the mapping is correct - i.e. the database hasn't changed underneath your feet. In some environments, this alone provides enough value to write unit tests.

Again, although this severely limits the coverage and power of the test, it still allows for the ability for unit tests to be assembled into suites, run in a nightly build, and pass with a 100% success rate independent of whether the data has changed. That's something, right?

Non-Durable Tests

In some camps, I imagine it'd be blasphemous to suggest writing unit tests that are non-durable (i.e. may break down-the-road if data changes), but it is an option. Unit tests serve many purposes, one of which is to give confidence that a change (refactoring or enhancement) made to the system has not broken old functionality. To realize this benefit, indeed, it's probably necessary to have an assemblage of unit tests that run with 100% success - run the tests, make the change, run the tests again, you're done.

There are, however, other, less-widely touted benefits of unit tests, and these benefits can be achieved even without a 100% success rate. Firstly, unit tests allow developers to test isolated pieces of a system without having to go through the interface. Without a unit test, a developer who wants to test a query in a DAO literally must write the query, the DAO, and then every piece on top of it all the way up to the interface, compile and deploy the application, navigate to the part of the system that executes the query, and finally run the function. Whew. A unit test, checked into souce control, even if it doesn't pass, at least gives a maintenance developer a driver of that piece of code so he can execute it from his IDE. This alone is helpful.

Secondly, unit tests, regardless of whether they pass, can also serve as semi-decent documentation of code - describing what parameters to pass in, how to set up pre-conditions, etc.

This is an extremely unsatisfying solution, but it could be argued that there is some value in it.

Conclusion

From my experiences and research, these seem like the prevalent approaches for unit testing database dependent code. I'm quite sure I'm missing options, or missing specific costs, benefits, or nuances of the options I have described. Let me know what you think!




I believe that software development is fundamentally about making decisions, and so this is what I write about (mostly). I'm a Distinguished Technical Consultant for Summa and have two degrees from Carnegie Mellon University, most recently one in philosophy (thesis here). I live in Pittsburgh, PA with my wife, 3 energetic boys, and dog. Subscribe here or write me at ben at summa-tech dot com.



Got a Comment?

Name:
Website:
Are you human:
Comment:

Comments (11)



Scott Bale March 24, 2008
Nice breakdown. Imho, I favor method 1 (mocks) for just about everything except DAO's, and something like DBUnit setup()/tearDown() for DAO's. Like you said, you really need to test your DAO's with a real db. But, like you also said, db data is volatile so the option I like best is db setup/teardown. If you're careful, you can design things such that a test can run within a transaction which you rollback at teardown, and never commit. I agree it's not foolproof. I would prefer that a test setup absolutely all data it needs to test with, but this may not be feasible for tests spanning a large number of tables (then again that may be a sign of a DAO with too much responsibility/complexity).
Generally I favor making it as easy as possible for the next guy to run the tests, so I'm not a big fan of each developer having to install a local database. It's hard enough getting your teammates to run tests even if they can be run at the click of a button.

Marco March 24, 2008
I always like your posts. Thank you!
I would like to expose some comments about this topic.

I think "Development Database" is the best and affordable solution.
I don't agree with you about the problems of this solution.
I thinks that is wrong if the test shoud success only if the "Custumer X" pre-exists. Who did insert that customer?
I think that the unit should create all the data necessary to run to success the test in the SetUp() and should be indipendent from pre-existing data.

Hamlet D'Arcy March 24, 2008
The book xUnit Patterns has a pretty good section on Database testing strategies. The "Transaction Rollback Teardown" approach mentioned by Scott Bale is partially documented here: http://xunitpatterns.com/Transaction%20Rollback%20Teardown.html

All the patterns are here: http://xunitpatterns.com/Database%20Patterns.html

These links are to the working copy of the book before final edits, so the book differs slightly.

codecraig March 24, 2008
Interesting blog.

Is your main homepage busted? Doesn't load "pretty" for me.

Rob March 24, 2008
Personally I think Mocking (especially when it comes to DAO's) is a big waste of time (and it does take a long time to set up) which does little more than give developers a false sense of security.

Having discounted Mocking, I think a script that can be run in a local developers DB or an Integration DB to clear and then setup the DB for running REAL tests is the way to go

Ben Northrop March 25, 2008
Thanks for the comments!

Scott: I agree it's really tough to convince other developers to run 1-second unit tests, much less installing/maintaining a local db and running unit tests against it. I have seen it work though, with a group of really motivated developers...and it was great. Definitely an investment...but having thousands of unit tests passing at 100% was worth it. Made maintenance, extension so much easier.

Marco: I probably was a bit harsh on the Development Database option, because it is the best solution in most contexts. It's also the solution I've been burned by the most - trying to modify a piece of code, find the unit test, run it, and find it breaks because some data wasn't there. Ugh. The "Customer X" scenario I mentioned was a problem of synchronicity - developer one's setUp() inserts some data, then another developer at that *same* time (through the app interface, or another unit test, etc.) manipulates the data that was just inserted, and then developer one's unit test fails because the data was messed up. In small environments, not a big deal since the probability of such collisions are low...but in a team of 40+ developers with managers and QA-ers in the mix as well, it's more likely that this data is in flux at any given moment.

Hamlet: Thanks for the link. I haven't read xUnit yet...though seems good. Nice post on Mockito, btw.

Craig: Thanks. I've seriously neglected my home page...have no idea what to put on it. Just cleaned it up a bit though.

Rob: Agreed that mocking/stubbing for DAOs is a waste of time - always surprised when people still propose this. The cost/benefit just isn't there, imho.

Spike May 06, 2008
Nice job describing the pain of writing useful unit tests.

The problem I've found, beyond everything you describe, is testing what happens to the system when you have 5,000 users doing something and things happens that you never see from a unit test.

Not to say unit tests don't have their place. Whatever you choose is better than nothing. Unless it wastes your time for no gain. But besides that, testing without load testing seems to fall short.

Thanks for taking the time to share your perspective and experience.

Arman July 10, 2008
Very nice coverage. I like "Unit Test Database" and "Development Database" options. I have seen another approach that worked just fine for database related unit testing. It takes a little longer but works. Essentially you create your Database in the beginning of your tests in a new transaction, run all of the table, view, stored procedure etc. creation scripts from your code, insert the necessary records to test against (or do it on a single test level), run the tests, roll back the transaction. I agree with the author, there's not perfect solution to this problem. Every approach has one or more annoying side effect.

Arman July 10, 2008
Very nice coverage. I like "Unit Test Database" and "Development Database" options. I have seen another approach that worked just fine for database related unit testing. It takes a little longer but works. Essentially you create your Database in the beginning of your tests in a new transaction, run all of the table, view, stored procedure etc. creation scripts from your code, insert the necessary records to test against (or do it on a single test level), run the tests, roll back the transaction. I agree with the author, there's not perfect solution to this problem. Every approach has one or more annoying side effect.

Ashok November 23, 2010
Very nice views on the trade-offs of each.

Thanks for

vlad September 05, 2012
Warning: mysql_query() [function.mysql-query]: Access denied for user 'bennor2'@'localhost' (using password: NO) in /home/bennor2/public_html/includes/bottom.php on line 106

Warning: mysql_query() [function.mysql-query]: A link to the server could not be established in /home/bennor2/public_html/includes/bottom.php on line 106

Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/bennor2/public_html/includes/bottom.php on line 107
Comments ()