Thursday, July 28, 2005

Unit Testing Thoughts

In no particular order and with absolutely no structure, some thoughts on unit testing:

  • In order to unit test properly, your design must properly separate concerns. This means that true business logic should be decoupled from specific APIs, conventions, and protocols... and these from each other. For example, you shouldn't have SQL statements, JDBC calls, J2EE API objects, etc. mixed into business logic.
  • There is a spectrum from "pure" unit testing to complete integration testing, something like:
  1. (Pure) Unit Tests: These have no dependencies and no side effects; they test constructors, accessors, comparisons, and algorithms. Generally this involves a single client (the test class) and a single class under test. Developed by developers, for developers.
  2. Partial Integration / Integration Boundary Tests? These test integration between layers, components, or technologies. Where possible, each test should focus on a single boundary, e.g. data access code <-> database, SOAP client code <-> SOAP service, etc. Code that depends on other APIs should use mock objects or some sort of debug/dummy mode. For the debug-mode strategy, the setting could be "injected" by the runtime environment via a setter or a special constructor. These tests are also by developers, for developers.
  3. Full Integration / System Tests? These test a path through the system, verifying some subset of the requirements. Developers can employ these tests using tools like HttpUnit. They can also be maintained by testers or properly trained business experts, using functional test automation tools.
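To make the first point concrete, here is a minimal sketch of the kind of decoupling that makes pure unit tests possible. The names (OrderDao, OrderService) are hypothetical, not from any real project; the point is that business logic talks to an interface, so the unit test can inject a mock with no database, no JDBC, and no side effects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical DAO interface: all SQL/JDBC lives behind it.
interface OrderDao {
    List<Double> findOrderTotals(String customerId);
}

// True business logic, free of SQL, JDBC, and J2EE API objects.
class OrderService {
    private OrderDao orderDao;

    // Setter injection: the runtime wires in a JDBC-backed DAO;
    // the unit test injects a mock instead.
    public void setOrderDao(OrderDao orderDao) {
        this.orderDao = orderDao;
    }

    public double totalSpend(String customerId) {
        double sum = 0.0;
        for (Double total : orderDao.findOrderTotals(customerId)) {
            sum += total.doubleValue();
        }
        return sum;
    }
}

public class OrderServiceTest {
    public static void main(String[] args) {
        OrderService service = new OrderService();
        // Mock DAO: canned data, no dependencies, no side effects.
        service.setOrderDao(new OrderDao() {
            public List<Double> findOrderTotals(String customerId) {
                return Arrays.asList(10.0, 32.5);
            }
        });
        System.out.println("total=" + service.totalSpend("C42")); // prints total=42.5
    }
}
```

This is a pure unit test in the sense above: a single test client, a single class under test, and everything else mocked away.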

Many automated tests, including some unit tests, have side effects on data that can break other tests. For example, a test may rely on a precondition state of the database in order to succeed, and manual testing can be maddening if the data is changing underneath you. Either every test must mutate data completely independently of any other test or user, or the database must be restored to a consistent state before testing (which can still disrupt other testers). Some strategies for managing this situation include:

  • Complete database restore, in an isolated database or schema, e.g. via SQL DDL scripts, dbUnit scripts, or programmatically. If the developer's database is a different technology or version from production, this introduces the challenge of maintaining a restore capability for two technologies, as well as potential bugs caused by differences in the production database. To catch problems caused by a different technology in production, the team could use a shared database instance specifically for periodic execution of all unit tests. You could also have multiple levels of restore, e.g. a bare-bones one for unit testing, one that adds reference data, and one that loads example data for manual testing. I like the idea of using Hibernate (or something similar) to abstract the database technology, using HSQL (or similar) locally, and providing utility methods to add the required data.
  • Transactions / rollback? This implies that data access code must have mechanisms for using injected transactions, managed by the test code, and/or skipping commits in test mode. Add setAutoCommit, commit, rollback to DAO API?
  • Have each test clean up its own data. This requires that 1) all data created can be identified or tracked by the test, and 2) the database grants the permissions needed to delete that data.
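The transaction/rollback strategy can be sketched without a real database. SimpleStore below is a hypothetical in-memory stand-in for a connection with begin/commit/rollback semantics (what setAutoCommit(false) plus rollback() would give you over JDBC): the test opens a transaction up front, mutates data freely, and always rolls back in a finally block (the tearDown slot in a JUnit test), leaving the shared state untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "database" used only to illustrate the pattern.
class SimpleStore {
    private final Map<String, String> committed = new HashMap<String, String>();
    private Map<String, String> working = null;

    public void begin()    { working = new HashMap<String, String>(committed); }
    public void put(String key, String value) { working.put(key, value); }
    public String get(String key) {
        return working != null ? working.get(key) : committed.get(key);
    }
    public void commit()   { committed.clear(); committed.putAll(working); working = null; }
    public void rollback() { working = null; }
    public int committedSize() { return committed.size(); }
}

public class RollbackPatternTest {
    public static void main(String[] args) {
        SimpleStore store = new SimpleStore();
        store.begin();                        // setUp: open the transaction
        try {
            store.put("order-1", "pending");  // the test mutates data...
            System.out.println("in tx: " + store.get("order-1"));
        } finally {
            store.rollback();                 // tearDown: always roll back
        }
        // No side effects survive the test.
        System.out.println("committed rows after test: " + store.committedSize());
    }
}
```

With a real DAO, this is exactly why the test code would need a way to inject the transaction (or skip commits in test mode), as suggested above.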

To Be Continued...

Tuesday, July 26, 2005

Qualifying the "Simplest Thing" rule

In XP, and in Agile methods generally, decisions are often made by the criterion of "the simplest thing that could possibly work." I think this applies not only to code construction and detailed design decisions, but to architecture as well. Strict use of this rule will almost always result in the most prudent approach, but I do think there are a couple of caveats:
  1. The decision should still be reversible, so that you don't paint yourself into a corner for future changes in the requirements.
  2. Simplicity should be weighed carefully against flexibility and the cost of changing later. If the simplest solution costs 20% less now, but will likely cost 30% more over time, then it may not be the best choice. You have to decide how likely "likely" is.
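The second caveat is really just an expected-value calculation. With made-up numbers in the spirit of the example above (the flexible solution costs 100 units now, the simple one 80, and the simple one incurs roughly 30 units of rework if the anticipated change actually happens with probability p), "deciding how likely 'likely' is" means finding the break-even p:

```java
// All numbers here are hypothetical, chosen only to illustrate the tradeoff.
public class SimplestThingMath {
    public static void main(String[] args) {
        double flexibleNow = 100.0; // flexible solution, paid up front
        double simpleNow   = 80.0;  // simple solution: 20% less now
        double reworkCost  = 30.0;  // extra cost later if the change happens
        for (double p : new double[] { 0.25, 0.5, 0.75 }) {
            double expectedSimple = simpleNow + p * reworkCost;
            System.out.printf("p=%.2f  simple=%.1f  flexible=%.1f  pick=%s%n",
                    p, expectedSimple, flexibleNow,
                    expectedSimple <= flexibleNow ? "simple" : "flexible");
        }
    }
}
```

Under these assumed numbers the simple choice wins until p exceeds about 2/3 (80 + p * 30 = 100 at p ≈ 0.67), which is a concrete way of asking whether the future change is really that probable.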

I would stress that even with these caveats, one should still be very careful about breaking the rule. If you don't have a strong case for a very real risk of irreversibility or future costs, you should just stick with the simplest solution.

Friday, July 08, 2005

JBoss CMP versus Hibernate

UPDATE 07/26/05
I stuck with Hibernate. Very simply put, both offer many of the same benefits--declarative programming, transactions, etc. But EJB introduces a whole layer of overhead related to remote access. Unless you are building a true N-tier, highly distributed application, I don't think that overhead is warranted.

I am starting to look at persistence technologies for an application I am working on in JBoss 4.0.2. I am currently comparing the (built-in) Hibernate in JBoss versus CMP. So far I am just kicking the tires and gathering info. If you come across this, please feel free to comment.

Harshad Oak has a posting on the topic with comments.

Right now, I can't see any reason not to use Hibernate: I can use POJOs but still get transactions. I don't see the benefit of creating EJBs for every entity we need to persist.