Design Pattern Evangelist Blog

Smart pointers about software design

Basic Elements of Automated Unit Tests

Elevating automated tests to first-class citizen status


Introduction

I’ve posted two blogs about unit testing: the first about how I became a Unit Test advocate, and the second about the attributes of effective unit tests. Yet I haven’t even hinted at what a unit test is.

I’ll devote this blog to the fundamental elements of unit tests. These fundamental elements tend to apply to all automated tests.

Caveat – I’m going to focus upon the fundamental elements of unit/automated tests. I’m not going to provide specific unit test examples. Unit test examples for different languages abound on the internet.

We’ve Always Been Testing Our Code

We’ve always tested our code, even when the code has been proven correct.

Beware of bugs in the above code; I have only proved it correct, not tried it. — Donald Knuth

I referenced my testing experiences from my college years in Formal Proofs:

Our testing consisted of running our programs on the assignment data until they produced the desired results. Once we saw them work, we’d grab our results, pack up our gear, head home and hope for a few hours of sleep before the next day of classes began.

In most courses, we never had to look at the code again. Those programs had no future. They were one-off assignments. We’d get a fresh programming assignment and start the process all over again.

While most of us got better at testing in our careers, it was probably a refinement of the techniques we picked up in college. We’d inject ad-hoc tests more frequently as we wrote the code, but testing often wasn’t much fun. The tests might be difficult to set up. We might only be able to run them with the entire product, possibly in the lab:

Often when we needed to test new behavior, we’d compile the entire product on our desktop machines, copy the executable to a thumb drive, walk it into our QA lab, copy the executable from the thumb drive to a server and then try to observe the new or updated functionality through the user interface.

Our tests were probably not documented. Any insights residing in those tests probably existed only in our heads at the time and then started to fade soon afterwards. The tests weren’t consistently executed upon subsequent updates, and when they were, probably only by us, if we could remember them. The tests tended to be manual and subject to human error. And as the number of tests grew, executing them, assuming we even remembered them, didn’t scale.

One-off testing is adequate for one-off programs, such as college assignments. But software engineering involves programs that will be in production for months and years, possibly decades. Production code doesn’t just have to work now. It must continue working in the future. One-off testing is inadequate.

I advocate elevating testing to first-class citizen status as part of the software development process. The first step in this journey toward full citizenship will be to automate tests so that they can be executed frequently and consistently to ensure that our programs remain in working order. I will mostly focus upon unit testing here, but I believe that the principles apply to other layers of testing as well.

It’s All Code

A unit test is code that executes within the scope of the Software Under Test (SUT). It confirms that the SUT’s behaviors match what’s expected.

No special tooling is technically required. I’ve been working through the Advent of Code backlog between blog posts, and all my tests are homespun. They are methods that execute and confirm the behavior of the code I’m writing to solve each day’s coding challenge. The tests run automatically when I compile.

The advantage of using homespun techniques is that you’ll understand every piece of your testing environment. There are no hidden mysteries of a separate test harness package.

Homespun testing is fine for learning with small personal projects, such as Advent of Code, but it won’t scale to larger projects. Homespun techniques don’t tend to have many bells and whistles. The tests in my Advent of Code homespun environment are executed in order until one fails; none of the subsequent tests are executed after the first failure.
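To make "homespun" concrete, here’s a minimal Java sketch. It is illustrative only: the `digitSum` puzzle solution and the `assertEquals` helper are hypothetical (the helper is hand-rolled, not JUnit’s). Because the helper throws on the first mismatch, every test after a failing one is skipped, which is exactly the limitation described above.

```java
// Homespun testing: no framework, just a helper that throws on the first mismatch.
class HomespunTests {

    // Hypothetical puzzle solution under test: sums the decimal digits of a non-negative n.
    static int digitSum(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        int sum = 0;
        for (int rest = n; rest > 0; rest /= 10) {
            sum += rest % 10;
        }
        return sum;
    }

    // Hand-rolled assertion; throwing here means no later test gets a chance to run.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Tests execute in order; the first failure aborts the rest.
    static void runAll() {
        assertEquals(0, digitSum(0));
        assertEquals(6, digitSum(123));
        assertEquals(9, digitSum(90000));
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("All homespun tests passed");
    }
}
```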

Here are links to some of my homespun testing, which I featured in previous blog posts. Search for assertEquals in the posts:

Test Frameworks

Tests for more serious projects and production code should leverage the power and support of test frameworks. There are many Unit Test Frameworks to choose from. Most major programming languages have a test framework rooted in the xUnit family, and each xUnit framework uses notation that matches its programming language’s conventions.

Test frameworks manage the testing environment. We’re still responsible for writing the tests, which are usually methods within a test class. Each test is a method, which is identified as a test via framework annotations, such as @Test. There are often annotations for other test-related elements, such as common set-up and clean-up methods.
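To show roughly how annotation-driven discovery works without pulling in a real framework, here’s a toy runner. Its `@Test` and `@BeforeEach` annotations are homemade stand-ins that merely borrow JUnit 5’s names; they are not JUnit’s API, and `MiniRunner` and `StackTests` are hypothetical. Unlike the homespun approach above, it records a failed test and keeps going.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy test runner (not JUnit): discovers annotated methods via reflection.
class MiniRunner {

    // Homemade marker for test methods; retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Test {}

    // Homemade marker for common set-up, run before every test.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface BeforeEach {}

    // Runs each @Test method on a fresh instance and returns the names of failed tests.
    static List<String> run(Class<?> testClass) {
        List<String> failures = new ArrayList<>();
        for (Method test : testClass.getDeclaredMethods()) {
            if (!test.isAnnotationPresent(Test.class)) continue;
            try {
                Object instance = testClass.getDeclaredConstructor().newInstance();
                for (Method setUp : testClass.getDeclaredMethods()) {
                    if (setUp.isAnnotationPresent(BeforeEach.class)) setUp.invoke(instance);
                }
                test.invoke(instance);
            } catch (InvocationTargetException e) {
                // The test or its set-up threw: record the failure and keep going.
                failures.add(test.getName());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot run " + test.getName(), e);
            }
        }
        return failures;
    }
}

// Example test class: one passing test and one that fails on purpose.
class StackTests {
    private ArrayDeque<Integer> stack;

    @MiniRunner.BeforeEach
    void setUp() { stack = new ArrayDeque<>(); }

    @MiniRunner.Test
    void pushThenPopReturnsSameValue() {
        stack.push(42);
        if (stack.pop() != 42) throw new AssertionError("expected 42");
    }

    @MiniRunner.Test
    void failsOnPurpose() {
        // Set-up gives every test a fresh, empty stack, so this assertion fails.
        if (stack.size() != 1) throw new AssertionError("expected one element");
    }

    public static void main(String[] args) {
        System.out.println("Failed: " + MiniRunner.run(StackTests.class)); // prints Failed: [failsOnPurpose]
    }
}
```

Real frameworks add much more (reporting, ordering, parameterization, IDE and build integration), but the core idea of marking methods and letting the runner find and execute them is the same.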

Test frameworks allow us to run tests with the press of a button in our IDEs. They execute all tests in scope, whether a single test or the entire suite, and indicate which passed and which failed. The same frameworks execute the project’s entire test suite in each CI/CD pipeline build.

Test Elements

Each test focuses upon a specific SUT. The SUT may be limited to an object or set of cohesive objects. Each test confirms one aspect of a method/function in the SUT. Multiple tests are often required to confirm the entire SUT.

Ideally, we’d like to treat the SUT as a black box. The test should not know SUT implementation details. The test should not know the number of objects in the SUT. The test should only know the public method/function, i.e., the API, being invoked and the expected behavior. We should be able to refactor the SUT implementation without causing any unit tests to fail.

Automated tests for the SUT typically consist of three (sometimes four) parts. I will use generic terms here; however, the first three parts are often called Arrange/Act/Assert or Given/When/Then. I’ll present these practices in greater detail in future blogs ([Test-Driven Development](https://jhumelsine.github.io/2024/07/15/tdd.html) and Behavior-Driven Development), with additional context as to how they apply to a test strategy.
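Here’s a minimal sketch of those three parts in a single Java test, assuming a hypothetical `Cart` SUT. The assertion is hand-rolled so the sketch runs without a framework, and the test touches only `Cart`’s public methods, treating it as a black box.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical SUT: a cart that totals item prices.
class Cart {
    private final List<BigDecimal> prices = new ArrayList<>();

    void add(BigDecimal price) { prices.add(price); }

    BigDecimal total() {
        return prices.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}

class CartTest {
    // One test, one behavior: the total is the sum of the added prices.
    static void totalIsSumOfPrices() {
        // Arrange / Given: a cart holding two items
        Cart cart = new Cart();
        cart.add(new BigDecimal("2.50"));
        cart.add(new BigDecimal("1.25"));

        // Act / When: invoke the public API under test (usually one statement)
        BigDecimal total = cart.total();

        // Assert / Then: compare against the expected result
        if (total.compareTo(new BigDecimal("3.75")) != 0) {
            throw new AssertionError("expected 3.75 but was " + total);
        }
    }

    public static void main(String[] args) {
        totalIsSumOfPrices();
        System.out.println("passed");
    }
}
```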

Set Up – I.e., Arrange/Given

We want to test the SUT in isolation from the rest of the codebase without the SUT knowing that it’s been isolated. Isolation focuses and limits the scope of the SUT being tested. It also limits the complexity of the test. When the test fails, debugging tends to be easier, since the failure often resides within the bounds of the isolated SUT.

But code rarely exists in isolation. Most code has dependencies on other elements including other components of the product, databases, file systems, internet, clocks, etc.

To isolate the SUT from its dependencies, we override these dependencies with Test Doubles. A Test Double emulates dependency behavior, and the SUT is none the wiser. Test Doubles tend to be small snippets of code that emulate a specific dependency behavior needed only within the context of each test.
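A minimal sketch of a Test Double, assuming a hypothetical `ExchangeRates` dependency that in production might call a remote service. Because the hypothetical `PriceConverter` SUT depends only on the interface, the test can hand it a one-line lambda stub that returns a fixed rate.

```java
// Hypothetical dependency: in production this might call a remote service.
interface ExchangeRates {
    double rate(String from, String to);
}

// Hypothetical SUT: it only knows the interface, not who implements it.
class PriceConverter {
    private final ExchangeRates rates;

    PriceConverter(ExchangeRates rates) { this.rates = rates; }

    double convert(double amount, String from, String to) {
        return amount * rates.rate(from, to);
    }
}

class PriceConverterTest {
    static void convertsUsingCurrentRate() {
        // Test Double: a one-line stub emulating just the behavior this test needs.
        ExchangeRates stub = (from, to) -> 2.0;
        PriceConverter sut = new PriceConverter(stub);

        double result = sut.convert(10.0, "USD", "EUR");

        if (result != 20.0) throw new AssertionError("expected 20.0 but was " + result);
    }

    public static void main(String[] args) {
        convertsUsingCurrentRate();
        System.out.println("passed");
    }
}
```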

We replace production dependencies with Test Doubles mostly because Test Doubles tend to be easier to configure and execute faster in the test than the production dependencies. We also have complete control over dependency behaviors via Test Doubles. It may be very difficult to force behavior in a production dependency. For example, how challenging would it be to force a production dependency to throw a specific exception, such as OutOfMemoryError, consistently on demand?
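With a Test Double, forcing a failure on demand takes a single line. In this sketch, `Inventory` and `AvailabilityChecker` are hypothetical, and a disk-failure `UncheckedIOException` stands in for the `OutOfMemoryError` example above, since production code rarely should catch `Error`s. The assumed design choice is that the SUT reports "unavailable" rather than crashing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical dependency that reads stock levels from storage.
interface Inventory {
    int stockOf(String sku);
}

// Hypothetical SUT: reports "unavailable" when the inventory can't be read.
class AvailabilityChecker {
    private final Inventory inventory;

    AvailabilityChecker(Inventory inventory) { this.inventory = inventory; }

    boolean isAvailable(String sku) {
        try {
            return inventory.stockOf(sku) > 0;
        } catch (UncheckedIOException e) {
            return false;
        }
    }
}

class AvailabilityCheckerTest {
    static void reportsUnavailableWhenInventoryFails() {
        // Test Double that throws a specific exception, consistently, on demand.
        Inventory failing = sku -> {
            throw new UncheckedIOException(new IOException("simulated disk failure"));
        };

        boolean available = new AvailabilityChecker(failing).isAvailable("SKU-1");

        if (available) throw new AssertionError("expected unavailable on inventory failure");
    }

    public static void main(String[] args) {
        reportsUnavailableWhenInventoryFails();
        System.out.println("passed");
    }
}
```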

A test only needs Test Doubles if the flow of execution through the SUT interacts with the dependency. Different tests may have different Test Doubles emulating different behaviors depending upon the test. For tests that do not reference dependencies, Test Doubles are not needed.
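A minimal sketch of that idea, assuming a hypothetical `OrderValidator` whose quantity check runs before it consults a hypothetical `StockLookup` dependency. Two tests use different stubs to emulate different stock levels; the third exercises a path that never reaches the dependency, so it passes `null` instead of a Test Double (a shortcut that only works because that path never touches it).

```java
// Hypothetical dependency: how many units of a product are in stock.
interface StockLookup {
    int stockOf(String sku);
}

// Hypothetical SUT: only one of its execution paths consults the dependency.
class OrderValidator {
    private final StockLookup stock;

    OrderValidator(StockLookup stock) { this.stock = stock; }

    boolean canOrder(String sku, int quantity) {
        if (quantity <= 0) return false;        // rejected before the dependency is consulted
        return stock.stockOf(sku) >= quantity;  // only this path interacts with the dependency
    }
}

class OrderValidatorTest {
    // Stub emulating plenty of stock.
    static void acceptsWhenStockIsSufficient() {
        OrderValidator sut = new OrderValidator(sku -> 100);
        if (!sut.canOrder("SKU-1", 5)) throw new AssertionError("expected order to be accepted");
    }

    // A different stub emulating a different behavior: nothing in stock.
    static void rejectsWhenOutOfStock() {
        OrderValidator sut = new OrderValidator(sku -> 0);
        if (sut.canOrder("SKU-1", 5)) throw new AssertionError("expected order to be rejected");
    }

    // This path never reaches the dependency, so no Test Double is needed.
    static void rejectsNonPositiveQuantity() {
        OrderValidator sut = new OrderValidator(null);
        if (sut.canOrder("SKU-1", 0)) throw new AssertionError("expected order to be rejected");
    }

    public static void main(String[] args) {
        acceptsWhenStockIsSufficient();
        rejectsWhenOutOfStock();
        rejectsNonPositiveQuantity();
        System.out.println("passed");
    }
}
```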

Test Doubles shed a little light upon the SUT black box. The dependencies they override are SUT design details, which cannot be ignored in the test. The SUT is still a black box, but it’s a box for which the test knows a few details about the exterior of the box. It’s like electronic equipment. We may not know the circuitry in an electronic component, but we do need to know how to plug the stereo components together.

The SUT’s design and implementation usually need to accommodate Test Doubles easily so that a test can override the production dependencies. This means that the SUT should not be tightly coupled to its dependencies. I’ll blog more on this in the future (TBD).
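One common way to loosen that coupling in Java is to pass the dependency in through the constructor. This sketch uses the JDK’s own `java.time.Clock` abstraction: production code would pass something like `Clock.systemDefaultZone()`, while the test passes `Clock.fixed(...)`, which acts as a ready-made stub. `StoreHours` and its 9:00 to 17:00 rule are hypothetical.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical SUT with its clock dependency injected instead of hard-wired.
class StoreHours {
    private final Clock clock;

    // Production passes Clock.systemDefaultZone(); tests pass a fixed clock.
    StoreHours(Clock clock) { this.clock = clock; }

    // A tightly coupled version would call LocalTime.now() and be hard to test.
    boolean isOpen() {
        LocalTime now = LocalTime.now(clock);
        return !now.isBefore(LocalTime.of(9, 0)) && now.isBefore(LocalTime.of(17, 0));
    }
}

class StoreHoursTest {
    // Builds the SUT with a clock frozen at the given UTC instant.
    static StoreHours at(String isoInstant) {
        return new StoreHours(Clock.fixed(Instant.parse(isoInstant), ZoneOffset.UTC));
    }

    static void openAtTenInTheMorning() {
        if (!at("2024-01-01T10:00:00Z").isOpen()) throw new AssertionError("expected open at 10:00");
    }

    static void closedAtFivePm() {
        if (at("2024-01-01T17:00:00Z").isOpen()) throw new AssertionError("expected closed at 17:00");
    }

    public static void main(String[] args) {
        openAtTenInTheMorning();
        closedAtFivePm();
        System.out.println("passed");
    }
}
```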

When tests require a lot of Test Double configuration, especially complex configuration, this may indicate that the SUT is a good candidate for refactoring or redesign.

Different cohesive Test Doubles may cluster in different sets of tests associated with the same SUT. For example, one set of tests may require Test Doubles A, B, and C, whereas another set may require Test Doubles D, E, and F. This means that some execution paths through the SUT require one set of dependencies while other paths require a different set. If these execution paths are in the same class, then this may indicate that the class is taking on too many responsibilities. The class may be a good redesign candidate, split into smaller classes based upon these responsibilities.
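A sketch of what such a split might look like, with every name hypothetical. Billing paths need `PaymentGateway` and `Ledger`, while newsletter paths need only `EmailSender`; after the split, each class’s tests supply only the Test Doubles for its own cluster.

```java
import java.util.List;

// Hypothetical dependencies.
interface PaymentGateway { boolean charge(String customer, long cents); }
interface Ledger { void record(String entry); }
interface EmailSender { void send(String to, String body); }

// Before: two unrelated clusters of dependencies in one class, so every test
// has to wire up all three even though it exercises only one cluster.
class CustomerService {
    private final PaymentGateway gateway;
    private final Ledger ledger;
    private final EmailSender sender;

    CustomerService(PaymentGateway gateway, Ledger ledger, EmailSender sender) {
        this.gateway = gateway;
        this.ledger = ledger;
        this.sender = sender;
    }

    boolean bill(String customer, long cents) {
        boolean charged = gateway.charge(customer, cents);
        if (charged) ledger.record(customer + ":" + cents);
        return charged;
    }

    void sendNewsletter(List<String> subscribers, String body) {
        for (String subscriber : subscribers) sender.send(subscriber, body);
    }
}

// After: one class per responsibility, each with its own cluster of dependencies.
class BillingService {
    private final PaymentGateway gateway;
    private final Ledger ledger;

    BillingService(PaymentGateway gateway, Ledger ledger) {
        this.gateway = gateway;
        this.ledger = ledger;
    }

    boolean bill(String customer, long cents) {
        boolean charged = gateway.charge(customer, cents);
        if (charged) ledger.record(customer + ":" + cents);
        return charged;
    }
}

class NewsletterService {
    private final EmailSender sender;

    NewsletterService(EmailSender sender) { this.sender = sender; }

    void sendNewsletter(List<String> subscribers, String body) {
        for (String subscriber : subscribers) sender.send(subscriber, body);
    }
}
```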

Execution – I.e., Act/When

A test executes the SUT’s method or function. This is often the smallest part of the test, usually one statement, but it’s the most important part. This is where the SUT is invoked and executed. A little or a lot can occur in the SUT before it returns.

Confirmation – I.e., Assert/Then

Without confirmation, we can only know that the SUT doesn’t crash when it is executed. The test confirmation ensures that the SUT is operating as expected. The test must confirm the SUT’s operations without peering into its black box.

There are at least three mechanisms to determine whether the SUT is working as expected without cracking the black box open. A test may use one, two or all three mechanisms:

A test passes when all its assertions and verifications pass; otherwise, the test fails.
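Here’s a minimal sketch of two common confirmation styles in one test: an assertion on the SUT’s returned value, and a verification, via a hand-rolled spy Test Double, that the SUT made the expected call on its dependency. `Registration`, `Mailer`, and `MailerSpy` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency.
interface Mailer {
    void send(String to, String subject);
}

// Hand-rolled spy: records every call so the test can verify interactions afterwards.
class MailerSpy implements Mailer {
    final List<String> recipients = new ArrayList<>();

    public void send(String to, String subject) { recipients.add(to); }
}

// Hypothetical SUT: registers a user, sends a welcome email and returns a new id.
class Registration {
    private final Mailer mailer;
    private int nextId = 1;

    Registration(Mailer mailer) { this.mailer = mailer; }

    int register(String email) {
        mailer.send(email, "Welcome!");
        return nextId++;
    }
}

class RegistrationTest {
    static void registeringSendsWelcomeAndReturnsId() {
        MailerSpy spy = new MailerSpy();
        Registration sut = new Registration(spy);

        int id = sut.register("ada@example.com");

        // Assertion on the returned value
        if (id != 1) throw new AssertionError("expected id 1 but was " + id);
        // Verification of the interaction with the dependency, via the spy
        if (!spy.recipients.equals(List.of("ada@example.com"))) {
            throw new AssertionError("expected one welcome email, got " + spy.recipients);
        }
    }

    public static void main(String[] args) {
        registeringSendsWelcomeAndReturnsId();
        System.out.println("passed");
    }
}
```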

If a test requires many assertions and verifications, then this may be an indication that the SUT is taking on too much responsibility, and it may be a good candidate for refactoring or redesign.

Clean Up

Clean up is the fourth element, and it isn’t featured as often as the other three.

Higher-level automated tests may require clean up. If any shared resources are allocated in the set up, then they should be released in a clean up section.

Unit tests don’t tend to require clean up, since unit test resources tend to be ephemeral. Test Doubles tend to clean up automatically.
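When a higher-level test does allocate a shared resource, a `try`/`finally` block is one simple way to guarantee clean up, even when a confirmation fails. This sketch assumes a hypothetical `CsvExporter` that writes to a real temporary file; `Files.createTempFile`, `Files.write`, `Files.readAllLines` and `Files.deleteIfExists` are standard `java.nio.file` calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical SUT: writes one line per row to a file.
class CsvExporter {
    static void export(Path file, List<String> rows) throws IOException {
        Files.write(file, rows);
    }
}

class CsvExporterTest {
    static void exportWritesOneLinePerRow() throws IOException {
        // Set up: allocate a real resource that outlives the test's objects
        Path file = Files.createTempFile("export-test", ".csv");
        try {
            // Execute
            CsvExporter.export(file, List.of("a,1", "b,2"));

            // Confirm
            List<String> lines = Files.readAllLines(file);
            if (!lines.equals(List.of("a,1", "b,2"))) {
                throw new AssertionError("unexpected file contents: " + lines);
            }
        } finally {
            // Clean up: runs whether the confirmation passed or threw
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        exportWritesOneLinePerRow();
        System.out.println("passed");
    }
}
```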

Bugs May Linger

Passing tests do not guarantee bug-free code.

Testing can show the presence of bugs, but not their absence! — Edsger W. Dijkstra

A test case scenario may be missing. A test may need additional assertions or verifications. I’ll address some of these issues in a subsequent blog post (How Do We Know Our Code Is Correct?).
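A classic illustration of a missing scenario, using a hypothetical leap-year function: a version that only checks divisibility by 4 passes tests for 2024 and 2023, yet it’s wrong for century years such as 1900, which was not a leap year under the Gregorian rules. Only a test for that case exposes the bug.

```java
class LeapYears {
    // Buggy: ignores the century rule (divisible by 100 but not by 400 is NOT a leap year).
    static boolean isLeapYearBuggy(int year) {
        return year % 4 == 0;
    }

    // Gregorian rule.
    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // These tests pass for the buggy version; the missing scenario hides the bug.
        check(isLeapYearBuggy(2024), "2024 is a leap year");
        check(!isLeapYearBuggy(2023), "2023 is not a leap year");

        // The missing test case: 1900 was not a leap year.
        check(!isLeapYear(1900), "1900 is not a leap year");
        System.out.println("isLeapYearBuggy(1900) = " + isLeapYearBuggy(1900)); // prints true
    }
}
```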

Summary

This is the first step toward making tests first-class citizens. These are the foundational elements of automated tests. Subsequent blogs will add more context to these elements.
