Design Pattern Evangelist Blog

Smart pointers about software design

Be On Your Best Behavior

Test-Driven Development - How to create tests; Behavior-Driven Development - What tests to create


Introduction

Previous blogs have addressed the Attributes of Effective Unit Tests; the Basic Elements of Automated Unit Tests, including Test Doubles; and the process for Writing Unit Tests, as well as a couple of stories with Suril and Yuri.

I haven’t spent much time writing about how to determine what to test. This blog will focus upon that task with Behavior-Driven Development (BDD).

How Do We Know Our Code Is Correct?

As mentioned in the Formal Proofs section of the first blog in this testing series, I was taught formal algorithm proving techniques in college. But regardless of how confident we felt in our reasoning, a proof never seemed sufficient. We still needed to see the code run.

Traditional computer science problems, such as data structures and algorithms, are well-defined, unambiguous, limited in scope and unchanging. Their solutions lend themselves to formal proofs.

“Testing can show the presence of bugs, but not their absence!” - Edsger W. Dijkstra

Bob Martin provided context to Dijkstra’s quote in his Clean Architecture book in Chapter 4 – Structured Programming. Dijkstra believed in formal proofs. His quote is a bit of a jab at the testing community.

Dijkstra’s statement is not incorrect. Testing never shows the absence of bugs. But formal proofs rarely work for software engineering problems, which tend to be fuzzy, ambiguous, large in scope and changing.

What’s a software developer to do? Change their mindset. While computer science problems mirror mathematical domains where rigorous proofs confirm correctness, software engineering problems mirror scientific domains, where we gain confidence in our code via the scientific method.

Scientific Method

The Scientific Method doesn’t prove a scientific model. It’s a process where scientific models are subjected to experiments that attempt to invalidate them. Models that fail experiments are discarded or modified. Models that pass experiments gain acceptance as useful models.

Behavior-Driven Development

Software engineering can adopt a similar strategy. Behavior-Driven Development (BDD) mirrors the scientific method. BDD tests are experiments that focus upon behavior.

BDD tests specify behavior for the Software Under Test (SUT). Each time the tests are executed, they challenge the implementation and confirm that the behavior observed in the code is consistent with the behavior specified in the tests. Bob Martin uses Double-Entry Bookkeeping as an analogy: the behavior is recorded twice, once in the tests and once in the code, and the two records must agree.

BDD tests tend to fall into two broad categories:

As more tests pass, we gain more confidence that the implementation aligns with the specified behavior. We can never achieve 100% confidence, but our confidence level may be good enough to release the software to customers and users.

“Perfect is the enemy of good.” - Voltaire

Contract

SUT behavior is declared as a contract via public methods. A contract defines expectations and obligations. These expectations and obligations apply to both the provider of the SUT and the client that interacts with the SUT.

The contract should be independent of the implementation. A contract specifies behavior. It does not specify how the behavior is to be implemented.
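
Here is a minimal sketch of a contract that stays independent of its implementation. It assumes a hypothetical Stack contract, invented for illustration and not taken from this blog series, written as a Python abstract base class. Two made-up implementations satisfy it, and a single behavior check runs unchanged against both.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Hypothetical contract: states what a stack does, not how it does it."""

    @abstractmethod
    def push(self, item):
        """Provider obligation: item becomes the next value returned by pop()."""

    @abstractmethod
    def pop(self):
        """Client expectation: pop() is only called on a non-empty stack."""

    @abstractmethod
    def is_empty(self):
        """Return True when no items remain."""


class ListStack(Stack):
    """One possible implementation, backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class LinkedStack(Stack):
    """Another possible implementation, backed by nested (item, rest) pairs."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        item, self._head = self._head
        return item

    def is_empty(self):
        return self._head is None


def check_last_pushed_is_first_popped(stack):
    """Behavior stated against the contract only; works for any implementation."""
    stack.push("a")
    stack.push("b")
    assert stack.pop() == "b"
    assert stack.pop() == "a"
    assert stack.is_empty()
```

Calling `check_last_pushed_is_first_popped(ListStack())` and `check_last_pushed_is_first_popped(LinkedStack())` exercises the same specification against both implementations, which is the sense in which the contract does not care how the behavior is implemented.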

I will dedicate a blog to contracts in the future (TBD).

Behavior-Driven Development Form

BDD test names summarize the behavior specification. A test name is often declared as a method name, and it tends to be longer than a traditional method name. To the best of my knowledge, there are no standard guidelines for test names. Clarity and consistency are what’s important.

Here are a few examples:

Non-BDD test names might be:
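
To make the naming contrast concrete, here is an illustrative sketch in Python. The ShoppingCart class and every test name below are invented for this example; they are assumptions, not names used elsewhere in this blog series.

```python
class ShoppingCart:
    """Hypothetical SUT, invented only so the test names have something to describe."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        if price < 0:
            raise ValueError("price must not be negative")
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


# BDD-style names: each one reads as a one-line behavior specification.
def test_total_is_zero_when_cart_is_empty():
    assert ShoppingCart().total() == 0


def test_total_is_sum_of_prices_when_cart_has_several_items():
    cart = ShoppingCart()
    cart.add(3)
    cart.add(4)
    assert cart.total() == 7


def test_add_rejects_negative_price():
    cart = ShoppingCart()
    try:
        cart.add(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative price")


# Non-BDD-style names: they name a method or a sequence number, not a behavior.
def test_total():
    assert ShoppingCart().total() == 0


def test_add_2():
    cart = ShoppingCart()
    cart.add(3)
    assert cart.total() == 3
```

When `test_add_2` fails, the name says nothing about which behavior broke. When `test_add_rejects_negative_price` fails, the name alone is a usable bug report.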

I jumped the gun a bit in a previous blog where I described the structure of BDD Test Elements, even if I didn’t mention BDD by name. BDD tests are written in the Given-When-Then structure:

Assert confirms behavior within the SUT itself. It should be implementation independent.

Verify confirms interaction with other software elements. It could be implementation dependent.
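
The sketch below shows one way a Given-When-Then test could look, with both an assert and a verify in the Then portion. The Account class and its audit log collaborator are hypothetical, invented for this example. The Test Double comes from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


class Account:
    """Hypothetical SUT: holds a balance and reports deposits to a collaborator."""

    def __init__(self, audit_log):
        self._audit_log = audit_log
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        self._audit_log.record("deposit", amount)


def test_deposit_increases_balance_and_is_audited():
    # Given: an empty account wired to a Test Double standing in for the audit log
    audit_log = Mock()
    account = Account(audit_log)

    # When: the behavior under test is exercised
    account.deposit(50)

    # Then (assert): state that is observable on the SUT itself
    assert account.balance == 50

    # Then (verify): interaction with another software element
    audit_log.record.assert_called_once_with("deposit", 50)
```

The assert would survive almost any refactoring of `deposit`. The verify is tied to the name and arguments of `record`, which is why verification tends to be the more implementation-dependent of the two.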

Tests Specify Behavior, Not Implementation

BDD tests specify behavior, not the SUT implementation. This can be challenging, since behavior derives from the implementation, and implementation is based upon behavior.

When I was first learning Test-Driven Development (TDD), I realized I had created a test where the Then portion only verified that a Spy Test Double had recorded that three methods in the SUT had been called. At first, I was pleased with the code coverage provided by the test, until I realized that the test verified no behavior. It only verified that the SUT method called three other methods in the class. If I were to refactor the code affecting these methods, then the test would fail even if the behavior had not changed. I reconsidered the behaviors in the SUT, and created a new test that asserted the SUT behaviors rather than verifying the implementation.

Since BDD tests are based upon behavior, we should be able to refactor the implementation without breaking the tests. Refactoring is changing the implementation without changing the behavior. Refactoring is used to make the code cleaner, more modular, more flexible, etc. without breaking existing behaviors.

Tests that verify implementation, such as my three-method Spy verification described above, are brittle. Refactoring the code may break the tests even if no functional behavior has changed. Some developers abandon automated testing when test maintenance requires more effort than implementation maintenance. These developers have probably written brittle, implementation-specific tests, even if unknowingly or unintentionally.
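
A small sketch can show the difference between the two kinds of test. Everything here is hypothetical: the PriceCalculator class, its three helper methods, and the discount and tax rules are assumptions made up for illustration, and the spying is done with `patch.object` from Python's standard `unittest.mock` module.

```python
from unittest.mock import patch


class PriceCalculator:
    """Hypothetical SUT: total() delegates to three internal helper methods."""

    def total(self, prices):
        subtotal = self._subtotal(prices)
        discount = self._discount(subtotal)
        return self._add_tax(subtotal - discount)

    def _subtotal(self, prices):
        return sum(prices)

    def _discount(self, subtotal):
        return 10 if subtotal >= 100 else 0  # assumed rule: 10 off at 100 or more

    def _add_tax(self, amount):
        return amount + amount // 10  # assumed flat 10% tax on integer amounts


class RefactoredPriceCalculator:
    """Same observable behavior after a refactoring that inlined the helpers."""

    def total(self, prices):
        subtotal = sum(prices)
        net = subtotal - (10 if subtotal >= 100 else 0)
        return net + net // 10


def brittle_test(calculator_class):
    """Verifies only that total() calls three helpers: implementation, not behavior."""
    with patch.object(calculator_class, "_subtotal", return_value=0) as subtotal, \
         patch.object(calculator_class, "_discount", return_value=0) as discount, \
         patch.object(calculator_class, "_add_tax", return_value=0) as add_tax:
        calculator_class().total([60, 40])
    assert subtotal.called and discount.called and add_tax.called


def behavior_test(calculator_class):
    """Asserts the result a client can actually observe."""
    # 60 + 40 = 100, minus 10 discount = 90, plus 9 tax = 99
    assert calculator_class().total([60, 40]) == 99
```

Both tests pass against `PriceCalculator`. After the refactoring, `behavior_test(RefactoredPriceCalculator)` still passes, while `brittle_test(RefactoredPriceCalculator)` fails with an `AttributeError` because the helper methods it spies on no longer exist, even though no behavior changed.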

Test Doubles leak implementation details into tests. Tests with more Test Doubles will be more brittle than tests with fewer Test Doubles. A larger number of Test Doubles also correlates with higher complexity in the implementation. If practicing TDD/BDD, then the number of Test Doubles should not grow too large. Developers who practice TDD/BDD and still create complex tests and implementations are going to extraordinary lengths to make their own lives miserable.

When is Testing Enough, Enough?

How do software developers know when they have created enough tests to reach sufficient confidence in their code?

When they can no longer create any failing tests.

This is where domain knowledge and creativity come into play.

Domain Knowledge

At the start of my career in the 1980s, our system engineers would publish documents describing the system. These tended to be long, sometimes meandering, descriptions of the system.

The engineers would schedule a document review with the developers and testers. We’d review the document a paragraph at a time. If a sentence or paragraph sounded functional rather than just providing additional context, then we’d mark it with a yellow highlighter pen.

The documents were often vague, ambiguous or even missing important details. Heaven help us if we let anything slip through, because after document signoff, the engineers disappeared to their next project or next release, and developers and testers were left holding the bag.

Things improved over the years with enumerated requirements and use cases, but a lot was still lost in translation when it came down to describing the behavior of the system.

Bob Martin tells a 3-4 minute story on the Clean Coders Podcast starting at around minute 36. It’s worth a listen, but here’s a quick summary. Bob’s team was paid when features were delivered. Feature completeness was defined by Acceptance Criteria/Tests provided by the customer. When the tests passed, the feature was delivered, and Bob’s team was paid. If the customer was not satisfied with the feature, then the responsibility was theirs, since they defined the test. It’s difficult for a customer to claim that the test they provided didn’t accurately represent what they had in mind. Bob’s story describes Consumer-Driven Contract Testing (CDC Testing), which is a topic for another blog (TBD).

Most behaviors should trace back to Domain Experts, whether external customers or internal product managers/owners, business analysts, etc. One source of behavior could be User Stories, which should contain Given-When-Then Acceptance Criteria, from which Given-When-Then Acceptance Tests can be derived.

If domain experts don’t follow through with Acceptance Criteria, then discuss user stories and feature behaviors with them. When a new or updated behavior arises in these conversations, consider how the hypothetical scenarios can be documented as Acceptance Criteria, from which Acceptance Tests can be derived.

Functional Acceptance Tests should be domain specific enough that domain experts, product managers/owners, and other non-tech staff can comprehend the specified behavior. They may not understand the test implementation details, but they should be able to confirm that the test specifications match their intentions.

Some behaviors may be non-functional, such as performance or security. Some behaviors may be scoped to individual classes and methods that are the building blocks of the product. These tend to be the responsibility of the architects and developers and probably not of great interest to the domain experts.

Equivalence Partitioning

We can’t write tests for every possible input value. Instead, we can organize all possible values into categories using Equivalence Partitioning. Values in the same partition should exhibit the same behavior. Rather than attempting to test every possible value in a partition, testing a few values, or even one value, should give us enough confidence in all values in that partition. Each partition will probably have its own test.

As an example, if I were testing a square root function, I’d consider the following partitions:

Each domain has its own domain specific partitions.
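
As a sketch of how partitions can turn into tests, the example below picks one representative value per partition for a square root function. The partitions chosen here (negative, zero, between zero and one, perfect square, non-perfect square) are an assumption made for illustration, and `square_root` is a stand-in SUT that simply delegates to Python's `math.sqrt`.

```python
import math


def square_root(x):
    """Stand-in SUT for illustration; it simply delegates to math.sqrt."""
    return math.sqrt(x)


# One representative value per assumed partition.
def test_negative_input_is_rejected():
    try:
        square_root(-4.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative input")


def test_zero_yields_zero():
    assert square_root(0.0) == 0.0


def test_value_between_zero_and_one_yields_larger_root():
    root = square_root(0.25)
    assert math.isclose(root, 0.5)
    assert root > 0.25  # in this partition the root is larger than the input


def test_perfect_square_yields_exact_root():
    assert square_root(49.0) == 7.0


def test_non_perfect_square_yields_root_that_squares_back():
    root = square_root(2.0)
    assert math.isclose(root * root, 2.0)
```

The test for 49.0 stands in for every perfect square. If 49.0 behaves correctly, there is little reason to expect 64.0 to behave differently, so adding it would cost maintenance effort without adding much confidence.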

Consider traditional proofs in Computer Science. They prove that the data structure or algorithm is correct. They do not prove that the implementation is correct.

“Beware of bugs in the above code; I have only proved it correct, not tried it.” - Donald Knuth

While partitioning is not a proof, it mimics some aspects of a proof. Partitioning is like cases in proofs.

Well-designed automated tests, which cover all the cases via partitioning, do something that Dijkstra’s formal proofs can never do. Automated tests challenge the behavior as represented in the implementation upon each execution. That should happen frequently for individual tests when practicing TDD, and for the whole test suite when using CI/CD pipelines.

In all fairness to Dijkstra, I suspect his testing quote was made during the age of manual testing, which tended to test the system as a whole. Manual testing is expensive, time consuming, inconsistently executed and prone to human error.

Current testing technology allows us to automate tests that challenge the code repeatedly, at almost any level of granularity and in almost any scenario.

Degenerate, Edge and Boundary Tests

“The degenerate case is easy. I just return an empty list. For example, let’s say that I am writing a function to square a list of integers. E.g. [1 2 3 4] -> [1 4 9 16]. The Z test is simple. []->[]. The O test is easy to write: [2]->[4].” - Bob Martin on X/Twitter

The degenerate case is simple. Define tests for the most basic behaviors first. I did this with Yuri with the first test, which evaluated “3” and returned 3, which is the simplest arithmetic expression to be evaluated. Then we added more complex arithmetic operation behavior test scenarios while keeping all tests passing.
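
The squaring example from the quote above can be sketched in a few lines of Python. The function name `square_all` and the test names are assumptions made for illustration. The inputs and outputs are the ones given in the quote.

```python
def square_all(numbers):
    """Return a new list holding the square of each integer in numbers."""
    return [n * n for n in numbers]


# Zero elements: the degenerate case comes first.
def test_empty_list_yields_empty_list():
    assert square_all([]) == []


# One element: the simplest non-degenerate case.
def test_single_element_is_squared():
    assert square_all([2]) == [4]


# Many elements: the general case comes last.
def test_every_element_is_squared_in_order():
    assert square_all([1, 2, 3, 4]) == [1, 4, 9, 16]
```

Written in this order, the first test can be passed by returning an empty list, and each later test forces the implementation to become a little more general.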

Include edge and boundary tests. Off-by-one errors are all too common. Define tests that confirm behavior at the edges and on all sides of a boundary.
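
Here is a minimal sketch of tests on all sides of a boundary. It assumes a hypothetical business rule, invented for this example, that orders totaling 100 or more ship free. The deliberately buggy variant shows the kind of off-by-one error these tests are meant to catch.

```python
FREE_SHIPPING_THRESHOLD = 100  # assumed rule: orders of 100 or more ship free


def ships_free(order_total):
    """Intended behavior: the threshold value itself qualifies."""
    return order_total >= FREE_SHIPPING_THRESHOLD


def ships_free_off_by_one(order_total):
    """Deliberately buggy variant: '>' excludes the threshold value itself."""
    return order_total > FREE_SHIPPING_THRESHOLD


def test_order_just_below_threshold_does_not_ship_free(rule=ships_free):
    assert not rule(99)


def test_order_exactly_at_threshold_ships_free(rule=ships_free):
    assert rule(100)


def test_order_just_above_threshold_ships_free(rule=ships_free):
    assert rule(101)
```

Only the test at exactly 100 can tell the two variants apart. Passing `ships_free_off_by_one` as the `rule` makes that test fail while the tests at 99 and 101 still pass, which is why a single value somewhere near the boundary is not enough.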

There are several reasons to start with the degenerate, edge and boundary behaviors and then work toward specifying more complex behaviors:

At first, the implementation will look like a patchy quilt that handles the edge and boundary behaviors separately. But recall that refactoring is part of the TDD process. Commonalities will emerge. Refactor the code to consolidate those commonalities.

“As the tests get more specific, the code gets more generic.” - Bob Martin in a blog

Test-Driven Development and Behavior-Driven Development

How do TDD and BDD relate to one another? Many consider TDD to be for unit testing and BDD to be for higher level integration or acceptance testing. Unit testing confirms the nuts and bolts, whereas integration testing confirms that the nuts and bolts actually fit together. Lower level testing confirms software components, and higher level testing confirms the integration of those software components as they compose pieces of the product.

I don’t make a binary distinction. It’s more continuous to me. I feel that the behavior concepts of BDD help us in defining unit tests as well. The major distinction I make between unit and integration/acceptance testing is the scope and boundary of the SUT. Unit tests tend to have smaller scoped SUTs. Integration/Acceptance tests tend to have larger scoped SUTs. Regardless of the SUT scope, we still want tests that confirm the behaviors bounded by the scope of the SUTs.

Traditional TDD and BDD frameworks are a bit different. TDD tends to use xUnit, with JUnit being the Java version. BDD tends to use Cucumber, FitNesse and others. Use what works best for your needs.

TDD and BDD complement one another like chocolate and peanut butter in Reese’s candy. While TDD and BDD do not depend upon one another, I feel each enhances the other:

Summary

Test-Driven Development describes how to create a test. Behavior-Driven Development describes what test to create.

Behavior-Driven Development helps us know when we’ve created enough tests to achieve confidence in our code.

Both practices complement one another.
