Design Pattern Evangelist Blog

Smart pointers about software design

Approval Testing - A Test Strategy for those who are reluctant to try Test-Driven Development

I’m the Design Pattern Evangelist, and I APPROVE this message


The Flintstones

Introduction

Some developers, often seniors, just cannot make the adjustment to Test-Driven Development (TDD). They have to write the code and see it work first. Approval Testing may provide a testing strategy for those dinosaurs who need to see working code first.

Unit Testing declares behavior specifications.

Characterization Testing reveals existing behavior.

Approval Testing approves emerging behavior as it’s being implemented.

All three types of tests look similar, since they feature the Given/When/Then format. The distinction among the three practices resides in how the Then portion of each test is created. This is why I specifically used the term Testing, a process, rather than Test, an artifact of the testing process.

Approval Testing Venn Diagram

These three processes mainly differ as follows:

I’ve written blog entries about Unit Testing and Characterization Testing previously. Approval Testing can be thought of as the intersection of Unit and Characterization Testing in that desired behavior is approved in its existing form as it emerges. I’ll focus upon Approval Testing for the rest of this blog entry.

…, but I know it when I see it

I can’t define pornography, but I know it when I see it. — Paraphrasing United States Supreme Court Justice Potter Stewart

Much like Justice Stewart, many developers are not comfortable defining behavior specifications via the TDD process, but they know the desired behavior when they see it.

Approval Testing allows developers to write their code, validate it via their own observation and then document that validated behavior observation in a Given/When/Then test.

If a subsequent implementation update causes an Approval Test to fail, then the developer must observe the new failing behavior and decide whether it violates the previous approved behavior or whether it’s demonstrating new desired behavior that’s emerging. If the new emerging behavior is correct, then a new behavior assertion replaces the previous approved one.

Yabba Dabba Doo

I featured Characterization Testing as a technique to add tests to existing legacy code. In my brief description, I described setting up an assert declaring that an obviously invalid String, such as Fred Flintstone, would be the return value from a method. Then when the legacy code returned its actual String value, we could replace Fred Flintstone with the actual value.

Let’s tweak that scenario a bit. Let’s assume that you’re working for Hanna-Barbera, the cartoon production company, and you’ve been assigned to the YabbaDabbaDoo feature.

Given that you’re still prototyping, you may not have quite enough domain knowledge to feel confident enough to specify behavior using TDD. However, when you observe emerging behavior while prototyping, then you can document that behavior in a test.

You putter around a bit and observe that one of the methods returns Fred Flintstone, which looks like it’s doing what you’d expect it to do for that scenario.

STOP!

Don’t proceed with more code until you’ve documented this observation in a test. Create a Given/When/Then test that asserts that Fred Flintstone is returned. You may need to create the test from scratch, or maybe you’ve already been working on the Given/When sections of a test, and you only need to complete it with the Then section that asserts Fred Flintstone.
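Here’s a minimal sketch of what such a test might look like. The YabbaDabbaDoo class, its leadCharacter() method and the small assertEquals helper are hypothetical stand-ins (the helper replaces JUnit’s so that the sketch runs on its own). Only the Given/When/Then shape and the approved value come from the scenario above.

```java
// Hypothetical feature class; the name and method are assumptions for illustration.
class YabbaDabbaDoo {
    String leadCharacter() {
        return "Fred Flintstone";
    }
}

class YabbaDabbaDooApprovalTest {
    // Stand-in for JUnit's assertEquals so the sketch has no dependencies.
    static void assertEquals(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
    }

    static void leadCharacterIsFredFlintstone() {
        // Given
        YabbaDabbaDoo feature = new YabbaDabbaDoo();

        // When
        String actual = feature.leadCharacter();

        // Then: the value observed while prototyping, now approved.
        assertEquals("Fred Flintstone", actual);
    }

    public static void main(String[] args) {
        leadCharacterIsFredFlintstone();
        System.out.println("approved");
    }
}
```

If a later change makes this test fail, the failure message shows the new actual value, which you can either reject or approve as the new expected value.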

You can then proceed with additional development, making sure to always stop and document what you’re observing via tests. You should also refactor as you proceed to keep the code clean.

As you prototype with Approval Testing, you may become more familiar with the domain. You may become familiar enough to move from Approval Testing to TDD.

Distinction between Approval and Characterization Testing

When I first saw presentations about Approval Testing, I thought the presenters were describing the Characterization Testing process by a different name. Judging by blog entries and videos by others, many treat Approval and Characterization Tests as two names for the same process.

After having thought about Approval and Characterization Tests over the years, I feel that the two processes, while very similar, are different in a subtle way.

Existing behaviors are recorded in Characterization Tests as-is. They reveal and document existing behavior, usually in legacy code, whether it’s right or wrong. Any questionable behavior should be codified in a Characterization Test as-is and flagged by adding SHOULD_IT in its name, as described in "Hey. This doesn’t look right."
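As a rough illustration, a flagged Characterization Test might look like the sketch below. LegacyGreeter and its behavior are invented for this example, and the placement of SHOULD_IT in the method name is only one possibility. The assertEquals helper is a stand-in for JUnit’s.

```java
// Hypothetical legacy class; its handling of a missing name looks questionable.
class LegacyGreeter {
    static String greet(String name) {
        return "Hello, " + name;
    }
}

class LegacyGreeterCharacterizationTest {
    // Stand-in for JUnit's assertEquals so the sketch has no dependencies.
    static void assertEquals(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
    }

    // SHOULD_IT flags behavior that is recorded as-is but may be wrong.
    static void SHOULD_IT_greetMissingNameAsNull() {
        // Given
        String missingName = null;

        // When
        String actual = LegacyGreeter.greet(missingName);

        // Then: recorded exactly as the legacy code behaves today.
        assertEquals("Hello, null", actual);
    }

    public static void main(String[] args) {
        SHOULD_IT_greetMissingNameAsNull();
        System.out.println("characterized");
    }
}
```

An Approval Test would not record this result. The developer would fix the code until the observed greeting was the desired one, and only then approve it.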

Observable behaviors are recorded in Approval Tests only when the developer has approved the observed behaviors as desired as they emerge from the maturing code.

Both practices are similar procedures, in that they both document behavior emerging from the implementation. The main difference is in how long that behavior has been within the code. For Characterization Testing the behavior has been there for days, months or years. For Approval Testing the behavior has been there for only seconds or minutes.

Find Gaps with Mutation Testing

Approval Testing is reactive testing. It doesn’t drive the implementation. The implementation drives it. Therefore, when practicing Approval Testing, it may also be a good idea to include Mutation Testing to help identify any behavior that the Approval Tests may have missed.
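To make the gap concrete, here’s a hand-rolled sketch of the kind of check a mutation testing tool automates. The pricing rule and all names are hypothetical. The mutant changes a single operator, and because the only approved observation is far from the boundary, the test cannot tell the two implementations apart.

```java
class MutationGapDemo {
    interface PricingRule {
        int apply(int total);
    }

    // Hypothetical rule: orders of 100 or more get 10 off.
    static final PricingRule ORIGINAL = total -> total >= 100 ? total - 10 : total;

    // Mutant: the boundary operator >= has been changed to >.
    static final PricingRule MUTANT = total -> total > 100 ? total - 10 : total;

    // The only approved observation so far: a total of 200 becomes 190.
    static boolean approvedTestPasses(PricingRule rule) {
        return rule.apply(200) == 190;
    }

    public static void main(String[] args) {
        System.out.println("original passes: " + approvedTestPasses(ORIGINAL));
        // The mutant also passes, so it survives and exposes a missing test.
        System.out.println("mutant survives: " + approvedTestPasses(MUTANT));
        // The two rules disagree at exactly 100, the unapproved boundary.
        System.out.println("differ at 100: " + (ORIGINAL.apply(100) != MUTANT.apply(100)));
    }
}
```

The surviving mutant points at the behavior to observe next: run the code with a total of 100, decide whether the result is what you want, and approve it in a new test.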

Some Behaviors Require Observation

Some behaviors are difficult to specify. They must first be observed, and then they can be codified in an automated test. Graphical User Interfaces (GUIs) would be in this category.

GUIs are notoriously difficult to test. Using Approval Testing, the developer would execute the code and visually confirm the GUI and adjust the code until the GUI looks correct. It can be approved in an Approval Test, but how do we test something visual?

The Approval Test doesn’t approve the GUI directly. It approves the content being rendered to produce the GUI. For example, if the GUI is displayed in a web browser, that content could be the text of an HTML file, which could be validated in one large String comparison assert.

Yabba Dabba Doo Once More

Fred Flintstone

Let’s return to our Hanna-Barbera project. Given that the customer’s domain is cartoons, they are going to want to feature cartoon images in their GUI. So rather than just returning Fred Flintstone’s name, the customer will want to see his image.

You can easily launch the GUI and verify that you’re seeing an image of Fred:

This isn’t easily automated as a test. But we can automate a check of the HTML text that renders Fred’s image in the GUI.

For example, Fred’s rendering to the right is defined as padded on the right and being 25% of the width of the window. Here’s the HTML code that renders it. You could create a test that confirms the generated HTML for how Fred should be rendered matches the following:

<img src="https://live.staticflickr.com/3145/2970400508_dbf3ef8861_b.jpg" alt="Fred Flintstone"
    title="Image Source: https://www.flickr.com/photos/andertoons-cartoons/2970400508"
        width = "25%" align="right" style="padding-right: 20px;">
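A test along those lines could be sketched as follows. FredImageHtml.render(...) is a hypothetical generator, since the code that actually produces the HTML isn’t shown here. The approved String is the HTML above, copied character for character, including its line breaks and indentation.

```java
// Hypothetical generator; the real HTML-producing code is not shown in this post.
class FredImageHtml {
    static String render(String src, String alt, String title, int widthPercent, int paddingRightPx) {
        return "<img src=\"" + src + "\" alt=\"" + alt + "\"\n"
             + "    title=\"" + title + "\"\n"
             + "        width = \"" + widthPercent + "%\" align=\"right\""
             + " style=\"padding-right: " + paddingRightPx + "px;\">";
    }
}

class FredImageApprovalTest {
    // Approved after visually confirming the rendered image in the browser.
    static final String APPROVED =
          "<img src=\"https://live.staticflickr.com/3145/2970400508_dbf3ef8861_b.jpg\" alt=\"Fred Flintstone\"\n"
        + "    title=\"Image Source: https://www.flickr.com/photos/andertoons-cartoons/2970400508\"\n"
        + "        width = \"25%\" align=\"right\" style=\"padding-right: 20px;\">";

    static void fredImageHtmlMatchesApprovedContent() {
        // Given / When
        String actual = FredImageHtml.render(
            "https://live.staticflickr.com/3145/2970400508_dbf3ef8861_b.jpg",
            "Fred Flintstone",
            "Image Source: https://www.flickr.com/photos/andertoons-cartoons/2970400508",
            25,
            20);

        // Then: one large String comparison against the approved content.
        if (!APPROVED.equals(actual)) {
            throw new AssertionError("expected <" + APPROVED + "> but was <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        fredImageHtmlMatchesApprovedContent();
        System.out.println("approved");
    }
}
```

A comparison this strict fails on any whitespace change, so a failure means "look at the GUI again", not necessarily "the GUI is broken".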

Humble Object Teaser

Testing the source of the GUI rather than the GUI itself is one example of the Humble Object Pattern. There are more. I will describe this soon in an upcoming blog (TBD).

Approval Testing via String Comparisons

While not a requirement for Approval Testing, Approval Tests often have one assertion, which may be based upon asserting the toString() result of a complex object against an expected value. For example, the GUI assert could be a comparison of the entire HTML file content as one long String as described previously.

I use this technique fairly often. I’m still practicing what I’d consider TDD, but it’s probably closer to Approval Testing. I start with the test first and populate all three Given/When/Then sections. However, parts of the Then section are still left undefined. Here’s an example from one of my Advent of Code tests from 2024 Day 15 - Warehouse Woes, which I previously described in A House Divided - Advent of Code.

This is one of my tests for Part 1:

@Test
public void moveRobotToLeftAlsoMovesBoxes() {
    // Given
    Warehouse warehouse = new Warehouse();
    warehouse.add(new Wall(new Position(0, 0)));
    warehouse.add(new Box(new Position(2, 0)));
    warehouse.add(new Box(new Position(3, 0)));
    warehouse.add(new Robot(new Position(4,0)));

    // When
    warehouse.move(MoveDirection.LEFT);

    // Then
    assertEquals("TBD", warehouse.toString());
}

Warehouse.toString() returns a String value of its configuration. When I ran the test, it failed, stating that TBD was not the actual value. The actual value was Robot:(3,0), {(2,0)=Box:(2,0), (1,0)=Box:(1,0)}, {(0,0)=Wall:(0,0)}. I visually confirmed that that was the expected value, and I updated the test as follows:

@Test
public void moveRobotToLeftAlsoMovesBoxes() {
    // Given
    Warehouse warehouse = new Warehouse();
    warehouse.add(new Wall(new Position(0, 0)));
    warehouse.add(new Box(new Position(2, 0)));
    warehouse.add(new Box(new Position(3, 0)));
    warehouse.add(new Robot(new Position(4,0)));

    // When
    warehouse.move(MoveDirection.LEFT);

    // Then
    assertEquals("Robot:(3,0), {(2,0)=Box:(2,0), (1,0)=Box:(1,0)}, {(0,0)=Wall:(0,0)}", warehouse.toString());
}

When it came to Part 2, I used a more visual representation for the Warehouse. The return value is the String representation of a List of rows, each of which is a String that identifies the elements on that row with # for a Wall, Bb for a Box and @ for the Robot, which mostly match the graphics used in the description of the problem from Advent of Code.

From what I remember, I used TDD when creating this test by providing what I thought the String representation of the warehouse rendering would be, but I used Approval Testing to compare my expected String against the actual String. That is, when the test failed, I didn’t assume the code was incorrect. I double checked my original expected String with the actual returned String, since I could have easily made a mental typo when typing in the expected value manually.

@Test
public void moveRobotUpMovesOtherBoxesUp() {
    // Given
    int widthFactor = 2;
    Warehouse warehouse = new Warehouse(widthFactor);
    warehouse.add(new Wall(new Position(0, 0), widthFactor));
    warehouse.add(new Box(new Position(1, 2), widthFactor));
    warehouse.add(new Box(new Position(0, 3), widthFactor));
    warehouse.add(new Box(new Position(2, 3), widthFactor));
    warehouse.add(new Box(new Position(1, 4), widthFactor));
    warehouse.add(new Robot(new Position(1, 5)));

    // Confirm the starting layout before the move.
    assertEquals("[##.., ...., .Bb., BbBb, .Bb., .@..]", warehouse.getRendering().toString());

    // When
    warehouse.move(MoveDirection.UP);

    // Then
    assertEquals("[##.., .Bb., BbBb, .Bb., .@.., ....]", warehouse.getRendering().toString());
}
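For readers wondering where a String like [##.., ...., .Bb., BbBb, .Bb., .@..] comes from, here’s a small sketch of the rendering idea. This is not the actual Warehouse implementation. The grid size, the helper names and the way glyphs are placed are assumptions chosen to reproduce the starting layout in the test above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RenderingSketch {
    // Creates a grid where every cell starts as empty floor.
    static char[][] emptyGrid(int rows, int columns) {
        char[][] grid = new char[rows][columns];
        for (char[] row : grid) {
            Arrays.fill(row, '.');
        }
        return grid;
    }

    // Writes the glyph into row y, starting at column x.
    static void place(char[][] grid, int x, int y, String glyph) {
        for (int i = 0; i < glyph.length(); i++) {
            grid[y][x + i] = glyph.charAt(i);
        }
    }

    // Converts each grid row into one String.
    static List<String> toRows(char[][] grid) {
        List<String> rows = new ArrayList<>();
        for (char[] row : grid) {
            rows.add(new String(row));
        }
        return rows;
    }

    // Rebuilds the starting layout from the test above.
    static List<String> initialRendering() {
        char[][] grid = emptyGrid(6, 4);
        place(grid, 0, 0, "##"); // Wall
        place(grid, 1, 2, "Bb"); // Boxes
        place(grid, 0, 3, "Bb");
        place(grid, 2, 3, "Bb");
        place(grid, 1, 4, "Bb");
        place(grid, 1, 5, "@");  // Robot
        return rows(grid);
    }

    static List<String> rows(char[][] grid) {
        return toRows(grid);
    }

    public static void main(String[] args) {
        // prints [##.., ...., .Bb., BbBb, .Bb., .@..]
        System.out.println(initialRendering());
    }
}
```

Because List.toString() joins the rows with a comma and a space, each row can be read off the expected String from top to bottom, which makes the visual confirmation step quick.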

Summary

Approval Testing offers an alternative to traditional Test-Driven Development by allowing developers to validate and document behavior as it emerges. Positioned between Unit and Characterization Testing, this method is particularly useful for developers who prefer to write code first and approve expected outcomes after observation. By approving behavior iteratively, developers can maintain confidence in their code while gradually transitioning to more structured testing approaches.
