What Is Abstraction?
Abstraction is separating the WHAT from the HOW in the design and implementation.
Introduction
I’m a volunteer mentor for computer science students at the university where I acquired my undergraduate degree. I visited campus last month for homecoming weekend. While there, I had some time to meet face-to-face with Scott, one of my mentees. Since I live about 250 miles from the university, most interaction with my mentees is electronic, so this was a rare opportunity to meet with him in person.
Scott and I grabbed a quick bite at an off-campus pizza shop directly across from one of the engineering buildings. I asked him about his summer experiences and how his final year before graduation was shaping up.
Scott knows that I’m a fan of Design Patterns, but he wasn’t sure how the patterns fit into what he’s been studying. With only my words and waving hands for presentation, I compared Data Structures to Design Patterns. Data Structures solve well-defined, fixed problems bounded by a limited scope, whereas Design Patterns define solution approaches to recurring, flexible, fuzzy problems not necessarily bounded by scope. I confessed that developers may need to experience real-world problems in industry before they appreciate Design Patterns. I used a few analogies, such as:
- Solving a problem with Data Structures is like being a cook following a recipe to the letter, whereas approaching a problem with Design Patterns is like being a chef creating a new dish based upon years of experience. NOTE: I first typed “tears of experience” by accident. Though I changed it, tears may be more accurate than years, especially in the context of software development.
- Learning a programming language’s syntax is like learning the rules to a game, whereas learning software design is like learning the strategy and tactics of how to play the game well.
- The wheel is a mechanical engineering design pattern. All wheels share the same basic design: a disk-shaped object that rotates upon an axis at its center. Yet each wheel must be designed specifically for the context in which it’s used.
Then Scott asked, “What is Abstraction?” What are they teaching kids in college these days?
I tried to give him a quick answer, but we were running out of time. I promised him I’d follow up with a more complete answer, and that’s the main purpose of this blog entry.
What vs How
Though I chide academia for not explaining abstraction, the word abstract has appeared in 50% of my previous blog posts, and I haven’t defined it either.
I devoted the Getting the Right Abstraction is Hard blog entry to abstraction over a year ago, and I assumed my readers knew what I meant.
I suspect that definitions for abstraction vary. I included a few Abstraction quotes in Bumper Sticker Computer Science and Software Engineering.
Google’s AI generated this definition for me:
Abstraction is a fundamental concept in software engineering and computer science that involves removing unnecessary details to focus on what’s important. It’s a way to make systems more generic and easier to understand.
Other online references also focus upon removing details. I don’t view abstraction as removing details as much as I view it as moving details. Those details still exist. Abstraction is about the management of those details.
An Abstraction, such as an interface in Java, declares WHAT behaviors are provided. That is, an interface is a set of methods that declare WHAT those methods are and do without defining HOW those methods will do them. It’s a contract.
The details, which define HOW the interfaces will be implemented, have been moved to the concrete classes, which implement the interfaces.
This is evident in the Strategy Design Pattern, for which an interface may have multiple implementations:
ComputerAidedDesign.render() can draw() any Shape without having to know specific Shapes. It doesn’t know that a Shape could be a Circle, Triangle or Rectangle. Shape is the interface abstraction that declares what it can do: draw(). Circle, Triangle and Rectangle are each a concrete class defining how to implement draw() for each specific Shape.
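The Strategy structure above can be sketched in a few lines of Java. This is a minimal illustration, not the original design. It assumes draw() returns a String describing the shape instead of rendering pixels, so the sketch runs anywhere. It also includes the add() method mentioned later in this post. ComputerAidedDesign depends only on Shape, and each concrete class supplies its own HOW.

```java
import java.util.ArrayList;
import java.util.List;

// ABSTRACT: declares WHAT a shape can do, not HOW.
interface Shape {
    // Returning a String instead of drawing pixels is an illustrative assumption.
    String draw();
}

// CONCRETE: each class defines HOW to draw one specific Shape.
class Circle implements Shape {
    public String draw() { return "Circle"; }
}

class Triangle implements Shape {
    public String draw() { return "Triangle"; }
}

class Rectangle implements Shape {
    public String draw() { return "Rectangle"; }
}

// Knows only the Shape abstraction, never the concrete classes.
class ComputerAidedDesign {
    private final List<Shape> shapes = new ArrayList<>();

    void add(Shape shape) {
        shapes.add(shape);
    }

    List<String> render() {
        List<String> drawn = new ArrayList<>();
        for (Shape shape : shapes) {
            drawn.add(shape.draw()); // polymorphic call: WHAT is fixed, HOW varies
        }
        return drawn;
    }
}

public class StrategyDemo {
    public static void main(String[] args) {
        ComputerAidedDesign cad = new ComputerAidedDesign();
        cad.add(new Circle());
        cad.add(new Triangle());
        cad.add(new Rectangle());
        System.out.println(cad.render()); // [Circle, Triangle, Rectangle]
    }
}
```

Here main() wires the shapes in by hand. Strategy itself says nothing about who should do that.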
The Strategy Design Pattern generally does not indicate how references to Shape find their way into the private List<Shape> shapes attribute within ComputerAidedDesign. I introduced a way to address this in the Dependency Injection blog.
The following diagram enhances the previous diagram with the addition of Dependency Injection and a few more diagram elements:
- ComputerAidedDesignConfigurer is a Dependency Injector. It creates instances of Circle, Triangle and Rectangle and adds them to ComputerAidedDesign via the add() method, which is not shown in the design.
- The red dashed lines are design boundaries. They do not represent implementation artifacts. I’ve added them to illustrate how the design space is organized in regions. I introduced this concept in Hexagonal Architecture – Why it works. Though introduced in the context of Hexagonal Architecture, these concepts do not depend upon Hexagonal Architecture. These concepts are about dependency and knowledge management. Each pair of connected elements has an implicit knowledge and dependency relationship, which is represented via the arrowheads. When A points to B, then A has knowledge of and depends upon B. B has no knowledge of or dependency upon A.
- The red horizontal dashed line separates the abstract and the concrete elements within the scope of this design space. All lines point upward when crossing this line. All knowledge and dependency flows upwards. The ABSTRACT region does not know that the other regions exist. ComputerAidedDesign knows and depends upon Shape, but nothing else. The abstraction in this context is that render() knows how to draw a collection of Shapes.
- The red vertical dashed line separates the configuration from the concrete implementation. All lines point toward the right when crossing this line. All knowledge and dependency flows toward the right. Circle, Triangle and Rectangle implement Shape and define how each specific Shape is drawn. They only know and depend upon Shape. They are not responsible for their instantiation or the context in which they will be instantiated. Elements in the CONCRETE region only know how to implement interfaces within the ABSTRACT region. ComputerAidedDesignConfigurer creates the instances and assembles them. It has knowledge of the rest of the design, as can be seen graphically via the direction of the lines flowing away from it. However, its knowledge and dependency exist only to the extent that it knows the classes exist, how to instantiate them and how to add those instances to ComputerAidedDesign. It does not access their functional methods. All arrowheads point away from the Configurer. Elements in the CONFIGURE region are essentially invisible to the rest of the design.
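The three regions can be sketched in Java as follows. This is a minimal version assuming draw() returns a String for illustration. The static configure() method is a hypothetical name, not necessarily how the Dependency Injection blog wires things. The Configurer is the only class that names the concrete classes. It only instantiates and assembles them, and nothing refers back to it.

```java
import java.util.ArrayList;
import java.util.List;

// ABSTRACT region: knows nothing about the CONCRETE or CONFIGURE regions.
interface Shape {
    String draw(); // returns a description instead of pixels (illustrative assumption)
}

class ComputerAidedDesign {
    private final List<Shape> shapes = new ArrayList<>();

    void add(Shape shape) {
        shapes.add(shape);
    }

    List<String> render() {
        List<String> drawn = new ArrayList<>();
        for (Shape shape : shapes) {
            drawn.add(shape.draw());
        }
        return drawn;
    }
}

// CONCRETE region: each class knows only Shape.
class Circle implements Shape {
    public String draw() { return "Circle"; }
}

class Triangle implements Shape {
    public String draw() { return "Triangle"; }
}

class Rectangle implements Shape {
    public String draw() { return "Rectangle"; }
}

// CONFIGURE region: knows every class, but only to instantiate and assemble them.
// It never calls draw() or render(), and no other class refers back to it.
class ComputerAidedDesignConfigurer {
    static ComputerAidedDesign configure() {
        ComputerAidedDesign cad = new ComputerAidedDesign();
        cad.add(new Circle());
        cad.add(new Triangle());
        cad.add(new Rectangle());
        return cad;
    }
}

public class ConfigurerDemo {
    public static void main(String[] args) {
        ComputerAidedDesign cad = ComputerAidedDesignConfigurer.configure();
        System.out.println(cad.render()); // [Circle, Triangle, Rectangle]
    }
}
```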
Abstraction makes this design more maintainable. When new Shapes are desired, such as Trapezoid and Rhombus, no updates are required above the red horizontal dashed line.
New concrete Trapezoid and Rhombus classes implement Shape. ComputerAidedDesignConfigurer must also be updated so that it knows the new concrete classes exist.
Ease of maintenance continues in the Concrete region as well. The existing concrete classes are not affected by the addition of the new concrete classes.
One may notice that Rectangle, Trapezoid and Rhombus are four-sided Shapes. What if more four-sided concrete classes were added, such as Parallelogram, Square and Quadrangle? We might observe common code among the concrete four-sided Shape classes and decide to refactor them to reduce duplication and consolidate related classes.
If we choose to refactor or redesign the concrete classes below the red horizontal dashed line, we can do so without worrying about breaking any behavior that resides in the abstraction above the red horizontal dashed line.
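One possible refactoring of the four-sided shapes is sketched below. It assumes a hypothetical abstract base class named Quadrilateral that holds the shared code, and a String-returning draw() for illustration. Inheritance is only one option; composition or a shared helper class would also work. Shape and ComputerAidedDesign are untouched. Only the CONCRETE classes and the Configurer change.

```java
import java.util.ArrayList;
import java.util.List;

// ABSTRACT region: identical before and after the refactoring.
interface Shape {
    String draw();
}

class ComputerAidedDesign {
    private final List<Shape> shapes = new ArrayList<>();

    void add(Shape shape) { shapes.add(shape); }

    List<String> render() {
        List<String> drawn = new ArrayList<>();
        for (Shape shape : shapes) {
            drawn.add(shape.draw());
        }
        return drawn;
    }
}

// CONCRETE region: hypothetical base class consolidating four-sided code.
abstract class Quadrilateral implements Shape {
    private final String name;

    Quadrilateral(String name) { this.name = name; }

    // Stand-in for logic shared by every four-sided Shape.
    public String draw() { return name + " with 4 sides"; }
}

class Rectangle extends Quadrilateral { Rectangle() { super("Rectangle"); } }
class Trapezoid extends Quadrilateral { Trapezoid() { super("Trapezoid"); } }
class Rhombus extends Quadrilateral { Rhombus() { super("Rhombus"); } }

// Existing concrete classes that are not four-sided are unaffected.
class Circle implements Shape {
    public String draw() { return "Circle"; }
}

class Triangle implements Shape {
    public String draw() { return "Triangle"; }
}

// CONFIGURE region: the only other place that learns about the new classes.
class ComputerAidedDesignConfigurer {
    static ComputerAidedDesign configure() {
        ComputerAidedDesign cad = new ComputerAidedDesign();
        cad.add(new Circle());
        cad.add(new Triangle());
        cad.add(new Rectangle());
        cad.add(new Trapezoid()); // new
        cad.add(new Rhombus());   // new
        return cad;
    }
}

public class RefactorDemo {
    public static void main(String[] args) {
        System.out.println(ComputerAidedDesignConfigurer.configure().render());
    }
}
```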
Abstraction Is Not Obvious
Strategy is only one example of abstraction. Functions, procedures and methods with meaningful names that document what the code does are probably the most common forms of abstraction as championed by the Extract Method refactoring technique.
The right abstraction may not be obvious. See: Getting the Right Abstraction is Hard
The ability to abstract was limited in early programming languages. The first programming language I learned was BASIC in the 1970s. Each statement had a line number, and that’s how program flow was directed via GOTO and GOSUB statements. There were no subroutine names. Subroutines didn’t have arguments or return types. All variables were global. And from what I recall, variable names could be no more than two characters long.
As programming languages advanced, they supported more abstraction. Abstraction is not for the benefit of the computer. Abstraction is for the benefit of the software developer to better convey and understand the intent of the implementation.
Abstraction In the Real World
Real-world abstractions abound around us. Once you notice them, abstractions become easier to recognize and understand in code too.
Cars
Cars are complex machines, yet most adults can drive unfamiliar cars easily. Rental cars are often different from what we drive at home, but most people can hop into a car and drive it off the rental lot.
Most driving controls and the dashboard are an abstraction of the driver’s intent. The driver uses the steering wheel, gas pedal and brake pedal to tell the car what to do. The under-the-hood mechanics are how the car satisfies the driver’s intent.
The dashboard displays pertinent information to the driver, such as speed, mileage, remaining fuel, warning lights, etc. This is mostly an abstraction as well.
There’s one dashboard instrument whose presence I’ve never understood: the tachometer. I’m not a car enthusiast, so maybe there’s a reason to display the engine’s RPMs that I don’t know about. My last three cars had tachometers, and all three had automatic transmissions. My first car had a standard transmission, and I rarely looked at the tachometer to decide when to shift gears.
The tachometer feels like a Leaky Abstraction to me. It’s telling me design details about the engine that I don’t need to know, even if the exposure is benign. We want to avoid leaky abstractions in our designs. We don’t want to expose implementation details to our users via the abstraction. Any concept that’s exposed to users will become something they depend upon operationally. If a project leaks an implementation detail via an abstraction, then that detail becomes part of the interface.
Woe be the software project that leaks its database schema. It will become part of its API.
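A leaked schema can be contrasted with a non-leaky abstraction in a small Java sketch. Every name here is hypothetical, including the interfaces, the Customer class and the cust_id/cust_nm column names. An in-memory map stands in for a real database. The leaky interface forces callers to know column names. The non-leaky one returns a domain object and keeps the column-to-field mapping behind the interface.

```java
import java.util.HashMap;
import java.util.Map;

// LEAKY (hypothetical): callers must know column names such as "cust_nm",
// so the database schema silently becomes part of the API.
interface LeakyCustomerRepository {
    Map<String, Object> findRow(int id);
}

// Domain concept exposed to callers instead of a raw row.
final class Customer {
    private final int id;
    private final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int id() { return id; }
    String name() { return name; }
}

// NON-LEAKY (hypothetical): declares WHAT callers get and hides HOW it is stored.
interface CustomerRepository {
    Customer findById(int id);
}

// In-memory stand-in for a database table; column names never escape this class.
class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Integer, Map<String, Object>> table = new HashMap<>();

    InMemoryCustomerRepository() {
        Map<String, Object> row = new HashMap<>();
        row.put("cust_id", 1);
        row.put("cust_nm", "Scott");
        table.put(1, row);
    }

    @Override
    public Customer findById(int id) {
        Map<String, Object> row = table.get(id);
        if (row == null) {
            throw new IllegalArgumentException("No customer with id " + id);
        }
        // The schema-to-domain mapping lives here and only here.
        return new Customer((Integer) row.get("cust_id"), (String) row.get("cust_nm"));
    }
}

public class LeakyAbstractionDemo {
    public static void main(String[] args) {
        CustomerRepository repository = new InMemoryCustomerRepository();
        System.out.println(repository.findById(1).name()); // Scott
    }
}
```

If the table later renames cust_nm, only InMemoryCustomerRepository changes. Callers of LeakyCustomerRepository would all break.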
Wikipedia
Each Wikipedia page focuses upon one topic. The page’s title is its abstraction which briefly describes what the page is about. The rest of the page contains topic details.
Most Wikipedia pages contain embedded hypertext links to other Wikipedia pages for additional information. While there’s basically one layer of abstraction for each Wikipedia page, there are many layers of abstraction within the Wikipedia environment. As readers follow the embedded links, they dive deeper into details for additional context, but it’s the readers’ choice to do so. Sometimes readers go so far down the Wiki Rabbit Hole that they forget where they started.
What if a Wikipedia editor didn’t want to create a new page and reference it via an embedded link? What if they put the new content in the current page, even if that content strayed from the page’s original topic? And then another editor added more content to that content further straying from the page’s original topic.
If this pattern continues, the page will soon become bloated with off-topic details. The reader may have to read the entire page, and it may not be obvious which content belongs to the original topic and which content is off-topic supporting material.
Fortunately, I think that Wikipedia editors tend to keep most pages focused upon the topic and link to other pages as necessary. I can’t say that all software developers follow the same practice.
In the same way that each Wikipedia page is about a core topic, I feel that each method, function and procedure should have one core responsibility, which is known as the Single Responsibility Principle (SRP).
So much complexity in software comes from trying to make one thing do two things. — Ryan Singer
A method’s name is its abstraction declaring what it does. The implementation contains the details of how it does it. Rather than creating a new method or class and referencing it via the method name, all too often, developers will place the new code in the original method even if it strays from the method’s single responsibility. It’s a shortcut made for convenience.
I’ve encountered methods that are hundreds of lines long, which obviously violate SRP. Each of these methods grew slowly – one shortcut convenience at a time.
Most visual editors can display only about 50 lines of code on the screen at once. Any part of a method that I cannot see, I must retain in my head to understand the behavior the method implements. If a method is 500 lines long, I can see at most 10% of it at any given time. I’m too old to hold the remaining 90% in my mind.
Methods tend to get long when an implementation jams multiple layers of abstraction into one method. Reading the method takes the reader on a nauseating roller coaster ride from high level business concepts to low level infrastructural details and back again. If segregated into separate methods or classes along abstraction boundaries, then readers would only need to examine those lower-level abstractions if necessary.
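The following small Java sketch shows a method that stays at one level of abstraction. The names checkout, totalInCents and receiptLine are hypothetical, and the sketch assumes prices are non-negative whole cents. checkout() reads as the business intent. The arithmetic and the money formatting sit in lower-level methods that a reader only opens when needed.

```java
import java.util.List;
import java.util.Locale;

public class OrderProcessing {

    // High-level business step: reads as WHAT happens.
    static String checkout(List<Integer> itemPricesInCents) {
        int total = totalInCents(itemPricesInCents);
        return receiptLine(total);
    }

    // Lower-level detail: HOW the total is computed.
    static int totalInCents(List<Integer> prices) {
        int sum = 0;
        for (int price : prices) {
            sum += price;
        }
        return sum;
    }

    // Lower-level detail: HOW money is formatted (assumes non-negative cents).
    static String receiptLine(int cents) {
        return String.format(Locale.ROOT, "Total: $%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        System.out.println(checkout(List.of(1999, 501))); // Total: $25.00
    }
}
```

Inlining the loop and the String.format call into checkout() would still work. However, every reader would then have to separate the business step from the formatting detail on their own.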
Real-world concrete abstractions, such as cars, have multiple layers of abstraction as well. A car is composed of systems, such as the engine, transmission, drive train, braking system, electrical system, etc. These systems have their own components. Eventually the entire car decomposes into a set of parts. Real-world abstractions eventually break down into their most basic physical parts.
Wikipedia is different from cars. Follow the links of a Wikipedia page, and you end up at another Wikipedia page. No matter how many Wikipedia reference links you follow, you still end up at another Wikipedia page that mostly looks like the others. It’s a single topic page with words and links to other pages.
Software is the same. As the reader descends into method implementations, each method will tend to look like other methods. Code that implements high level and low level concepts will tend to look the same. It will consist of if statements, for loops, calls to other methods, etc. This similarity is what makes it difficult to separate business concepts from infrastructure details when they are intermixed in the same method.
Layers of abstractions in Wikipedia and software are fractal. Regardless of where you reside in the layers of abstraction, it all tends to look the same.
The fractal nature of software is a double-edged sword. On the positive side, our abstractions are not constrained by the physical world. On the negative side, our abstractions are not constrained by the physical world. We can design our abstractions any way we desire. We do not have any real-world guardrails to keep our abstractions in check.
Our abstractions, right or wrong, are our responsibility. Get the abstraction right, and code will practically write itself. Get the abstraction wrong, and it will cause you a world of pain.
Cohesive Abstractions
Since this blog is getting long, I’ll postpone Cohesive Abstractions to the next blog entry.
Summary
Abstraction is an important concept in software engineering separating what from how. Abstraction is one of the main concepts that allows us to create a modular design for ease of future maintenance.
However, the right abstraction is not always easy or obvious. Don’t force abstractions too early. It may take a few examples before the abstractions emerge from the domain, design or implementation.