Test Automation: What and When to Test

Test Automation (TA) was on everyone's mind a few years ago, yet the average automation level was only 16% according to [Legeard 2017]. DevOps is now the norm for 80% of agile Teams, which means they use a Continuous Integration tool such as Jenkins or GitLab CI [Bensten 2019]. In that context, test automation is a must-have because it gives Developers fast feedback.

What is Test Automation?

Test automation is better described as “automated checking”, since testing itself is an intellectual challenge [Bach 2014]. Test automation must be complemented with manual testing to ensure software quality, regardless of the automation maturity of your organization [Bach 2014]. Testers who focus solely on TA may become familiar with the scripts but no longer understand the System Under Test (SUT) [Bradshaw 2019-2].

*James Bach’s vision on Testing vs automated checking [Moustier 2019]*

TA covers any tooling activity that makes testing more efficient and less boring [Bach 2016]. Examples include automating test data generation or creating scripts to "shadow" manual tasks [Bradshaw 2019-2] [Graham 2012]. Automated checking is another way to facilitate testing and make it less tedious.
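As a minimal sketch of the first example, test data generation can be automated with a few lines of Python. The record fields, the `example.test` domain and the function names below are assumptions made for illustration, not something taken from the cited authors. A fixed seed is used so that the generated data stays reproducible between runs.

```python
import random
import string


def make_customer(rng: random.Random) -> dict:
    """Build one synthetic customer record (field names are hypothetical)."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "name": name.capitalize(),
        "email": f"{name}@example.test",
        "age": rng.randint(18, 90),
    }


def make_customers(count: int, seed: int = 42) -> list:
    """Generate `count` records; the same seed always yields the same data."""
    rng = random.Random(seed)
    return [make_customer(rng) for _ in range(count)]
```

Calling `make_customers(100)` would give a Tester one hundred records to explore the SUT with, without typing them by hand.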

Each automation task calls for a tool suited to the need, so Testers should know several tools. This broader view could lead to renaming Test Automation as "Automation in Testing" [Bradshaw 2019-2].

Test automation, in its usual sense, often suffers from maintenance issues and unreliability. Automated scripts cannot cope with unexpected problems such as environment issues, incorrect data, or defects in the test scripts themselves, since:

tools cannot judge pass or fail situations by themselves [Fewster 1999]; they only check what the script tells them to check
scripts may not fit newer versions of the SUT

These issues strongly constrain what should be automated, and when, to reach a reasonable return on the TA investment.

What to automate?

Since genuine testing means learning something about the SUT, any test script from which you no longer learn anything is a candidate for automation [Bradshaw 2019-2]. When dozens of scripts are candidates, you can prioritize them by simply asking [Graham 2012]:

What Must be automated?
What Should be automated?
What Could be automated?
What Won’t be automated?

and keep in mind that not all tests can or should be automated [ISTQB 2016], especially if the SUT was not designed to be testable through automated scripts [Fewster 1999] [ISTQB 2016]. This MoSCoW approach is a simple start that can be refined with criteria such as [Fewster 1999] [Jones 2018]:

the frequency of use of a feature to test
the complexity to automate the test script
the impact on the perceived added value of the tested feature
how quickly bugs found by the test script can be fixed
how quickly the test script can be coded
how often the tested area breaks and how many bugs are found in it

and also:

whether testing is a bottleneck in the delivery flow, or whether some tests are hard to run manually [Colantonio 2018] [Graham 2012]
the cost of automation versus the lifespan of the feature [Hage Chahine 2023]: automating test scripts on a prototype may not be profitable, so you should also obtain some estimate of the feature's maturity [ISTQB 2016]
the coverage of the test script [Axelrod 2018], which should guide you in deciding which uncovered parts (features, code, branch and possibly decision coverage) need protection in your context
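One possible way to turn such criteria into a first MoSCoW ordering is a simple score, sketched below in Python. The four criteria retained, the 1 to 5 scales, the formula and the thresholds are all hypothetical and should be agreed with your Stakeholders; the sketch only shows the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A test script considered for automation (all scales are assumptions)."""
    name: str
    usage_frequency: int    # 1 (rarely used feature) .. 5 (used constantly)
    business_value: int     # 1 (low perceived value) .. 5 (high)
    defect_history: int     # 1 (stable area) .. 5 (breaks often)
    automation_effort: int  # 1 (trivial to script) .. 5 (very hard)


def score(candidate: Candidate) -> float:
    """Benefit of automating divided by the effort to automate."""
    benefit = (
        candidate.usage_frequency
        + candidate.business_value
        + candidate.defect_history
    )
    return benefit / candidate.automation_effort


def moscow_bucket(candidate: Candidate) -> str:
    """Map the score to a MoSCoW bucket using hypothetical thresholds."""
    value = score(candidate)
    if value >= 6:
        return "Must"
    if value >= 3:
        return "Should"
    if value >= 1.5:
        return "Could"
    return "Won't"
```

With these arbitrary thresholds, a frequently used, high-value checkout script that is cheap to automate lands in "Must", while a rarely used export that is hard to script lands in "Won't".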

Beyond these selection criteria, you may also consider these extra tips [Axelrod 2018]:

Consider TA on:

stable parts whose underlying components are planned to be replaced
parts whose performance needs to be improved
parts that are error-prone
“Confirmation Testing” (aka “Defect-Driven Testing”), by replicating the steps that reproduced the bug [ISTQB 2016]

Avoid TA on:

parts planned to be replaced
very stable parts with no plan to touch them soon

When you start TA, your organization may first decide to set up a dedicated Team responsible for automating scripts for several projects. This configuration can serve as a proof of concept to generate buy-in from the Hierarchy and Teams and to help spread TA practices. However, it is not a good idea in the long run: it can create a silo within the project development stream, leading to delays and communication mismatches.

Regarding scripts to be automated, you should:

choose the most valuable tests from the most common Customer journey [Hage Chahine 2023][ISTQB 2016]. These tests should have high added value and affect both Customers and the assets of your organization. To do this, involve Stakeholders in agreeing on the added value of tests. This will create buy-in, especially if TA is just beginning [Bradshaw 2019-1].
plan for later growth: first aim for just enough coverage of the high-value features before digging into thorough test coverage [Axelrod 2018], because thorough coverage may slow your feedback loop and increase maintenance costs too early [Graham 2012] [Fewster 1999]

TA requires some maturity, which includes skills such as being able [ISTQB 2016]:

to automate - this need can be reduced with low-code/no-code automation platforms
to analyze test outcomes - bugs, false positives and flaky tests must be handled correctly to enable both SUT and TA improvements
to address automated testing of non-functional requirements

These challenges can be eased with:

a testable SUT with stable testing environments to have deterministic scripts [Colantonio 2018]
small test scripts - and thus easy to maintain [Axelrod 2018]
an incremental approach [Graham 2012]

This builds a "steel thread" of rock-solid tests, as described in [Graham 2012]. The thread is then extended gradually to cover a Customer's journey through the business processes, as detailed in [Axelrod 2018]. It also enables the testing quadrants to be covered during the course of the sprint.

*The Agile Testing Quadrants from Brian Marick’s original work [Moustier 2019]*

Test Automation Strategy

Remember the etymology of the word “strategy” when creating a TA plan: it comes from the Greek words “stratos” (army) and “agein” (to lead). This suggests that you should draw on everyone involved in delivering the SUT [Graham 2012], Developers included [Segal 2022], notably to avoid silos and improve the ROI. TA is in fact an activity in which everybody can participate [ISTQB 2016], even people without coding skills, for instance when analyzing test reports.

Since testing should happen as early as possible, as per the “Shift Left” testing principle [ISTQB 2018], TA should also take the whole SDLC into account. Culturally, Testers tend to see TA only from a GUI perspective: to a hammer, everything looks like a nail!

Actually, TA offers many possibilities at every level of the automation test pyramid:

at integration level - automated test scripts ensure that components together provide a subpart of a whole service. Mocks and stubs then make it possible to isolate test scripts, which improves repeatability and speed
at code and component level - Unit Tests should be automated. This is where tests have the best return on investment, according to [Graham 2012], especially when considering code, branch and decision coverage
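To illustrate how mocks and stubs isolate a test script, here is a minimal sketch using Python's standard `unittest.mock` module. The `OrderService` class and its payment gateway are hypothetical names invented for the example; only `Mock`, `return_value` and `assert_called_once_with` come from the standard library.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical component that depends on an external payment gateway."""

    def __init__(self, gateway):
        self.gateway = gateway

    def checkout(self, amount: float) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        accepted = self.gateway.charge(amount)
        return "confirmed" if accepted else "rejected"


def test_checkout_confirms_when_payment_is_accepted():
    gateway = Mock()
    gateway.charge.return_value = True  # stub: canned answer, no network call

    service = OrderService(gateway)

    assert service.checkout(25.0) == "confirmed"
    gateway.charge.assert_called_once_with(25.0)  # mock: verify the interaction


def test_checkout_rejects_when_payment_is_declined():
    gateway = Mock()
    gateway.charge.return_value = False

    assert OrderService(gateway).checkout(25.0) == "rejected"
```

Because the real gateway is never contacted, both checks run in milliseconds and give the same result on every run.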

The Test Pyramid is a well-known concept. It recommends automating a large number of Unit Tests, fewer Integration Tests, and a few End-to-End Tests. Manual testing should be used to complement the automated scripts so that:

all non-automated tests can still be part of the test campaign
exploratory tests complement the existing scripted tests

*Original shape of Mike Cohn’s automated test pyramid [Moustier 2019]*

The pyramid has changed since Mike Cohn's version. It now comes in different flavors and shapes, depending on the context of your System Under Test (SUT). For example, if you're using frameworks like Microsoft Dynamics 365 or Salesforce, Unit Tests may not be applicable or relevant.

It is important to leverage the lower levels as much as possible, but also to keep ROI in mind. You may promote a reversed pyramid as long as your analysis is transparent. This will enable a sustainable TA strategy and prevent an untamed "ice cream cone" antipattern [Craske 2020].

The test pyramid is a heuristic that works most of the time [Crispin 2014] [Bradshaw 2019-2]. Additionally, the Swiss Cheese Model can be used to support some analysis and tweak the shape of the pyramid. In this model, every check from any level of the pyramid is a slice of Swiss cheese with holes (also known as “eyes”) inside. A bug that the End User experiences is an alignment of holes across all the slices [Moustier 2019]. TA can therefore be seen as finding the earliest slice in which a hole can be automatically closed.

*The Swiss Cheese model [Moustier 2019]*

The SUT architecture allows us to cover the UI, Business Model, and database tiers with different test setups, which isolates test cases from configuration changes [Axelrod 2018]:

for a specific End User profile
from a specific client (say a given version of an Internet browser)
just “under the skin” to check the Business Model without any UI considerations
at server, microservice or component level - notably with those so-called “unit tests”

Being able to trace calls between layers helps a lot in deciding which layer a test can and should target [Bradshaw 2019-1].

Exhaustive testing is impossible, according to the testing principle from [ISTQB 2018]. Therefore, the Shift Left strategy is not sufficient. Shift Right Testing should be implemented as well.

It is then extremely valuable to be able to monitor the health of the product once it is deployed to End Users [CFTL 2021], notably through:

automated scripts that act as a Customer to proactively find issues in production [Graham 2012], for instance smoke tests or checks for well-known recurring flaws in the product
any observables that indicate the product's health, just as a blood test report can reveal an underlying disease

Return On Investment of TA

As seen above, the question “What should be automated?” rests on the ROI question. Once the organization has decided to go for TA, the vast number of choices calls for prioritization, and therefore for an ROI assessment.

The classical ROI formula is [Kelly 2004]:

Cost of automation / Cost of manual testing

with:

Cost of automation = tools cost + labor costs to create an automated test + costs to maintain the automated tests
Cost of manual testing = mainly the labor cost to run the tests
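The ratio above can be worked through with a small Python sketch. All figures are invented, and modelling maintenance as a fixed cost per run is an assumption made to keep the arithmetic simple. Under this reading, a ratio below 1 means that automation has become cheaper than running the same tests manually.

```python
def automation_cost(tools, creation, maintenance_per_run, runs):
    """Tools cost + labor to create the tests + maintenance over `runs` runs."""
    return tools + creation + maintenance_per_run * runs


def manual_cost(labor_per_run, runs):
    """Labor cost of executing the same tests manually `runs` times."""
    return labor_per_run * runs


def cost_ratio(tools, creation, maintenance_per_run, labor_per_run, runs):
    """Cost of automation / cost of manual testing after `runs` runs."""
    return automation_cost(tools, creation, maintenance_per_run, runs) / manual_cost(
        labor_per_run, runs
    )


def break_even_runs(tools, creation, maintenance_per_run, labor_per_run, max_runs=1000):
    """First run count at which automation costs no more than manual testing."""
    for runs in range(1, max_runs + 1):
        if automation_cost(tools, creation, maintenance_per_run, runs) <= manual_cost(
            labor_per_run, runs
        ):
            return runs
    return None  # never breaks even within max_runs
```

For instance, with 1000 of tooling, 4000 of creation labor, 50 of maintenance per run and 300 of manual labor per run, `break_even_runs(1000, 4000, 50, 300)` returns 20: both approaches have cost 6000 at that point, and every later run widens the gap in favor of automation.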

Automated and manual testing cannot be compared directly: automated tests mainly perform regression checks, so they do not provide good testing on their own. Nevertheless, Test Automation (TA) can be very profitable. According to [Kelly 2004], it can give up to a 900% return after a year, and [Graham 2012] suggests that it can break even after just one month.

Actually, TA is mainly profitable to the Customers’ business [SAFe 2023] because it enables fast delivery. This is especially relevant in iterative development, since delivery delays incur a cost of delay. TA is therefore best assessed in terms of the value stream [SAFe 2023].

However, to improve profitability within the project, a few heuristics can be applied [Graham 2012]:

scripts must remain maintainable over the long term
scripts must provide strong confidence and fast defect discovery
scripts must comply with the well-known FIRST criteria:

Fast
Isolated
Repeatable
Self-validating
Timely (see below)

This can be summarized with TRIMS [Bradshaw 2019-1]:

Targeted - you need to find the lowest point at which you can mitigate a failure risk
Reliable - non-deterministic tests trigger time-consuming analyses
Informative - if a test passes, you should know exactly why it passed, and possibly challenge a high-level feedback message with further checks deeper in the system
Maintainable
Speed - the tests need to be designed and run fast - the higher the layer, the slower the test

To have some control over these aspects, you may introduce observables on the test scripts asset [Colantonio 2018]:

to assess the “Mean Time to Diagnosis”, i.e. how long it takes to debug a failing test script - the longer it takes, the worse the state of the test scripts asset
to count the bugs found by automation - this indicator is culturally highly appreciated!
to assess the “Flaky Rate” by counting false positives and flaky tests, notably from a post-run analysis - the higher the rate, the less reliable your test scripts are
to assess the “Automated/Manual” ratio of tests to make TA progression obvious to Stakeholders - even if this count is not relevant in terms of quality [Bach 2016], it is quite effective from a Stakeholder point of view
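The first three observables can be computed from a post-run analysis log, as in the following Python sketch. The `FailureRecord` structure, the cause labels and the definition of the flaky rate as the share of failures that were not real bugs are assumptions made for the example; your own definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FailureRecord:
    """One analysed test failure (structure and labels are hypothetical)."""
    test_name: str
    cause: str                  # "bug", "flaky" or "false_positive"
    minutes_to_diagnose: float  # time spent finding out why the test failed


def mean_time_to_diagnosis(failures):
    """Average diagnosis time in minutes; 0.0 when nothing failed."""
    if not failures:
        return 0.0
    return mean(f.minutes_to_diagnose for f in failures)


def bugs_found(failures):
    """Number of failures that revealed a real defect in the SUT."""
    return sum(1 for f in failures if f.cause == "bug")


def flaky_rate(failures):
    """Share of failures that were noise rather than real defects."""
    if not failures:
        return 0.0
    noisy = sum(1 for f in failures if f.cause in {"flaky", "false_positive"})
    return noisy / len(failures)
```

Tracking these figures run after run shows whether the test scripts asset is getting more or less trustworthy over time.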

The Hierarchy that first paid for TA must be kept informed of how well or badly the automation is going, in order to sell the benefits of automation [Graham 2012] to Stakeholders. This matters even more at the start, when buy-in was needed to budget the initiative. Reports that show passes and fails with green/red bar graphs are one way to do this, and they also help people understand when scripts need refactoring.

When to automate tests?

Everything starts when the TA need arises. At that moment, the TA tool must be selected according to [Colantonio 2018] [ISTQB 2016]:

the technologies involved in the SUT
the skills available within Teams

Some extra criteria must also be looked at to ensure the tool will support automation beyond the first scripts. The following features [Colantonio 2018] are key:

Tool extensibility
The ease of use of the tool to get started, ideally with training to avoid bad decisions
Reporting and debugging capabilities to communicate and do failure analyses
Capability to recognize all the objects in your application
Integration with other tools like version control, test management tools, and continuous integration tools
An active user community, with a large pool of people who could be hired to create your automated tests

Remember that TA does not necessarily imply code: some automation platforms provide “no-code” technology, natural-language formats such as Gherkin, or a Keyword-Driven Testing approach [ISTQB 2016], which enable purely Functional Testers and Business Analysts to automate.
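To give an idea of how Keyword-Driven Testing separates the test description from the code behind it, here is a minimal Python sketch. The keywords, the `FakeLoginPage` stand-in and the credentials are all invented for illustration; a real platform would map keywords to browser or API actions and would offer a much richer vocabulary.

```python
class FakeLoginPage:
    """Stand-in for the SUT so the sketch runs without a browser."""

    def __init__(self):
        self.fields = {}
        self.message = ""

    def enter(self, field, value):
        self.fields[field] = value

    def submit(self):
        ok = (
            self.fields.get("user") == "alice"
            and self.fields.get("password") == "secret"
        )
        self.message = "Welcome alice" if ok else "Invalid credentials"


def check_message(page, expected):
    """Fail the test when the displayed message is not the expected one."""
    if page.message != expected:
        raise AssertionError(f"expected {expected!r}, got {page.message!r}")


def run_keywords(steps, page):
    """Interpret each (keyword, *arguments) step against the page."""
    keywords = {
        "enter": page.enter,
        "submit": page.submit,
        "expect message": lambda expected: check_message(page, expected),
    }
    for keyword, *args in steps:
        keywords[keyword](*args)


# The test itself is plain data that a non-coder could write or review.
LOGIN_TEST = [
    ("enter", "user", "alice"),
    ("enter", "password", "secret"),
    ("submit",),
    ("expect message", "Welcome alice"),
]
```

Running `run_keywords(LOGIN_TEST, FakeLoginPage())` executes the four steps in order and raises an `AssertionError` only if the final expectation is not met.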

Once the automation platform has been selected, good practices are needed to lower script maintenance. In practice, you will build your own framework on top of the platform to improve productivity, and thus the ROI. The patterns usually involved are [Colantonio 2018] [Crispin 2014]:

Arrange-Act-Assert [Crispin 2014]
DRY (Don’t Repeat Yourself) - the ability to change tests in only one place
Page Object Model - possibly in its “SOLID” version named “Screenplay”
Presenter First - based on the Model-View-Presenter design pattern, it enables testing to start from an empty Presenter skeleton derived directly from a User Story in ATDD mode
Single purpose - each test must aim at only one objective, which makes it easier to debug and to change when business rules change
Setup and teardown to run the tests repeatedly
Use a DSL (domain-specific language) to make communication about the tests easier
Separate the test (the what) from test execution (the how) - this makes it possible to change the underlying automation without affecting the business rules
Automation at every distinct level (Business features / UI workflow / Technical)
Avoiding database access when possible to increase repeatability and tests speed
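Several of these patterns can be combined in a few lines, as in the following Python sketch based on the standard `unittest` module. It shows Arrange-Act-Assert, setup and teardown, a single-purpose test and a small Page Object that keeps the locators in one place (DRY). The `FakeBrowser` driver, the `SearchPage` class and the locators are hypothetical; with a real tool, the Page Object would wrap that tool's driver instead.

```python
import unittest


class FakeBrowser:
    """Hypothetical UI driver so that the sketch runs without a real SUT."""

    def __init__(self):
        self.inputs = {}
        self.text = ""
        self.is_open = True

    def type_into(self, locator, value):
        self.inputs[locator] = value

    def click(self, locator):
        if locator == "#search-button":
            self.text = "Results for " + self.inputs.get("#search-box", "")

    def close(self):
        self.is_open = False


class SearchPage:
    """Page Object: the only place that knows the locators."""

    SEARCH_BOX = "#search-box"
    SEARCH_BUTTON = "#search-button"

    def __init__(self, browser):
        self.browser = browser

    def search(self, term):
        self.browser.type_into(self.SEARCH_BOX, term)
        self.browser.click(self.SEARCH_BUTTON)

    def results_heading(self):
        return self.browser.text


class SearchTest(unittest.TestCase):
    def setUp(self):
        # Fresh driver for every test keeps the tests isolated and repeatable.
        self.browser = FakeBrowser()
        self.page = SearchPage(self.browser)

    def tearDown(self):
        self.browser.close()

    def test_search_shows_the_searched_term(self):
        # Arrange
        term = "test pyramid"
        # Act
        self.page.search(term)
        # Assert
        self.assertEqual(self.page.results_heading(), "Results for test pyramid")
```

If the search button's locator changes, only `SearchPage` needs to be updated; the test itself keeps describing the business intent. Saved in a file, the test can be run with `python -m unittest`.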

To start with, engineers should create test scripts with patterns on the existing SUT without any time constraints. After gaining experience, they can automate User Stories along with the Sprint. The automated test scripts should be able to handle both new and legacy features on the SUT [Axelrod 2018] in ATDD mode [Graham 2012].

Starting test automation before coding the SUT pushes development towards testability [Graham 2012], thus introducing built-in quality practices such as testing entry points, logs, alerting mechanisms and possibly some Poka Yoke design in the production code. When starting a new project, begin with TA to have automated tests ready to run as soon as possible [Graham 2012]. This can help improve the quality of the production code.

With time, your framework will grow and more patterns will be introduced. However, you will face legacy issues, which means you will need to remove some technical debt from the test scripts and framework assets. Lean management calls this a “5S”, and it must be done on a regular basis [Graham 2012] [Crispin 2014]:

It should involve the whole Team
Technical Debt must be made visible, for instance at Retrospective time, to get buy-in from Stakeholders since it impedes test speed and effectiveness in maintenance and reliability

In a DevOps context, remember that the Developers’ Continuous Integration pipeline should run within 10 minutes [Graham 2012] [Kim 2016]. This should lead to classifying tests into predefined sets [Moustier 2019] run from several pipelines: for instance, one at Developers’ level for fast feedback and another one run at night or during the weekend for the slower scripts.
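As a rough sketch of such a classification, the Python function below fills the fast pipeline with the shortest test sets until the 10-minute budget is used up and sends everything else to the nightly pipeline. The suite names and durations are invented, and the sketch assumes the sets run one after the other; parallel execution or risk-based selection would change the split.

```python
FAST_PIPELINE_BUDGET_SECONDS = 10 * 60


def split_by_budget(durations, budget_seconds=FAST_PIPELINE_BUDGET_SECONDS):
    """Return (fast, nightly) lists of test set names.

    `durations` maps a test set name to its usual run time in seconds.
    Shortest sets are taken first so that the fast pipeline holds as many
    sets as possible within the budget.
    """
    fast, nightly, used = [], [], 0.0
    for name, seconds in sorted(durations.items(), key=lambda item: item[1]):
        if used + seconds <= budget_seconds:
            fast.append(name)
            used += seconds
        else:
            nightly.append(name)
    return fast, nightly


# Hypothetical durations measured on previous runs.
SUITE_DURATIONS = {
    "unit_suite": 90,
    "api_suite": 240,
    "ui_smoke": 200,
    "ui_full_regression": 3600,
    "performance": 1800,
}
```

With these figures, `split_by_budget(SUITE_DURATIONS)` keeps the unit, UI smoke and API sets (530 seconds in total) in the fast pipeline and leaves the performance and full UI regression sets for the nightly run.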

Key takeaways

During your test automation, remember:

automated checks do not replace Testers - Computers are complements to humans, not substitutes [Colantonio 2018]
automation is a Team effort [Bradshaw 2019-1] [Bradshaw 2019-2] [Colantonio 2018] [Graham 2012], not something disconnected from the development Team [Axelrod 2018]
automation starts before - or along with - the production code, not at the end
test scripts should be treated as production code [Colantonio 2018]
never underestimate maintenance time in TA [Colantonio 2018] and do some 5S whenever it is possible
test scripts should be as “atomic” as possible instead of large end-to-end test scenarios
having a good tool is not an automation strategy [Graham 2012], it’s only a start
improve test scripts and test framework on a regular basis, ideally at each iteration

You must also:

Avoid setting unrealistic goals for TA [Colantonio 2018] - even if 60% of people report that TA makes bug chasing easier [Capgemini 2018], it does not find more new defects than manual testing
Avoid thinking of TA in “Big Bang” mode [Capgemini 2018]
Avoid relying solely on UI feedback; consider deeper tests that check the actual values in the technical layers [Axelrod 2018]

And consider following the Automation in Testing principles [Bradshaw 2019-2]: