Performance tests

Active testing

Reaction time depending on the load, behavior in case of overload

What are performance tests?

Performance Testing (PT) is a set of test practices that ensures Users can use the provided solution without experiencing too much frustration, whatever the mass of simultaneous Users (see also “Application Performance Index” - APDEX). PT is an umbrella term that includes any kind of testing focused on responsiveness of the system or component under different volumes of load [ISTQB 2018-1].
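
The Apdex index mentioned above condenses response times into one satisfaction score between 0 and 1. The following is a minimal sketch of the usual Apdex formula, in which samples at or below a threshold T count as "satisfied" and samples between T and 4T count half; the function name, the sample values and the 0.5 s threshold are assumptions chosen for illustration only.

```python
def apdex(response_times_s, threshold_s):
    """Return the Apdex score (0.0 to 1.0) for response times given in seconds."""
    if not response_times_s:
        raise ValueError("at least one sample is required")
    # Satisfied: at or below the threshold T.
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    # Tolerating: above T but at or below 4T; each such sample counts for half.
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    # Everything slower than 4T is "frustrated" and contributes nothing.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Illustrative samples: 2 satisfied, 2 tolerating, 1 frustrated.
samples = [0.2, 0.4, 0.9, 1.5, 3.0]
score = apdex(samples, threshold_s=0.5)
print(f"Apdex: {score:.2f}")
```

The threshold itself is a business decision: it should come from the performance targets captured with stakeholders rather than from the tool.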

Here are some types of PT [ISO25010] [Molyneaux 2014] [ISTQB 2018-1] [Meier 2007]:

  • Response Time Testing - how fast the System Under Test (SUT) responds, i.e. the ability of a component or system to respond to user or system inputs within a specified time and under specified conditions [ISTQB 2018-2]. This performance criterion is often used as an SLA
  • Volumes Testing
      • Resource Utilization: how resources are used when reaching the standard load, in order to diagnose where and what to optimize and to remove bottlenecks such as CPU, memory, storage or network usage (especially when data usage is billed) or battery consumption on mobile
      • Capacity: how the service behaves when overwhelmed by requests. This test helps identify the load limit beyond which the system can no longer serve all queries; it determines how many users and/or transactions a given system will support while still meeting the stated performance objectives
      • Throughput: how many transactions can be processed in a given period of time
      • Endurance Testing (aka “Soak” or stability test): how the SUT behaves when exposed to a load over an extended period of time
  • Limits Testing
      • Stress Testing: aims to reach the threshold above which the system or part of it would collapse, and to see how it handles peak loads beyond known limits - in terms of Panarchy, this means “reaching the Ω state” and seeing its impact on linked ecocycles
      • Concurrency Testing: how the system handles situations where specific actions occur simultaneously from different sides
      • Spike Testing: how the system responds to sudden bursts of peak load and returns afterwards to a steady state
      • Scalability Testing: how the system can grow and adapt to an increasing demand without degrading performance until the limits are reached - those limits are then handled through alert mechanisms and improved to respond to availability needs
  • Diagnosis techniques
      • Smoke testing: PT focused on what has been changed since the latest release
      • Isolation testing: used to locate an identified problem
      • Pipe-clean testing: how fast a business use case reacts without any other activity, in order to obtain a baseline - this approach is the foundation for benchmarking [Meier 2007] and for making comparisons with alternate designs or competitors

The combination of those test techniques enables the system to scale.

As an example, the typical approach to carrying out performance tests would be [Molyneaux 2014]:

  • Step 1: Nonfunctional Requirements Capture
  • Step 2: Performance Test Environment Build
  • Step 3: Use-Case Scripting
  • Step 4: Performance Test Scenario Build
  • Step 5: Performance Test Execution and Analysis
  • Step 6: Post-Test Analysis and Reporting

Step 1: Nonfunctional Requirements Capture

As per the principle “Testing is context dependent” [ISTQB 2018-1], the background of the solution to be delivered is of the utmost importance. For this reason, you should know things such as [Molyneaux 2014]:

  • How many users will use the application?
  • Where are those users located?
  • How many people will use your product concurrently?
  • How will they connect to the application?
  • How will your users evolve over time?
  • What will the final network architecture be?
  • What effect will the application have on network capacity?
  • What are the expectations on the product?
  • What should we build to measure this?
  • Which use case should be impacted by PT?
  • What are the performance targets?
  • What is the load model?
  • What are the required testing organizations, skills and tools to support the targets?

Neglecting any of those questions will have a drastic impact on the test set and the product reliability. Indeed, Performance is an “Architecturally Significant Requirement” (ASR) [ASR 2021][Chen 2013][Moustier 2019-1]. It is among the biggest catalysts of architecture change [Meier 2007]; this is why PT should arrive as early as possible in the SDLC [Singh 2021]: as with any ASR, the later you discover a performance issue, the greater the cost and effort you will face [Molyneaux 2014].

Step 2: Performance Test Environment Build

Here is an example of a classic architecture for PT:

Possible PT Architecture

Regarding the many environments, there are natural differences at network level because a LAN behaves differently from a WAN: response times vary more on a WAN, whereas a LAN is faster and therefore not representative of the genuine user experience. This is why different network models need to be involved, notably by running Load Injectors from a remote place or by simulating delays on some backends [Molyneaux 2014]. Particular care should be taken with the organizational security context, which may prevent you from installing some tools; this IT matter should be taken into account at design time.

Regarding test data, keep in mind that they should model real life. Therefore, the frequency of the tested use cases should be modeled first, and the test dataset generated from it. Moreover, the test database should be loaded with a significant amount of data to reflect storage behaviors in production. Then, once the PT scenario is over, the database should be restored to its previous state to ease replays. The data could be migrated from production, with the updates required by the new release, and anonymized where needed to comply with GDPR constraints [Molyneaux 2014].
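
As a rough illustration of the two ideas above (a use-case mix driven by observed frequencies, and masking of personal data copied from production), here is a minimal sketch. The use-case names, weights, salt and e-mail address are all hypothetical. Note that a salted hash is only pseudonymization: whether it is sufficient for GDPR purposes must be decided by the people in charge of compliance.

```python
import hashlib
import random

# Hypothetical use-case frequencies, as observed or expected in production.
USE_CASE_WEIGHTS = {"search": 0.6, "view_item": 0.3, "checkout": 0.1}


def build_use_case_mix(n, weights, seed=42):
    """Draw n use cases following the given frequencies (seeded for replay)."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


def pseudonymize(value, salt="pt-env"):
    """Replace a personal value with a stable, non-readable token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


mix = build_use_case_mix(1000, USE_CASE_WEIGHTS)
masked = pseudonymize("alice@example.com")
print(mix[:5], masked)
```

Seeding the random generator keeps the dataset identical from one run to the next, which helps when a scenario has to be replayed against a restored database.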

When it comes to PT, the closer the environment is to production, the more meaningful the metrics will be; unfortunately, this is not always possible for cost reasons or when third parties are involved. There are, however, alternatives to exact copies of the production environment [Molyneaux 2014], notably a subset of the production environment made of fewer or smaller resources but including all the expected parts, from which the system capacity is then extrapolated.

Step 3: Use-Case Scripting

To be significant, PT should be applied to business scenarios. However, since PT implies significant efforts and delays, only business-critical use cases, or parts of them, should be subject to PT. Within those use cases, session data requirements, inputs and checkpoints should then be defined.

At this step, monitored parts should be separated from non-significant parts of the use case scenario so that a first-level analysis of potential problem areas within the use case is available as soon as the measures are.

The scripts should then be tested for both single and multiple users. Appropriate logs and performance metrics should be generated to enable future analyses.
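
To make the single-user versus multi-user check concrete, here is a minimal sketch built on Python's standard thread pool. The `use_case` function is a stand-in (a short sleep) for a real scripted business scenario, and the user counts are arbitrary; a dedicated load testing tool would normally play this role.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def use_case(user_id):
    """Stand-in for a scripted use case; replace the sleep with real calls."""
    time.sleep(0.01)
    return user_id


def timed_run(user_id):
    """Run the use case once and record its outcome and duration."""
    start = time.perf_counter()
    try:
        use_case(user_id)
        ok = True
    except Exception:
        ok = False  # the failure is kept as a metric instead of stopping the run
    return {"user": user_id, "ok": ok, "elapsed_s": time.perf_counter() - start}


def run_virtual_users(n_users):
    """Run the use case for n_users concurrent virtual users."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return list(pool.map(timed_run, range(n_users)))


single = run_virtual_users(1)   # validates the script on its own
multi = run_virtual_users(20)   # validates it under concurrency
print(len(single), len(multi), max(r["elapsed_s"] for r in multi))
```

Each result row can be written to a log or CSV file so that the analysis step has raw measures to work from.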

Step 4: Performance Test Scenario Build

Once scripts are available, the performance testing session must be arranged, notably regarding the product context. For instance, before running each scenario, you may launch a “pipe-clean” test to establish a baseline or go straight to the most critical business case if time is getting short!

From the performance testing scenario, the load model should also be defined so that Load Injectors stimulate the SUT just as it would be used in production. This notably includes:

  • when and how often do users stimulate the SUT?
  • what are they doing?
  • how heavy is the data they use?
  • how should the injectors be launched against the SUT? Big bang? Linearly? Exponentially? In a Gauss-like curve? Are there steps in the progression?

A few load models built from Users’ unique queries (random, burst, progressive) - values provide a “unit load”
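
The ramp-up shapes listed above can be expressed as simple functions giving the number of active virtual users at a given time. This is only a sketch: the function names, the 60-second duration, the peak of 100 users and the width of the bell curve are arbitrary assumptions, not values taken from the cited sources.

```python
import math


def big_bang(t, duration, peak):
    """All users start at once."""
    return peak


def linear(t, duration, peak):
    """Users are added at a constant rate."""
    return round(peak * t / duration)


def exponential(t, duration, peak):
    """Grows from 1 user at t=0 to `peak` users at t=duration."""
    return round(peak ** (t / duration))


def gauss_like(t, duration, peak):
    """Bell curve centred on mid-test; sigma = duration / 6 is an arbitrary choice."""
    mu, sigma = duration / 2, duration / 6
    return round(peak * math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)))


def stepped(t, duration, peak, steps=4):
    """Load is raised in `steps` equal plateaus."""
    step = min(steps, int(t / duration * steps) + 1)
    return round(peak * step / steps)


def schedule(model, duration=60, peak=100, tick=10):
    """Sample a load model every `tick` seconds over the test duration."""
    return [model(t, duration, peak) for t in range(0, duration + 1, tick)]


for model in (big_bang, linear, exponential, gauss_like, stepped):
    print(f"{model.__name__:12s}", schedule(model))
```

Such a schedule can then be fed to whichever Load Injector is used, as the number of virtual users to keep active at each tick.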

Whatever your load model, it should help you to reach your SLA objectives [Molyneaux 2014].

Special care should be taken with data sets that can be used only once or that are time sensitive, so that tests can be replayed. This may lead you to generate test data [Meier 2007].

Step 5: Performance Test Execution and Analysis

Since this step is crucial, both in terms of results and time, it should run straight through rather than turn into a bug-fixing exercise [Molyneaux 2014]. This is why you should make sure that:

  • the application is ready, at least for the tested use cases
  • the scripts are robust enough to run flawlessly until the end of the scenario
  • nothing has been omitted in the configuration that would make the PT session fail
  • the database is ideally reset between PT sessions, to get metrics as close as possible to the baseline

Step 6: Post-Test Analysis and Reporting

As with any test activity, hypotheses made against objectives can be either confirmed or rejected from the collected data. From the results, a root-cause analysis should help you to provide the appropriate performance improvements.
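
As an illustration, confirming or rejecting a hypothesis such as "95% of requests complete within the target" can be sketched as below. The choice of the 95th percentile, the target values and the sample data are assumptions made for the example; real targets come from the requirements captured in Step 1.

```python
import statistics


def percentile(samples, p):
    """Return the p-th percentile (1..99) using the statistics module's default method."""
    return statistics.quantiles(samples, n=100)[p - 1]


def evaluate(samples_s, target_p95_s):
    """Compare the measured 95th percentile with the target, both in seconds."""
    p95 = percentile(samples_s, 95)
    return {"p95_s": p95, "target_s": target_p95_s, "met": p95 <= target_p95_s}


# Illustrative run: 90 fast responses and 10 slow ones.
samples = [0.2] * 90 + [1.5] * 10
report = evaluate(samples, target_p95_s=1.0)
print(report)
```

A percentile is used here rather than an average because a small share of slow responses, which is exactly what frustrates users, hardly moves the mean.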

The analysis can be supported by sensors inserted in the product, such as JavaScript code snippets that track calls, or by code profiling tools that help isolate performance issues.
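
The text mentions JavaScript snippets on the browser side; the same "sensor" idea can be sketched on the server side, shown here in Python. The decorator name, the `CALL_TIMINGS` store and the `load_catalogue` function are hypothetical and only illustrate the principle of recording how long each tracked call takes.

```python
import functools
import time
from collections import defaultdict

# Function name -> list of call durations in seconds.
CALL_TIMINGS = defaultdict(list)


def sensor(func):
    """Record the wall-clock duration of each call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, so failures are measured too.
            CALL_TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@sensor
def load_catalogue(n):
    """Hypothetical operation under observation."""
    return sum(i * i for i in range(n))


for _ in range(3):
    load_catalogue(10_000)

print({name: len(durations) for name, durations in CALL_TIMINGS.items()})
```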

Impact on the testing maturity

In a waterfall-like environment, it is crucial to plan for enough [Molyneaux 2014]:

  • Lead time to prepare test environment
  • Lead time to provision sufficient load injectors - a test run can last several hours
  • Time to identify and script use cases
  • Time to identify and create enough test data
  • Time to instrument the test environment
  • Time to deal with any problems identified

All this required time inevitably generates a need for a code freeze [Molyneaux 2014], which means the flow must be stopped. This is an issue in the agile mindset: it generates a “wall of confusion” [Kawaguchi 2020] and requires a deep understanding of the Theory of Constraints to handle such bottlenecks, unless you do it the agile way...

To get some clues on how PT could happen in an agile configuration, we can refer to Takeuchi & Nonaka [Nonaka 1986][Moustier 2019-1], who distinguish three types of companies:

  • Type A companies, such as NASA, which divide work into well-defined phases and do not move on to the next phase until the previous one has been completed. This is pure gatekeeping. This approach enables a “flawless” PT session from a code freeze.
  • Type B companies, where the phases overlap slightly, based on the observation that it is conceivable, for example, to start preparing the PT session when 80% of the code has been completed.
  • and type C companies where everything is done at the same time, like in a rugby scrum.
Illustration from [Moustier 2019-1]

To reach a type C organization, things must be thought of in an inclusive, progressive and iterative way. For instance, test data extracted from production have to be migrated; data migration is therefore part of the coding of a User Story (US) or Feature [Moustier 2019-1].

One of the constraints that prevents PT from reaching the type C configuration arises when PT is held by teams independent from the development teams. Such an internal PT service is a case notably addressed by the Agile@Scale model [Sutherland 2019]; it leads to a staging approach, which in turn leads back to type B, or even type A, if Managers don’t support a progressive migration of the PT skills held within the PT Service across the organization to facilitate shifting left. This migration can be achieved when PT is thought of at three levels [Singh 2021]:

  • Code level
  • New features
  • System level

The “Code level” deals with:

  • Evaluating algorithm complexity and efficiency
  • Evaluating code performance notably thanks to code profiling tools - this anticipates the 6th step
  • Optimizing SQL queries [Yevtushenko 2019][Iheagwara 2021] or applying any other technology-dependent optimization technique that would reduce the time spent on some operations.
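
To illustrate the first two points of this list, the sketch below profiles two implementations of the same hypothetical task (finding duplicates), one quadratic and one linear, with Python's built-in `cProfile` and `pstats` modules. The functions and data are invented for the example; other languages have their own profilers.

```python
import cProfile
import io
import pstats


def find_duplicates_quadratic(items):
    """O(n^2): compares each item with every item that follows it."""
    dups = set()
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a == b:
                dups.add(a)
    return sorted(dups)


def find_duplicates_linear(items):
    """O(n): a single pass with a set of already-seen values."""
    seen, dups = set(), set()
    for item in items:
        if item in seen:
            dups.add(item)
        else:
            seen.add(item)
    return sorted(dups)


def profile(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


data = list(range(500)) + [10, 20, 30]
slow_result, slow_report = profile(find_duplicates_quadratic, data)
fast_result, fast_report = profile(find_duplicates_linear, data)
print(slow_report)
print(fast_report)
```

Both functions return the same result; only the reports differ, which is what makes this kind of check usable as a code-level performance review.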

PT at “New features level” notably consists in seeking relative performance instead of absolute values. As long as tests are reproducible with statistically identical results [ISTQB 2018-1], there is a baseline from which comparative measures can be deduced. In this situation, “pipe-clean tests” are most useful to get a baseline and watch for performance degradation from new code [Gordon 2021] [Singh 2021].
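
A relative check against a pipe-clean baseline can be as simple as the sketch below, which compares medians. The 10% tolerance and the sample timings are assumptions chosen for illustration; each team has to set its own margin according to the natural variability of its measures.

```python
import statistics


def regression_ratio(baseline_s, candidate_s):
    """Ratio of candidate median to baseline median (1.0 means no change)."""
    return statistics.median(candidate_s) / statistics.median(baseline_s)


def has_regressed(baseline_s, candidate_s, tolerance=0.10):
    """True when the candidate is slower than the baseline beyond the tolerance."""
    # `tolerance` is an assumed, team-defined margin, not a standard value.
    return regression_ratio(baseline_s, candidate_s) > 1.0 + tolerance


# Illustrative timings in seconds for the same use case.
baseline = [0.200, 0.210, 0.190, 0.205, 0.195]
same = [0.202, 0.208, 0.195, 0.199, 0.204]
slower = [0.260, 0.255, 0.270, 0.250, 0.265]

print(has_regressed(baseline, same), has_regressed(baseline, slower))
```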

PT at “System level” can be reached by:

  • playing with SCM branch flow to ensure “Performance unit tests” (isolation tests) on a regular basis [Gordon 2021] [Singh 2021] and performing end-to-end (E2E) PT while merging to a releasing branch
  • Improving performance unit testing by pairing performance testers with developers [Meier 2007]
  • Monitoring resource usage trends and response times [Meier 2007]
  • Collecting data for scalability and capacity planning [Meier 2007]
  • Handling performance with progressive objectives when it is possible [Moustier 2019-1] by separating requirements from intermediate goals, which may come from different points of view: mainly users, business, the technical side, standards, compliance and contracts [Meier 2007]
  • Limiting E2E PT to the most critical business processes as per the Swiss cheese model when trying to Multiply the types of tests on the solution

This can be eased at planning time [Singh 2021] by developing a performance test plan from the content of a US, notably with performance acceptance criteria. Both refinement meetings and Sprint Planning should include the “Agile testing quadrants” [Marick 2003][Bach 2014][Crispin 2021] to address those three PT levels. Sprint Planning should also include spikes/enablers to update the PT environment little by little, since this environment is what actually enables PT. Keep in mind that testing environments will take time to get working smoothly and to reflect realistic tests [Meier 2007]. When PT practices are firmly integrated into the organization, they can be part of the Definition of Done (DoD) [Gordon 2021].

To improve PT, you may also introduce a structured and automated approach where PTs are derived from abstract model representations of the system. This Model-Based Testing (MBT) approach [Armholt 2011] [Da Silveira 2011] would reduce the effort of designing performance tests [Abbors 2010].

Whatever the practices, both the SUT and the PT scripts must be rock solid to bear hundreds of launches. As in music, many rehearsals will make the test environment, test data, PT scenarios and metrics capture reliable. For this, the agile iterative approach on business use cases is of great help because it provides many opportunities to refine your PT while doing some performance regression testing at the same time. A complementary approach to reinforce PT robustness is to introduce Jidoka in your PT scripts so that they can cope with known exceptions, and also with unexpected behaviours, without breaking the run.
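
One possible reading of this idea is sketched below: each script step is wrapped so that a known exception triggers a recovery action and a retry, while an unexpected one is logged and reported without stopping the whole run. The `SessionExpired` exception, the step functions and the status labels are hypothetical names used only for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pt")


class SessionExpired(Exception):
    """Hypothetical known exception the script knows how to recover from."""


def run_step(step, recover, max_attempts=2):
    """Run one script step; return "ok", "recovered" or "failed" instead of raising."""
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return "ok" if attempt == 1 else "recovered"
        except SessionExpired:
            log.warning("known issue on attempt %d, recovering", attempt)
            recover()
        except Exception:
            # Unexpected behaviour: signal it, then let the scenario continue.
            log.exception("unexpected behaviour, step abandoned")
            return "failed"
    return "failed"


state = {"logged_in": False}


def login():
    state["logged_in"] = True


def checkout():
    if not state["logged_in"]:
        raise SessionExpired()


def broken():
    raise RuntimeError("boom")


results = [run_step(step, login) for step in (checkout, checkout, broken)]
print(results)
```

The returned statuses should be stored with the performance metrics, so that measures taken on recovered or failed steps can be excluded or analysed separately.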

If the big bang effect can be avoided at deployment stage, Shift Right techniques may be used, with sensors fitted to the system in order to provide performance measures. When combined with Canary Releasing and small delivery cycles, real users will surface performance measures and issues without disturbing the whole customer base. You may also deliberately disturb the product environment locally with artificial load and see how the infrastructure adapts to load situations. This approach is sometimes named “Exploratory Stress Testing”, which consists in submitting a system or part of it to a set of unusual conditions that are unlikely to occur [Meier 2007]; this strategy is more often known as “Chaos Engineering”. This Chaos Engineering approach basically relies on programmable Load Injectors to perform active monitoring.

Passive monitoring can also be used to analyze the traffic from genuine users with the help of Automation of observables (e.g. a JS tag on a page that runs in the Visitor’s browser and captures user/mouse events, or observables at API or IP address level). Those observables tell where to perform PT, but they also help Andon, record the Customers’ journey inside the proposed product and help model possible Load Injectors. Both active and passive approaches should be mixed to triangulate results. Passive monitoring is done with APM tools, which provide off-the-shelf monitoring.

Agilitest’s standpoint on this practice

Dealing with PT inevitably goes along with tools because hiring hundreds of people and letting them work for 24 hours in a row according to a load model is impossible. Even if Agilitest could be used to implement a home-made tool for performance testing with observables hardcoded in the product to track precisely when actions are triggered while running a business case, it is not a sustainable option for it impacts production code and developers’ workload. This is why a connector has been built between Agilitest and Octoperf to ease PT.

When it comes to mobile PT, there is a simple combination that can be made:

  • Agilitest is able to drive automated test scripts on mobiles, both Android and Apple
  • Tools such as Greenspector are able to measure power consumption on a mobile

Both tools could then be used together to measure how energy-efficient a mobile application is.

Moreover, when it comes to PT with end-to-end and complex business processes, the process needs to be robust to bear the many situations inferred by the data set. Agilitest provides simple means to automate those checks with a data-driven approach.


© Christophe Moustier - 2021