Canary Releasing

Active testing

New code is deployed in production and activated only for some test clients


Canary release (CR) refers to the old days when miners tested new areas to avoid being poisoned by gas pockets. In 1895, John Scott Haldane proposed bringing caged canaries into possibly poisoned areas [Haldane 1895]; if the birds died, workers avoided these zones. Fortunately, this practice ended in 1987 [Baldwin 2014], though it has been resurrected in the DevOps context.

In DevOps, CR is a means to reduce the risk of a new release by rolling it out to a small subset of Users before rolling it out to everybody [Sato 2014].

CR, like Dark Launches, is also a way to test your software without blocking the delivery process, even with slow life cycles [Humble 2011]. Moreover, CR provides a testing environment, something that can be particularly difficult to obtain for very large systems without a strong sharing-based architecture.

CR, like Blue/Green Deployment [Fowler 2010], is based on two sets (environments) of servers:

  • an environment “N” with all servers configured with all components at version N
  • another environment “N+1” with all servers configured with components at version N+1

A router opens the “N+1” environment to a few Users, who contribute to the testing effort as they use the new system. This mechanism preserves overall User satisfaction since most Users remain assigned to the stable “N” environment.

Figure: Canary Releasing [Humble 2011]
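The routing described above can be illustrated with a minimal sketch. It assumes a percentage-based split on a hashed User identifier; the names `CANARY_PERCENT` and `choose_environment` are hypothetical and do not come from the cited sources, and a real router would typically live in a load balancer or API gateway rather than in application code.

```python
import hashlib

CANARY_PERCENT = 5  # assumed share of Users sent to the "N+1" environment


def choose_environment(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return "N+1" for a stable subset of Users and "N" for everyone else."""
    # Hash the identifier so that a given User always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "N+1" if bucket < canary_percent else "N"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    canary_users = [u for u in users if choose_environment(u) == "N+1"]
    print(f"{len(canary_users)} of {len(users)} Users routed to N+1")
```

Hashing rather than picking Users at random on each request keeps a given User on the same environment from one request to the next, which avoids exposing them to two versions within a single session.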

The router usually selects “power users” to hit the “N+1” environment, such as Testers, Beta Testers or Users from given geographic zones [Lee 2020]. This technique enables [Humble 2011]:

  • Rolling out Users from environment “N” to “N+1” by increasing the number of Users routed to “N+1”, possibly combined with smart Feature Toggles to select the category of Users
  • Rolling back a bad version simply by no longer routing Users towards it
  • Providing an A/B testing flavor that lets Users evaluate different implementations through their actual usage [Young 2014] [Moustier 2019-1]
  • Checking if the application meets capacity requirements by gradually ramping up the load
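The capabilities listed above (progressive rollout, rollback by routing, selection by category of Users) can be sketched together as follows. This is an illustrative model only: the ramp steps, the error threshold, the category names and the `CanaryRollout` class are all assumptions made for the example, not values or APIs taken from the cited sources.

```python
from dataclasses import dataclass, field

RAMP_STEPS = [1, 5, 25, 50, 100]  # assumed percentages of Users on "N+1"
MAX_ERROR_RATE = 0.02             # assumed tolerated error rate on "N+1"


@dataclass
class CanaryRollout:
    step_index: int = 0
    rolled_back: bool = False
    # Categories always routed to "N+1", acting as a simple feature toggle.
    forced_categories: set = field(default_factory=lambda: {"tester", "beta"})

    @property
    def canary_percent(self) -> int:
        return 0 if self.rolled_back else RAMP_STEPS[self.step_index]

    def route(self, bucket: int, category: str = "regular") -> str:
        """Pick an environment; bucket is a stable number in 0..99 per User."""
        if self.rolled_back:
            return "N"  # rollback: nobody is routed to the bad version anymore
        if category in self.forced_categories:
            return "N+1"
        return "N+1" if bucket < self.canary_percent else "N"

    def report(self, requests: int, errors: int) -> None:
        """Ramp up if "N+1" looks healthy, otherwise roll back."""
        if self.rolled_back or requests == 0:
            return
        if errors / requests > MAX_ERROR_RATE:
            self.rolled_back = True
        elif self.step_index < len(RAMP_STEPS) - 1:
            self.step_index += 1


if __name__ == "__main__":
    rollout = CanaryRollout()
    rollout.report(requests=1000, errors=5)    # healthy: move to the next step
    print(rollout.canary_percent)
    rollout.report(requests=1000, errors=100)  # too many errors: roll back
    print(rollout.route(bucket=0, category="tester"))
```

Note that the rollback never touches the “N” environment: it only changes the routing decision, which is what makes it cheap compared with redeploying the previous version.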

CR also eases the setup of Blue/Green Deployment (BGD) by adding more servers (or microservices or components, if the solution is too small) to the “N+1” environment as the number of Users increases. Because of these similarities and of the shared rolling deployment feature, CR is often confused with BGD; however, while CR proposes a progressive build-up of the solution, BGD migrates Users to a whole version of the solution at once [Bryant 2018].

Impact on the testing maturity

Behind this idyllic picture, CR is not for everyone, because of the infrastructure it needs and the upgrades it requires on resources such as databases [Humble 2011] or remote devices. CR therefore implies the use of strong architectural paradigms such as grid computing, immutable servers [Morris 2013] or a “share nothing” approach.

Another point of attention is the number of environments. Once the mechanism is understood, more than two versions may run in parallel, but this would be difficult to manage [Humble 2011] [Sato 2014] [Lee 2020].

In the same way, combining CR and A/B Testing can be tricky because their objectives differ: CR is designed to detect problems and regressions, whereas A/B Testing is designed to statistically choose between options [Sato 2014].
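The difference in objectives can be made concrete with a small sketch. It assumes that the canary check is a simple guard on an error rate, and that the A/B comparison uses a standard two-proportion z-score on conversion counts; the function names, the 2% threshold and the choice of this particular statistic are illustrative assumptions, not something prescribed by [Sato 2014].

```python
import math


def canary_is_healthy(errors: int, requests: int, max_error_rate: float = 0.02) -> bool:
    """CR view: only asks whether "N+1" stays within a tolerated error rate."""
    return requests > 0 and errors / requests <= max_error_rate


def ab_z_score(conv_a: int, users_a: int, conv_b: int, users_b: int) -> float:
    """A/B view: two-proportion z-score comparing conversion rates of A and B."""
    rate_a = conv_a / users_a
    rate_b = conv_b / users_b
    pooled = (conv_a + conv_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        return 0.0  # no variation at all: nothing to distinguish A from B
    return (rate_b - rate_a) / std_err


if __name__ == "__main__":
    print(canary_is_healthy(errors=5, requests=1000))
    print(ab_z_score(conv_a=100, users_a=1000, conv_b=150, users_b=1000))
```

With a two-sided test at the 5% level, an absolute z-score above roughly 1.96 is usually read as a significant difference between the variants. The canary guard, in contrast, never compares options: it only decides whether the new version may keep receiving traffic.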

CR is applicable when the solution [Bryant 2018]:

  • is composed of microservices, possibly with independent release rates
  • requires high availability and Customer satisfaction
  • relies on a third-party component that can hardly, or cannot, be integrated or mocked outside a production environment and needs a deployed version to operate

However, CR is not recommended when [Bryant 2018]:

  • the solution does not tolerate failures because people’s lives or critical assets would be endangered
  • End Users are sensitive to CR due to data constraints (e.g. financial transactions)
  • the new data storage is incompatible with the current storage

Agilitest’s standpoint on this practice

If test script automation is handled by dedicated people, CR requires good coordination between the Devs & Ops who deliver the solution and the people who automate the scripts; otherwise, false positives will be raised, notably if routing is not set up accordingly. To lower the risks generated by CR, Devs are often involved in test automation and usually provide scripts in the technology they are familiar with: Java Developers will probably use Java + Selenium to automate their tests. Unfortunately, such Selenium-based scripts carry defects in their own code, which generate false positives and the extra costs that go along with them.

Agilitest is a #nocode, Selenium-free tool whose only possible issues are located within the scenario. The script is linear and holds neither IF instructions nor loops, which are error prone. Because of their culture, Developers see this as a missing feature, whereas it is actually a feature. This mindset introduces a silo which pushes the use of Agilitest mainly to the business level, thus creating possible coordination issues…

To go further

© Christophe Moustier - 2021