One of the most important things for a software team to get good at is getting what they make into the hands of customers faster, more often, and with less drama. New features and better code are both great, but if the release process is disruptive, and people start to associate releases with bugs and poor quality, then the team’s efforts are undermined.

Now, it’s obvious but I’ll say it anyway: there’s no substitute for a robust quality strategy. This doesn’t mean turning testing into a form of self-flagellation and obsessing over tests for the sake of it – quality is about far more than just testing anyway, and that’s a whole other topic. The point here is that once you have something you are happy with from a quality perspective, how can you help the rollout to customers go more smoothly?

Feature flags

Feature flags, or feature toggles, have become a well-known and popular approach to rolling out features lately. The idea here is fairly simple, but getting it right isn’t guaranteed! A feature toggle is a setting somewhere – normally a simple true/false for on/off – that can change the behaviour of the system without a deployment. This is often done in one of two main ways – by changing a value in the application’s configuration, or by using a dedicated service outside the application. The code will usually contain “if” statements, altering the flow based on the value of the flags. With a simple feature flag, the flow changes for everyone – it’s all on or all off. Something like “if feature X is enabled do this, otherwise do that”.
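As a minimal sketch, here’s what that branch might look like, assuming the flag is read from configuration – the is_enabled helper, the FEATURE_ environment-variable convention, and the checkout functions are all hypothetical names, not a prescribed design:

```python
import os

def is_enabled(flag_name: str) -> bool:
    # Read the flag from configuration; here an environment variable,
    # but it could equally come from a config file or a flag service.
    return os.environ.get(f"FEATURE_{flag_name.upper()}", "false") == "true"

def legacy_checkout(cart: list) -> str:
    return f"legacy checkout for {len(cart)} item(s)"

def new_checkout(cart: list) -> str:
    return f"new checkout for {len(cart)} item(s)"

def checkout(cart: list) -> str:
    # The classic feature-flag branch: the flow changes for everyone,
    # all on or all off.
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```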

Feature flags are great for altering the flow, but there are some things to be cautious about. Firstly, every “if” statement creates more paths through the code – and a robust quality control approach means making sure all paths behave correctly with no adverse effects. The more branches of code, the more complexity there is, by definition, and the more there is to test and maintain. Feature flags should be short-lived, removed as quickly as possible, to keep the complexity as low as you can.

The other classic trap with feature flags is assuming that an irreversible change can be toggled. A common example is a database schema change, where the change can break things regardless of what the flag is set to. Feature flags can give a false sense of security, and understanding the potential risk and impact of changes is still essential! A variant of this is where changing the toggle in one direction works fine, but you can’t change it back – something prevents backwards compatibility. Really, you need to test the system with the flag off, test turning it on, and then test turning it back off, so that all the states and all the transitions are covered.
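To make that concrete, here’s a minimal round-trip test, reusing the hypothetical checkout sketch from above – it exercises the off state, the transition on, and the transition back off:

```python
import os

def set_flag(name: str, value: bool) -> None:
    # Test helper: flip the flag in configuration.
    os.environ[f"FEATURE_{name.upper()}"] = "true" if value else "false"

def test_flag_round_trip() -> None:
    # State: flag off - the existing behaviour should apply.
    set_flag("new_checkout", False)
    assert checkout(["apple"]) == "legacy checkout for 1 item(s)"

    # Transition: flag on - the new behaviour takes over.
    set_flag("new_checkout", True)
    assert checkout(["apple"]) == "new checkout for 1 item(s)"

    # Transition back: turning it off must restore the old path; this
    # is exactly the step that fails when a change isn't truly reversible.
    set_flag("new_checkout", False)
    assert checkout(["apple"]) == "legacy checkout for 1 item(s)"
```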

Pilot

Under a pilot scheme, features are rolled out first to a small set of friendly customers, who usually know that they are the first to get features and are happy to give feedback on them. Setting up a pilot is similar to a feature flag, except that it’s not a simple “on/off” but more a question of “on for this person”. This means that while a feature is being rolled out, some users will get different behaviour to others. There is a support consideration here – whoever is helping your customers will need to know what each specific customer is seeing. It needs to be easy to check whether someone is in the pilot scheme or not, as part of diagnosis.

Like a feature flag, a pilot scheme is likely to be implemented in code as something like “if this person is in the pilot, do this, otherwise do that”. You can of course combine feature flags and pilots, which gives you more than one control mechanism.
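A pilot check might look something like the sketch below, layered on top of the hypothetical is_enabled helper from earlier – the PILOT_USERS set and the function names are illustrative assumptions:

```python
# Hypothetical pilot membership - in practice this might live in a
# database or an admin tool rather than in code.
PILOT_USERS = {"alice@example.com", "bob@example.com"}

def in_pilot(user_email: str) -> bool:
    # Cheap to call from support tooling too, so whoever is helping a
    # customer can see at a glance which behaviour they are getting.
    return user_email in PILOT_USERS

def render_dashboard(user_email: str) -> str:
    # Combining mechanisms: the flag gates the feature globally, and
    # the pilot list narrows it to specific people.
    if is_enabled("new_dashboard") and in_pilot(user_email):
        return "new dashboard"
    return "old dashboard"
```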

A/B testing

In an A/B test, different configurations are expected to give different outcomes. This is great for “fuzzy problems”, where you can’t reliably predict which variant will “win” in the real world. This might be something like “do people buy more stuff if the button is blue or if it is orange?”, where the only way to know is to try both.

You will want to make sure there are good metrics in place, so that you can get data on which variant gives the preferred outcome. You’ll also need a reasonable way to divide users into the A and B groups as fairly as possible, so that the test is representative, and you will need to collect enough data for the result to be significant. A common mistake in A/B testing is to run tests on rarely used features, with a small customer base, or for a short time – all of which make the sample size too small to give a meaningful answer. Expect to run an A/B test for a while, potentially weeks, before evaluating the result and picking a winner.
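One common way to divide users fairly and deterministically is to hash the user ID together with the experiment name; the sketch below assumes string user IDs and a 50/50 split, and all the names in it are hypothetical:

```python
import hashlib

def ab_group(user_id: str, experiment: str) -> str:
    # Hash the user ID with the experiment name: the split is roughly
    # 50/50, stable across sessions, and independent between experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def buy_button_colour(user_id: str) -> str:
    # "Do people buy more stuff if the button is blue or orange?"
    return "blue" if ab_group(user_id, "buy-button-colour") == "A" else "orange"
```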

Cohorts

Cohorts are groups of users. They are similar to a pilot scheme, but often for larger groups. The idea might be to try a feature first with the group most likely to benefit from it, or with the group most likely to hit an issue, so you can find and fix it before rolling out globally. You might decide that a user can be in only one cohort, or allow them to be in many. You can choose how to populate cohorts – they might be based on some trait of the user, like “regular users” or “friends and family”, or they might be more arbitrary.

Like a pilot, with cohorts you will have multiple active branches of code, and select one based on the user’s context – so you’ll need an easy way of seeing who’s in what group, and it needs to be deterministic. The same testing considerations would apply – each branch adds more possible states. For cohorts, you might need to consider things like interactions between users, and would probably design the boundaries based on this. For example, users in the same company might find it strange if some of them get a feature a long time before others, whereas users in different companies would probably not even know. If you have a multi-user system this is particularly pertinent – if your colleagues create data you can’t interact with, because you’re in a different cohort, this would be bad.
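A deterministic cohort lookup keyed on the company, so colleagues always land in the same group and see the same data, might be sketched like this – the cohort names and the idea of grouping by company ID are illustrative assumptions:

```python
# Hypothetical cohorts keyed on company ID, so colleagues who share
# data always see the same behaviour.
COHORTS = {
    "friends-and-family": {"acme-corp"},
    "early-adopters": {"globex", "initech"},
}

def cohort_for(company_id: str) -> str:
    # Deterministic: the same company always resolves to the same
    # cohort, which also makes support diagnosis straightforward.
    for name, companies in COHORTS.items():
        if company_id in companies:
            return name
    return "general"
```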

Phased rollout

In a phased rollout, a feature is gradually released to more users, usually one group at a time. Most likely you’d start with a small group and monitor closely, before increasing the group sizes and rolling out quickly on the back of seeing the first groups go well. The aim is to reach 100%, in a controlled way. This is particularly useful when there are questions over scale or performance, because if you see a trend emerging you can halt the rollout and respond.

There are some very simple ways of doing a phased rollout; for example, if you have integer IDs on your users, you can take the remainder when dividing the ID by 100 and use it as an approximate percentile. If you wanted to roll out to 25%, then a user gets the feature if the remainder is less than 25, otherwise not yet. This saves you the hassle of populating something more complex, and is deterministic and easy to understand.
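In code that’s a one-liner; this sketch uses Python’s % operator for the remainder, with a couple of calls showing the 25% case:

```python
def in_rollout(user_id: int, percentage: int) -> bool:
    # The ID modulo 100 is an approximate percentile: deterministic,
    # no extra storage, and raising the percentage only ever adds
    # users to the rollout, never removes them.
    return user_id % 100 < percentage

print(in_rollout(1024, 25))  # 1024 % 100 == 24, so True
print(in_rollout(1025, 25))  # 1025 % 100 == 25, so False - not yet
```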

Canary

A canary release is more about checking deployment health; it’s something that is built into the (hopefully automated, as it’s 2020…) release pipeline, rather than the code. If you have multiple load-balanced servers (you should!), you might run a canary release by upgrading the software on one instance first, and monitoring it for a short period. If there are issues, you can roll back, and the impact is limited to a small subset of users.

Canary releases are typically in-flight for minutes. After that observation window, the deployment to the remaining instances would usually happen fairly quickly. Ideally the initial deployment, monitoring, and rollback or rollout are all automated with no manual intervention needed. If this is the case, then it might be something that you just always do as a standard practice.
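As a rough sketch of how that pipeline logic might hang together – deploy, healthy, and rollback here are hypothetical callables standing in for your real deployment and monitoring integrations:

```python
import time

def canary_release(instances, deploy, healthy, rollback,
                   window_seconds: int = 300) -> bool:
    # Upgrade one instance first, watch it, then promote or roll back.
    canary, rest = instances[0], instances[1:]
    deploy(canary)

    # Observe the canary for a short window before going any further.
    deadline = time.time() + window_seconds
    while time.time() < deadline:
        if not healthy(canary):
            rollback(canary)  # bad signal: undo, with the impact
            return False      # limited to the canary's share of users
        time.sleep(10)

    # The canary looks healthy: promote to the remaining instances.
    for instance in rest:
        deploy(instance)
    return True
```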

Summing up

There are plenty of techniques out there for controlling software releases, and they can be combined to give a lot of flexibility. Remember though that this comes at the cost of added complexity – and that you’ll need to make sure your release mechanisms are robust and easy to use when you need them. By putting a little effort into tailoring your release strategy to your needs, you will hopefully be able to release software improvements to your customers more often, and with confidence!
