The Practical Guide To A/B Test Prioritisation – Why, How & Frameworks
Michael Aagaard, a well-known conversion expert who now works at Unbounce, is a reformed split test junkie (his own words). In the past he would claim that you should test everything. Running A/B tests can be fairly empowering. After all, you are in control of opening the revenue floodgates for your business or your client. There are still optimisation consultants and teams who believe you should test everything – all web pages and all ideas. After all, the faster you can test, the faster you can get ahead of your competitors.
Michael has since moved on from being a split test junkie and is now a Conversion “Plumber” (he also goes by the Conversion Viking, amongst other names).
The “Test everything” ideology perpetuated by CRO experts back in 2011-2013 is a myth that needs to be busted. It’s not practical.
1) A/B testing is expensive.
There is a cost attached to every test you run. There are hard costs, such as development, which can be significant, but there are also soft costs such as opportunity cost, risk, and the time taken to plan and deploy. The cost of an A/B test needs to be factored into the calculation of your returns.
2) Testing takes time
Every experiment you create has to run for a certain amount of time – generally two weeks or more before it is ready for analysis, depending on your pre-test calculations.
Hopefully, your CRO team isn’t stopping tests based on knee-jerk reactions, such as calling a winner the moment it reaches significance.
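To make the time cost concrete, here is a minimal sketch of the kind of pre-test calculation involved, using the standard normal-approximation sample size formula for comparing two conversion rates. The 3% baseline conversion rate, 10% relative lift and 1,500 daily visitors per variant are hypothetical figures chosen purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline_cr, relative_mde, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided z-test
    on conversion rates (normal approximation)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)   # expected rate if the lift materialises
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical inputs: 3% baseline conversion rate, 10% relative lift,
# 1,500 visitors per variant per day.
n = sample_size_per_variant(0.03, 0.10)
print(n, "visitors per variant, roughly", ceil(n / 1500), "days")
```

With numbers like these you end up needing tens of thousands of visitors per variant, which is why a single test typically occupies a slot in your calendar for weeks rather than days.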
Whichever way you look at it, you only have limited bandwidth to run experiments, and each test you run reduces that bandwidth.
3) Testing doesn’t work without a strategy or roadmap
A/B tests need to be planned thoroughly as part of the strategy the business has outlined. Testing with a “spray and pray” approach may reveal some winners, but it will hamper the long-term success of your testing program.
Often, your tests will also come up against the needs of a stakeholder who will push back on your test ideas.
4) Running low-value tests has a negative impact
Your backlog is bound to have a few tests that are low value or low impact, and without a proper structure in place, you could end up running these and reducing your bandwidth.
This is where prioritisation comes into play
Prioritisation is key to keeping a tight leash on your testing idea backlog.
You need clear guidelines so you can pick which ideas to test and which ideas to leave behind. This is what a structured prioritisation process helps with.
There have been many industry frameworks to help with prioritising your experiments.
The most common ones are PIE, ICE and PXL.
PIE (Potential / Importance / Ease) was created by Chris Goward at Widerfunnel. It’s the model most commonly used by organisations starting out in the practice of prioritisation.
ICE (Impact / Confidence / Ease) was created by Sean Ellis to plan his growth hacking experiments.
Both PIE and ICE use a range-based prioritisation model where the user sets a score between 1 and 10 for each variable, and the resulting final score is the average of the three variables.
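As a rough illustration, here is a minimal sketch of that range-based scoring in code; the example ideas and the individual 1–10 ratings are hypothetical.

```python
def range_score(*ratings):
    """PIE/ICE-style score: the average of three 1-10 ratings."""
    if not all(1 <= r <= 10 for r in ratings):
        raise ValueError("each rating must be between 1 and 10")
    return sum(ratings) / len(ratings)

# PIE-style ratings: Potential, Importance, Ease (all hypothetical)
backlog = {
    "Simplify checkout form": range_score(8, 9, 4),
    "Rewrite homepage headline": range_score(6, 7, 9),
}
for idea, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.1f}  {idea}")
```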
PXL is a prioritisation model created by ConversionXL that uses a weighted model and an objective approach to scoring.
The user answers objective questions such as “Is it above the fold?” or “Was this idea the result of user testing?”, and the score is based on the answers the user provides.
The total score is a sum of all the variables.
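A sketch of that style of scoring is below; the questions and weights shown are illustrative stand-ins, not the official PXL worksheet.

```python
# Hypothetical objective questions and weights in the spirit of PXL.
QUESTIONS = {
    "Is it above the fold?": 1,
    "Is the change noticeable within 5 seconds?": 2,
    "Was the idea the result of user testing?": 1,
    "Is it backed by quantitative data?": 1,
    "Is it easy to implement?": 1,
}

def weighted_objective_score(answers):
    """Sum the weights of every question answered 'yes' (True)."""
    return sum(weight for question, weight in QUESTIONS.items() if answers.get(question))

idea = {
    "Is it above the fold?": True,
    "Was the idea the result of user testing?": True,
    "Is it easy to implement?": False,
}
print(weighted_objective_score(idea))  # prints 2
```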
Effective Experiments, the CRO project management platform, allows you to set default prioritisation models such as PIE, PXL or ICE. You can also create your own unique model that is aligned to your experimentation program.
Which Prioritisation Model Should You Choose?
Whilst I would like to remain objective, I firmly believe that any organisation running a robust experimentation program should not use PIE or ICE.
Yes, they’re easy, they offer a very simple entry point into prioritisation, and they’re better than not prioritising at all. These are generally the justifications given by teams using these models. What PIE and ICE don’t factor in is the complexity of the tests and the office politics you encounter when running an experimentation program.
Let’s look at their variables again.
ICE:
Impact – The impact the test will have should it win
Confidence – How confident you are that the test will succeed
Ease – How easy it is to create the test
PIE:
Potential – The improvement potential of the page you’re testing on
Importance – How important (how valuable) the page you’re testing on is
Ease – How easy it is to create the test
The two models are pretty similar in terms of their individual variables.
The models themselves open up a Pandora’s box when it comes to scoring. Any score given to these variables will differ depending on the person you speak to. Situations will arise where the scores are too close to call: for example, what’s the difference between a 5 and a 6 for Ease? How can you accurately judge and place a score on your confidence that a test will succeed?
These subjective scores will harm your test prioritisation and, worse still, make you launch experiments that are not going to move the needle in the grand scheme of things.
The PXL model does a better job with this since all of the scores are objective. Anyone scoring the variables should theoretically come to the same conclusion.
Let’s look at one of its variables – “Is it above the fold?”.
This should be simple and straightforward to answer. It’s either a yes or no. All the questions in this model follow the same pattern.
Off the shelf or create your own?
My recommendation here is to always create your own but use a model as a starting point.
The variables in your model should be closely aligned with your business and processes.
PXL is a good framework that you can adopt, adapting each variable according to your requirements.
Here are some of the factors to consider – Risk, Cost of the test, etc.
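As a starting point, a custom model might look something like the sketch below – a weighted sum that adds Risk and Cost of the test to the usual variables. The variable names, weights and 1–10 scales here are all assumptions to adapt to your own program, not a prescribed standard.

```python
# Hypothetical weights: positive variables push an idea up the backlog,
# Risk and Cost of the test pull it down.
WEIGHTS = {
    "impact": 3,       # expected impact if the test wins (1-10)
    "confidence": 2,   # strength of the evidence behind the hypothesis (1-10)
    "ease": 1,         # how easy the test is to build (1-10)
    "risk": -2,        # brand/revenue risk of the change (1-10, higher = riskier)
    "cost": -1,        # development cost of the test (1-10, higher = costlier)
}

def custom_score(ratings):
    """Weighted sum of the ratings for one test idea."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)

print(custom_score({"impact": 8, "confidence": 6, "ease": 4, "risk": 3, "cost": 5}))
```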
Things to remember
1) Don’t run experiments without proper prioritisation – Backlog prioritisation is important. It ensures that everyone plays by the same rules and no one can override the process to push their idea further ahead.
2) A/B testing prioritisation must be a group activity, especially when the variables are tied to different individuals or teams. You simply cannot prioritise alone. For example, you should not be scoring the “ease of creating an experiment” – unless you’re the one developing it, you shouldn’t be calling the shots on it. Come together for a weekly prioritisation session, or allow different team members to score their assigned variables in their own time.
3) Do not use silly variables such as “Love” in your prioritisation model – I’ve seen some frameworks use a Love score where users pick a number in a range to show how much they love an idea. This is utter nonsense and a gimmick that doesn’t reflect well on experimentation and, worse, can let a HiPPO override your prioritisation.
4) Re-prioritise regularly – Your test idea backlog should never remain static. Business priorities, strategy and developer availability can change, and it is up to you to ensure that you stay agile.