Are the results of your A/B tests valid? To find out, start by running an A/A test.
An A/A test pits two identical versions of an element against each other. The traffic to your website is split into two groups, and each group is shown the same version. This lets you check whether the conversion rates of the two groups are similar and confirm that your testing solution is working properly.
1 Why run an A/A test?
The goal of an A/A test is to check that your A/B testing solution is correctly configured and that the data collected is accurate.
By running an A/A test, you can check that the two variations produce similar results, with conversion rates that show no statistically significant difference.
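To make “similar” concrete, here is a minimal Python sketch of one common way to compare two conversion rates: a two-proportion z-test. The function name and the visitor and conversion counts are hypothetical, and your testing solution may well use a different statistical method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for a difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical A/A result: two groups of 10,000 visitors shown the same page.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=520, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
print("winner declared" if p < 0.05 else "no significant difference")
```

Here the 5.0% and 5.2% conversion rates are close enough that the test finds no significant difference, which is what a correctly configured A/A test should usually report.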
2 How should you interpret the results of an A/A test?
In the great majority of cases, the results are very similar. However, because your test uses a 95% confidence level, it is possible to obtain fairly divergent conversion rates (roughly one A/A test in twenty), in which case the test declares a ‘winner’ even though the goal is perfect equality.
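The “one in twenty” figure can be checked with a quick simulation. The sketch below runs many A/A tests under assumed values (2,000 visitors per group, the same 10% true conversion rate in both groups, 500 simulated tests) and counts how often a two-proportion z-test declares a winner at the 5% significance level. All names and parameters are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test (pooled standard error)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all in either group: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests=500, visitors=2_000, rate=0.10, alpha=0.05, seed=42):
    """Fraction of simulated A/A tests that wrongly declare a winner."""
    rng = random.Random(seed)
    winners = 0
    for _ in range(n_tests):
        # Both groups share the same true conversion rate: this is an A/A test.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if p_value(conv_a, visitors, conv_b, visitors) < alpha:
            winners += 1
    return winners / n_tests


print(f"A/A tests declaring a winner: {simulate_aa_tests():.1%}")
```

The exact percentage varies with the random seed, but it stays in the neighbourhood of 5%: that is the false-positive rate the 95% confidence level accepts by design.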
This doesn’t necessarily mean that your A/B testing solution is poorly configured. It is much more likely that you’re dealing with a “false positive”, i.e. a difference in conversion rates has been reported even though none exists.
Be wary of false positives
With a 95% confidence level, the probability of obtaining a false positive is 5%. But this figure only holds if you read the results once, at the end of the test. The confidence threshold set for a test applies to the test as a whole: if you check the results repeatedly and stop as soon as one variation appears to win, the real false-positive rate climbs well above 5%. It is therefore bad practice to act on this indicator before the test is completed, because doing so breaks the assumptions the statistics rely on.
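The effect of peeking can be simulated. This minimal sketch assumes a hypothetical A/A setup (identical 10% conversion rate, 2,000 visitors per group, 300 simulated tests) with 20 evenly spaced looks at the data, and compares two decision rules: reading the result once at the end, versus declaring a winner at the first look that shows p < 0.05. The helper names and all parameters are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a two-proportion z-test (pooled standard error)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all in either group: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(n_tests=300, visitors=2_000, looks=20, rate=0.10, alpha=0.05, seed=7):
    """Return (false-positive rate with one final look, rate when stopping at the first significant look)."""
    rng = random.Random(seed)
    step = visitors // looks
    final_fp = 0
    peeking_fp = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        declared_winner = False
        for _ in range(looks):
            # Another batch of visitors arrives in each group (same true rate: A/A).
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n += step
            if p_value(conv_a, n, conv_b, n) < alpha:
                # A "peeker" would stop here and call a winner; we keep
                # simulating only to also record the end-of-test result.
                declared_winner = True
        peeking_fp += declared_winner
        # The disciplined analyst reads the result once, after all visitors are in.
        final_fp += p_value(conv_a, n, conv_b, n) < alpha
    return final_fp / n_tests, peeking_fp / n_tests


final_rate, peeking_rate = simulate_peeking()
print(f"one look at the end:            {final_rate:.1%} false positives")
print(f"stop at first significant look: {peeking_rate:.1%} false positives")
```

The exact percentages depend on the seed, but stopping at the first significant look consistently produces several times more false winners than the single end-of-test reading, even though both groups are identical.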
An example to illustrate the point
Consider a statistical study that compares two cities to find out which has the older population. The statistical method would consist of drawing two representative, sufficiently large samples (one per city) and then comparing their average ages.
In the case of an A/A test, we would instead select two groups of individuals from the same city. The correct statistical methodology uses the confidence level we want to achieve (95%) to determine the sample size in advance (for example, 10,000 people). If we complete the study with this statistically valid number of inhabitants, then in 95% of cases we will detect no significant difference between the two groups of the A/A test.
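As an illustration of fixing the sample size in advance, a common normal-approximation formula for comparing two means needs, in addition to the confidence level, a target statistical power and the smallest difference worth detecting. The sketch below is hypothetical: the 20-year standard deviation of ages, the 1-year minimum difference and the 80% power are assumptions chosen for the city example, not figures from the study.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma, min_difference, confidence=0.95, power=0.80):
    """Per-group n for a two-sided comparison of two means (normal approximation).

    n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / min_difference) ** 2
    """
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sigma / min_difference) ** 2
    return ceil(n)  # round up: a fraction of a person is not enough


# Hypothetical inputs: ages with a standard deviation of 20 years,
# and we want to detect a 1-year difference in average age.
print(sample_size_per_group(sigma=20, min_difference=1))
```

With these assumed inputs the function returns about 6,300 people per group; a smaller difference to detect or a wider spread of ages pushes the figure higher.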
However, if we repeatedly look at the results before the end of the study, the probability of seeing a false positive increases. With only 20 people in each group, it is highly likely that one group will show a noticeably higher average age than the other (even though both come from the same city), simply because the samples are too small. And if we repeat the same test with another 20 people in each group, we will very likely find an age difference once again.
This example clearly shows that a valid result requires a sufficient sample size. If we look at the test results too soon, we risk drawing conclusions from invalid results.
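To see why 20 people per group is not enough, the sketch below repeatedly draws pairs of groups from a single hypothetical city (100,000 residents with ages spread uniformly between 0 and 90, an assumption made purely for illustration) and measures the average gap between the two groups’ mean ages. The function name and parameters are illustrative.

```python
import random


def typical_gap(group_size, n_repeats=100, seed=1):
    """Average absolute gap in mean age between two random groups from the same city."""
    rng = random.Random(seed)
    # Hypothetical city: 100,000 residents with ages spread uniformly from 0 to 90.
    city = [rng.randint(0, 90) for _ in range(100_000)]
    total_gap = 0.0
    for _ in range(n_repeats):
        group_a = rng.sample(city, group_size)
        group_b = rng.sample(city, group_size)
        mean_a = sum(group_a) / group_size
        mean_b = sum(group_b) / group_size
        total_gap += abs(mean_a - mean_b)
    return total_gap / n_repeats


for size in (20, 10_000):
    print(f"{size:>6} people per group: average gap of {typical_gap(size):.2f} years")
```

With 20 people per group the typical gap is several years, whereas with 10,000 people per group it shrinks to a fraction of a year, even though every person comes from the same city.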
3 What are A/A/B tests?
Another kind of test is gaining popularity: A/A/B tests.
The idea here is to run a standard A/B test combined with a second A variation, in order to check that the result obtained for the B variation is valid. If the two A variations show no difference in conversion, this is taken as confirmation that the results for the B variation are reliable.
It is perfectly understandable to want reassurance, but combining an A/B test with an A/A test in no way increases the validity of the A/B test because the two results are completely independent.
Statistically, the A/A comparison will show a false positive in 5% of cases, indicating that version A beats version A, which is clearly incorrect. However, this doesn’t mean that the A/B comparison itself is producing a false positive. If variant B shows an increase in conversion rate, that result is judged against the same 95% confidence level as in any A/B test, and an abnormal A/A result does not, on its own, turn it into a false positive.
In other words, while running an A/A/B test may be intellectually appealing, it unfortunately has zero statistical value in determining the validity of the A/B test.
4 In summary
Statistical methodologies are at the heart of A/B testing, and technically results would only be 100% valid if you used an infinite sample, which is clearly impossible.
Statistical outliers may occur, but they don’t invalidate the practice of testing. As long as your test runs with sufficient traffic and for a sufficient period of time, you will obtain reliable results. An A/A test helps ensure that your testing solution is correctly configured, and therefore that you can trust your A/B results.