When it comes to statistical significance, the A/B testing world is split into two approaches:
- The frequentist method, based on the observation of data at a given moment
- The Bayesian method, a forecasting approach that involves analyzing prior information.
What are the advantages and disadvantages of each approach for calculating the confidence index of your A/B tests? Do you really have to choose between the two? This article explains the debate and aims to answer these key questions.
1 Bayesian versus frequentist statistics: what are the differences?
Since the Enlightenment, there have been two opposing schools of thought in statistics: the frequentists and the Bayesians.
- Frequentist statistics, which could also be described as experimental or inductive, relies solely on observed data: a probability is understood as the frequency of an outcome over many repeated observations.
- Bayesian statistics, which is theoretical/deductive, enables us to combine the information provided by data with a priori knowledge from previous studies or expert opinions.
Let’s use a simple example to try to better understand the difference between these two approaches.
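One way to picture the difference, sketched here under assumptions of our own, is a coin that lands heads 8 times in 10 tosses. The frequentist estimate uses only those 10 tosses. The Bayesian estimate blends them with prior knowledge, modeled below as a Beta prior worth 50 earlier heads and 50 earlier tails. The function names and the prior values are illustrative, not part of any particular tool.

```python
def frequentist_estimate(heads: int, tosses: int) -> float:
    """Observed frequency of heads: uses the data only."""
    return heads / tosses


def bayesian_estimate(heads: int, tosses: int,
                      prior_heads: float = 50.0,
                      prior_tails: float = 50.0) -> float:
    """Posterior mean after updating a Beta(prior_heads, prior_tails) prior.

    The default prior (assumed here) encodes a belief that the coin is fair,
    with the weight of 100 earlier tosses.
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + tosses)


heads, tosses = 8, 10
print(frequentist_estimate(heads, tosses))          # 0.8
print(round(bayesian_estimate(heads, tosses), 3))   # 0.527
```

With so few tosses, the prior dominates the Bayesian estimate. As more tosses are observed, the two estimates move closer together.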
2 Bayesian approach: benefits and limitations
There are many benefits to the Bayesian approach when you can take perfectly similar past experiments into account. That is why it is used in several fields, such as spam detection. With prior knowledge of spam, we can determine the probability associated with the number of times a type of word appears in a spam email.
This probability, obtained through past experiments, enables us to consider a particular word as typical of spam. So, the method’s principal advantage is that it frees us from a fixed end point, so results can be obtained as quickly as possible.
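The spam reasoning above is an application of Bayes' rule. The sketch below is a minimal illustration with made-up figures: the three input probabilities are hypothetical and would, in practice, come from counting words in past emails.

```python
def spam_probability(p_word_given_spam: float,
                     p_word_given_ham: float,
                     p_spam: float) -> float:
    """Bayes' rule: P(spam | word) from word frequencies and the prior spam rate."""
    # Total probability of seeing the word in any email (spam or legitimate).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * (1.0 - p_spam)
    return p_word_given_spam * p_spam / p_word


# Hypothetical figures: the word appears in 40% of past spam, in 1% of
# legitimate mail, and 20% of all past mail was spam.
print(round(spam_probability(0.40, 0.01, 0.20), 3))  # 0.909
```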
Moreover, you don’t need to determine the size of a necessary sample and traffic volume in advance to run a test: the results can be viewed throughout the experiment and are faster to obtain.
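To make this concrete, here is a rough sketch of how a Bayesian readout can be refreshed at any point during a test. It assumes uniform Beta(1, 1) priors on both conversion rates and estimates the probability that variation B beats variation A by random sampling. The function name, the priors, and the visitor counts are all illustrative assumptions.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per variation from its posterior.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical cumulative counts at three checkpoints of one running test:
# (conversions_A, visitors_A, conversions_B, visitors_B)
checkpoints = [(10, 200, 14, 200), (48, 1000, 63, 1000), (101, 2000, 128, 2000)]
for conv_a, n_a, conv_b, n_b in checkpoints:
    print(n_a, "visitors per variation:",
          round(prob_b_beats_a(conv_a, n_a, conv_b, n_b), 3))
```

The same function is simply called again with the latest counts each time the results are viewed, with no sample size fixed in advance.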
WHAT ARE THE LIMITATIONS OF THE BAYESIAN APPROACH?
The Bayesian approach starts from a premise that is completely objective in the case of tossing a coin, but becomes subjective when it comes to a user experiment. For A/B tests, for example, it is not recommended to take into account the results of previous experiments that were produced over a different timescale and in potentially completely dissimilar conditions. After all, the first principle of A/B testing is to compare two variations in exactly the same conditions, concurrently and not sequentially.
Bayesian statistics deduces the probability of an event by looking at other events that have previously been assessed. In the context of an A/B test, this a priori knowledge can be flawed, being affected by seasonality or just by trends, which can skew the results.
In other words, the risk of detecting a false positive becomes much higher. This is not necessarily a major issue in the case of spam detection. However, it’s much more problematic in the case of an A/B test.
Another disadvantage of the Bayesian method is that it is much more difficult to grasp. Bayesian statistics tries to calculate a probability distribution, which is a much more complex concept than a simple confidence index. In the case of A/B testing, this probability distribution is based on conversion gains or losses.
Reducing this distribution to a simple gains interval such as [-0.5%, +2%] doesn’t give the marketer enough perspective when reading the results: after all, there’s a wide and significant gap between -0.5% and +2%. This is particularly true because, in reality, the distribution is defined over the whole [-∞, +∞] interval. The cut-off at [-0.5%, +2%] is arbitrary: it is set at the threshold beyond which the remaining probability is judged negligible.
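The arbitrariness of the cut-off can be seen in a small sketch. It samples the distribution of the gain (conversion rate of B minus conversion rate of A), assuming uniform Beta(1, 1) priors, and then reports two intervals from the very same distribution: one keeping 95% of the probability and one keeping 80%. The function name, the priors, and the counts are hypothetical.

```python
import random


def gain_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                  mass: float = 0.95, draws: int = 20000, seed: int = 0):
    """Central interval holding `mass` of the sampled gain distribution."""
    rng = random.Random(seed)
    gains = sorted(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        - rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    tail = (1.0 - mass) / 2.0  # probability discarded on each side
    low = gains[int(tail * draws)]
    high = gains[int((1.0 - tail) * draws) - 1]
    return low, high


# Hypothetical counts: 101/2000 conversions for A, 128/2000 for B.
for mass in (0.95, 0.80):
    low, high = gain_interval(101, 2000, 128, 2000, mass=mass)
    print(f"{mass:.0%} interval: [{low:+.2%}, {high:+.2%}]")
```

Both intervals summarize the same distribution. Only the amount of probability judged negligible at each end changes, and with it the interval shown to the marketer.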
3 Frequentist approach: benefits and limitations
The frequentist method, widely employed in economics and health, has also become the norm in A/B testing. This approach is based solely on the data from tests run in strictly similar conditions for each variation (hence its reputation as a data-driven method).
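As an illustration of a calculation that uses only the data of the test, the sketch below runs a pooled two-proportion z-test, one common frequentist test for comparing conversion rates. Whether a given A/B testing tool uses this exact test is not stated here. Presenting 1 minus the p-value as the confidence index is an assumption, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test.

    Assumes at least one conversion and one non-conversion overall,
    otherwise the standard error would be zero.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical final counts: 101/2000 conversions for A, 128/2000 for B.
p_value = two_proportion_p_value(101, 2000, 128, 2000)
print("p-value:", round(p_value, 3))
print("confidence index (assumed to be 1 - p-value):", round(1.0 - p_value, 3))
```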
However, the frequentist method also has certain disadvantages:
- The required traffic volume does not allow tests to be run in all circumstances. Obtaining statistically significant results when we run A/B tests on pages with low traffic can be difficult or take a long time.
- The reliability of the results is only confirmed at the end of the test. You have to be able to resist the temptation of checking the results while the test is ongoing, as the interim results simply aren’t valid.
- As shown by the practice of A/A testing, the risk of obtaining a false positive result remains.
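The three points above can be explored with a short simulation. The sketch below first applies a standard approximation for the number of visitors needed per variation, then simulates A/A tests (two identical variations) and compares how often a "significant" result appears when the test is read only at the end versus when it is checked at every interim look. All names, default values, and rates are illustrative assumptions.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variation(base_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate sample size per variation for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance = base_rate * (1.0 - base_rate) + expected_rate * (1.0 - expected_rate)
    return ceil((z_alpha + z_power) ** 2 * variance / (expected_rate - base_rate) ** 2)


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Pooled two-proportion z-test with n visitors in each variation."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variance yet, nothing can be concluded
    z = (conv_b - conv_a) / n / sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z))) < alpha


def aa_false_positive_rates(rate=0.05, n=2000, looks=10, tests=300, seed=0):
    """Simulate A/A tests; return (rate when peeking at every look, rate at the end only)."""
    rng = random.Random(seed)
    step = n // looks
    peeking_hits = final_hits = 0
    for _ in range(tests):
        conv_a = conv_b = 0
        flagged = False
        for look in range(1, looks + 1):
            # Both variations share the same true conversion rate.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = is_significant(conv_a, conv_b, look * step)
            flagged = flagged or significant
        peeking_hits += flagged      # significant at any look
        final_hits += significant    # significant at the last look only
    return peeking_hits / tests, final_hits / tests


print("visitors per variation (5% -> 6%):", visitors_per_variation(0.05, 0.06))
peeking, final = aa_false_positive_rates()
print("false positives when peeking:", peeking, "at the end only:", final)
```

Because the two variations are identical, every significant result in this simulation is a false positive. Reading the test at every look can only flag as many or more of them than reading it once at the end.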
4 Which approach should you choose, frequentist or Bayesian?
One of the most rigorous analyses comparing the frequentist and Bayesian approaches was carried out by the statistician Valen Johnson and summarized in his article published in the Proceedings of the National Academy of Sciences in 2013 (1).
The aim of his frequentist analysis was to explore the data collected so as to identify a significant effect that could only be explained by the hypothesis of the experiment.
His Bayesian analysis compared two hypotheses and assessed the chances that one was true in comparison with the other, by using the data available at the time of the experiment and the information already known about the subject.
His conclusion was that, in the case of a Bayesian approach, the threshold of statistical significance, commonly accepted as being 95%, is insufficient for concluding whether or not the test is significant.
In other words, this only further confirms that the choice of the frequentist approach by A/B testing tool providers is valid.
SHOULD WE DISQUALIFY THE BAYESIAN METHOD?
No, because the Bayesian method has significant advantages when circumstances allow. The A/B testing world logically adopted the frequentist approach because its greater accuracy and lesser complexity in terms of reading results easily outweigh the disadvantages mentioned above.
Generally speaking, this question of which method is better, the Bayesian or the frequentist, is subject to ongoing debate amongst experts and extends far beyond the immediate needs of marketing teams. All in all, one method is not better than the other; what matters is understanding the underlying logic of each or seeking advice from someone who is familiar with both.