
A/B testing statistical significance: why it matters and how to use it correctly

April 11, 2025
Jean-Noël Rivasseau
Jean-Noël is Kameleoon's founder and CTO and today heads the company's R&D department. He is a recognized expert in AI innovation and software development. In his posts he shares his vision of the market and his technology expertise.

Many teams rely on A/B testing to validate their ideas, but few understand the relationship between A/B testing and statistical significance.

Developing a solid understanding of the statistical significance indicators of your A/B tests is essential to interpreting the results of your experiments successfully and, from there, improving your digital strategy—especially if you're part of a team making data-driven decisions across marketing, product, and engineering.

In this article, we will explore:

  • What is an A/B test’s statistical significance?
  • The importance of a rigorous statistical approach
  • Distinguishing the increase in conversion and significance of the A/B test
  • Avoiding peeking to ensure A/B testing statistical significance
  • Why the confidence level doesn’t tell you when to stop a test
  • Achieving statistical significance for all teams with Kameleoon

What is an A/B test’s statistical significance?

A test’s confidence level expresses its statistical significance—the likelihood that a result is not due to random chance.

Statistical significance is derived from the confidence level, which is calculated from the traffic and conversion numbers of each test variation. This usually involves four data points: visits and conversions for versions A and B.

The confidence level is calculated by comparing the original version with a single variation during an A/B test; it can equally apply to a comparison between the B and D variations of an A/B/C/D test. Any A/B testing platform will provide this confidence level for each test. Even when one isn’t provided (on a web analytics solution, for example), it is possible to calculate it using a standard mathematical formula.
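
For the curious, here is a minimal sketch of such a calculation in Python, using a standard two-proportion z-test (one common way to derive a confidence level; the figures, and the exact formula any given platform uses, are assumptions for illustration):

    from math import erf, sqrt

    def confidence_level(visits_a, conv_a, visits_b, conv_b):
        """Two-sided confidence level that B's conversion rate
        differs from A's, via a two-proportion z-test."""
        p_a = conv_a / visits_a
        p_b = conv_b / visits_b
        # Pooled rate under the null hypothesis of no difference
        p = (conv_a + conv_b) / (visits_a + visits_b)
        se = sqrt(p * (1 - p) * (1 / visits_a + 1 / visits_b))
        z = abs(p_b - p_a) / se
        # Standard normal CDF expressed with the error function
        cdf = 0.5 * (1 + erf(z / sqrt(2)))
        return 2 * cdf - 1  # e.g. 0.94 means 94% confidence

    # Hypothetical data: 10,000 visits per version
    print(f"{confidence_level(10_000, 500, 10_000, 560):.1%}")  # ~94.2%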

At Kameleoon, we support both frequentist and Bayesian approaches, including CUPED-adjusted methods for faster insights.

  • In our 2025 survey, experimentation leaders were more likely to run advanced experiments using CUPED or Bayesian methods, and were 270% more likely to grow significantly than teams relying only on basic A/B testing.
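
As an aside for technically minded readers, the core of a CUPED adjustment fits in a few lines. This is a generic sketch, not Kameleoon’s production implementation; it assumes you have a pre-experiment covariate (for example, each visitor’s activity before the test) that correlates with the in-test metric:

    import numpy as np

    def cuped_adjust(metric, covariate):
        """CUPED: y' = y - theta * (x - mean(x)), which keeps the mean
        of `metric` but strips variance explained by `covariate`."""
        theta = np.cov(metric, covariate)[0, 1] / np.var(covariate, ddof=1)
        return metric - theta * (covariate - covariate.mean())

    rng = np.random.default_rng(0)
    pre = rng.normal(10, 3, 5_000)              # pre-experiment behavior
    post = 0.8 * pre + rng.normal(0, 1, 5_000)  # correlated in-test metric
    print(np.var(post), np.var(cuped_adjust(post, pre)))  # variance drops sharply

The lower the variance of the adjusted metric, the less traffic you need to reach a given confidence level, which is where the speed gain comes from.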

The importance of a rigorous statistical approach

Users logically trust the statistical significance indicator provided by their testing solution. In the vast majority of cases, this is the correct approach.

However, the way you interpret the indicator can sometimes be wrong. For example, some users may believe that it’s enough to observe how the conversion curves of the variations evolve. Others practically keep their eyes glued to the confidence level as they try to identify a trend in real time.

Peeking, however, is a well-known problem in the world of A/B testing, and dramatically increases the likelihood of committing a type I error, or false positive. The best approach is to let the test run its course without any interference.
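
A quick simulation shows why. In the sketch below (all figures hypothetical), both versions share the exact same true conversion rate, yet stopping at the first peek that shows “significance” produces far more than the nominal 5% of false positives:

    import numpy as np
    from math import erf, sqrt

    def significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
        """Two-proportion z-test at level alpha."""
        p = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
        if se == 0:
            return False
        z = abs(conv_b / n_b - conv_a / n_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
        return p_value < alpha

    rng = np.random.default_rng(42)
    runs, peeks, batch, rate = 1_000, 20, 500, 0.05
    false_positives = 0
    for _ in range(runs):
        ca = cb = na = nb = 0
        for _ in range(peeks):                    # peek after every batch
            ca += rng.binomial(batch, rate); na += batch
            cb += rng.binomial(batch, rate); nb += batch  # same true rate
            if significant(ca, na, cb, nb):
                false_positives += 1              # stopped on a false "winner"
                break

    print(f"False positive rate: {false_positives / runs:.1%}")  # well above 5%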

Kameleoon can help address this with sequential testing, which allows testers to stop a test early without inflating false positives.
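
This article doesn’t detail Kameleoon’s exact methodology, but the general idea behind such always-valid approaches can be sketched with a mixture sequential probability ratio test (mSPRT). The p-value below may be inspected at every peek without inflating the error rate; the mixing parameter tau and the simulated data are assumptions for illustration:

    import numpy as np
    from math import exp, sqrt

    def msprt_p_value(mean_diff, var, n, p_prev=1.0, tau=0.1):
        """One update of an always-valid p-value (mSPRT with a
        N(0, tau^2) mixing prior over the true effect size)."""
        lr = sqrt(var / (var + n * tau**2)) * exp(
            n**2 * tau**2 * mean_diff**2 / (2 * var * (var + n * tau**2))
        )
        return min(p_prev, 1 / lr)

    # Paired visitors: B truly converts at 7%, A at 5% (hypothetical)
    rng = np.random.default_rng(1)
    diffs = rng.binomial(1, 0.07, 20_000) - rng.binomial(1, 0.05, 20_000)
    p = 1.0
    for n in range(500, 20_001, 500):             # peek every 500 pairs
        p = msprt_p_value(diffs[:n].mean(), diffs[:n].var(), n, p)
        if p < 0.05:
            print(f"Safe to stop at n={n} (always-valid p={p:.3f})")
            break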

Distinguishing the increase in conversion and significance of the A/B test

The first thing to remember is that it’s impossible to predict the exact increase that a winning variation will provide.

In statistical terms, there is no guarantee that the increase in conversion measured during the test is the “actual” increase in conversion that will be seen once the variation is in production.

This is why it’s important to calculate and interpret the confidence level correctly: it represents the probability of obtaining the same result (that A beats B or B beats A) in the future, under strictly identical conditions (in terms of the number of observations).

What is the increase in conversions?

As an example, if you have a 15% increase in conversions for your variation and a statistical significance of 99%, this only means that your variation has a 99% chance of outperforming the original.

It does not mean that there is a 99% chance that the increase in conversions will be 15% when the variant goes live on your website or the test is repeated—the actual uplift may vary widely.
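
One way to see how wide that variation can be is to put a confidence interval around the measured uplift itself, rather than only checking significance. A rough sketch using a normal approximation (all figures hypothetical):

    from math import sqrt

    def relative_uplift_interval(visits_a, conv_a, visits_b, conv_b, z=1.96):
        """~95% interval for B's relative uplift over A
        (normal approximation on the difference of rates)."""
        p_a, p_b = conv_a / visits_a, conv_b / visits_b
        se = sqrt(p_a * (1 - p_a) / visits_a + p_b * (1 - p_b) / visits_b)
        diff = p_b - p_a
        return (diff - z * se) / p_a, (diff + z * se) / p_a

    # ~15% measured uplift at better than 99% significance
    lo, hi = relative_uplift_interval(50_000, 2_500, 50_000, 2_875)
    print(f"Measured +15%, plausibly anywhere from {lo:+.0%} to {hi:+.0%}")
    # -> roughly +9% to +21%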

How much importance should you assign to increases in conversion rates provided by your A/B tests?

This doesn’t mean that the increase in conversion rates generated by a test is totally meaningless, just that the confidence level doesn’t apply to it. 

This is where the question of sample size comes in: with moderate traffic and conversion numbers, the standard error of the measured rates will potentially be very high. With a high traffic volume, it will be small.
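
A back-of-the-envelope calculation makes this concrete: the uncertainty on a conversion rate shrinks only with the square root of the number of visits (a 5% baseline rate is assumed here):

    from math import sqrt

    rate = 0.05  # hypothetical 5% conversion rate
    for visits in (500, 5_000, 50_000):
        se = sqrt(rate * (1 - rate) / visits)
        print(f"{visits:>6} visits: 5.0% +/- {se:.2%}")
    # 500 visits: +/- 0.97%; 50,000 visits: +/- 0.10%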

In any event, this lack of a guarantee is not a problem. Remember that actual conversion rates can move in either direction: a 2% increase in conversions obtained in a test can turn into a 10% boost once that version goes live, or shrink just as easily.

Avoid peeking to ensure A/B testing statistical significance

The dashboards on some A/B testing solutions display confidence levels right from the beginning of the experiment. The problem with this is that the evolution of this indicator over time has zero value and can mislead inexperienced users.

A confidence level is obtained from a set number of observations and represents the probability that the same result will be obtained with the same number of observations in the future. So if you have a level of 90% that was only obtained over 50 visits, there is some probability that you will obtain the same result... but only over 50 visits.

Over the course of a single test, you can therefore see three different significance levels:

  • 90% at 1,000 visits
  • 65% at 15,000 visits
  • 95% at 50,000 visits

You shouldn’t take the first two values into account, since the last is the only one that is representative of your traffic when live.

From an applied maths point of view, a trend graph of significance over time is meaningless. In reality, the confidence level should only be displayed once the test is over or close to concluding (i.e. has nearly reached the targeted number of visitors), to avoid the entirely natural temptation to regularly look at this figure.
 
This is why you should always run tests on all your traffic and not just a fraction of it. While it can make sense to start a test on a small portion of your traffic to make sure it is running correctly, it is then essential to extend it to all traffic, or you may draw false conclusions about the winning variation.

Confidence level doesn’t tell you when to stop a test

A confidence level should never be used as an indicator of when to stop a test. Unfortunately, the most natural reflex is to observe this level and stop a test once it has exceeded a certain threshold (by convention, 95%). In reality, though, this has no statistical value. Good practice is to set a threshold of visits or conversions in advance, and only once it is reached to note whether or not the result is significant.
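
A classic way to set that threshold in advance is a power calculation: fix the smallest uplift you care about and compute the traffic needed to detect it. A minimal sketch (two-sided test at 95% confidence and 80% power; the baseline rate and minimum detectable effect are hypothetical inputs):

    from math import ceil, sqrt

    def visits_per_variation(baseline, relative_mde, z_alpha=1.96, z_beta=0.84):
        """Visits needed per variation to detect a relative uplift
        `relative_mde` over `baseline` (alpha = 5%, power = 80%)."""
        p1, p2 = baseline, baseline * (1 + relative_mde)
        se0 = sqrt(2 * p1 * (1 - p1))               # under H0
        se1 = sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # under H1
        return ceil(((z_alpha * se0 + z_beta * se1) / (p2 - p1)) ** 2)

    # 5% baseline, detecting a 10% relative uplift (i.e. 5% -> 5.5%)
    print(visits_per_variation(0.05, 0.10))  # roughly 30,000 per variation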

To complete this first look at the statistical value of your tests, find out how to validate your experiments by using an A/A test, and make sure that your traffic volume is sufficient for successful testing.

Achieve statistical significance for all teams with Kameleoon

Many teams interpret statistical significance differently. Marketers may act on a 90% threshold, while product managers wait for 95%. Without alignment, this creates friction. According to a 2025 survey, teams that share metrics and reporting frameworks were significantly more likely to launch impactful experiments faster.

With Kameleoon’s unified platform, teams can use the statistical method that suits their test, from Bayesian to frequentist to CUPED, while still reporting results in a consistent format. This makes it easier for marketers, product managers, and engineers to collaborate without forcing everyone into the same tool or method.

Explore how Kameleoon supports your team’s needs, whether you’re in marketing, product, or engineering, by booking a demo today.
