Experimentation is not one method

May 16, 2025

Collin Tate Crowell
Collin Crowell is the VP of Growth for Kameleoon, North America. He’s based outside of Vancouver, Canada.

Too many teams still treat experimentation like it only works one way.

Run an A/B test. Wait for significance. Then decide.

That mindset is outdated, and it is holding companies back.

In the 2025 Experimentation-Led Growth Report by Kameleoon, high-growth companies show us a different path. They have built testing cultures where every team can experiment in ways that fit their goals, constraints, and stage of maturity.

Marketing is no longer shackled to "stat sig or it doesn’t count."

And product is no longer excused from showing results.

The rules are being rewritten.

More and more marketing teams are now choosing the level of rigor that fits the moment.

Product and engineering teams are increasingly being asked to connect their work to real outcomes such as revenue, activation, and retention.

As Leah Tharin puts it, product now has to prove its impact on the business.

That is growth. And growth requires learning.

To support that shift, the best teams work across two spectrums: the type of experiment they run, and the level of rigor they apply.

The spectrum of experiment types

1. Progressive delivery (ensure you don’t break something)

Used by: Product, Dev, Engineering

Purpose: De-risk feature releases

How it works: Roll out a new feature to a small group. Monitor critical metrics. Catch major failures early.

Pros:

  • Fast feedback on technical and behavioral risk
  • No need for large sample sizes or statistical models 

 

Cons:

  • No control group or causal inference
  • Cannot easily prove performance gains
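
As a rough illustration, here is a minimal Python sketch of a percentage rollout with a guardrail check. The hash-based bucketing and the `guardrail_breached` threshold are illustrative stand-ins for whatever your delivery platform actually provides:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically bucket a user into the rollout by hashing their ID."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

def guardrail_breached(baseline_rate: float, current_rate: float,
                       max_drop: float = 0.20) -> bool:
    """Return True if the watched metric has dropped past the agreed threshold."""
    return current_rate < baseline_rate * (1 - max_drop)

# Expose the feature to 5% of users first
serve_new_feature = in_rollout(user_id="u_123", feature="new_checkout", percent=5)

# Halt and roll back if conversion among exposed users falls more than 20%
if guardrail_breached(baseline_rate=0.041, current_rate=0.012):
    print("Guardrail breached: halt the rollout and roll back")
```

Note there is no control group here: the point is catching breakage fast, not measuring lift.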

2. Feature flag observation (see what users actually do)

Used by: Product, UX, Data

Purpose: Learn from live behavior without formal testing

How it works: Use toggles to show or hide features for selected audiences. Track how users engage. 

Pros:

  • Flexible release and behavioral insight
  • Useful for UI or adoption feedback

 

Cons:

  • Not randomized
  • Can produce misleading results without careful measurement
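
A minimal sketch of flag-based observation, assuming a hypothetical in-memory flag store and event log (a real flagging platform replaces both):

```python
from collections import defaultdict

# Hypothetical flag configuration: which audiences see the feature
FLAGS = {"new_dashboard": {"audiences": {"beta_users", "internal"}}}
events = defaultdict(list)

def flag_on(flag: str, user_audience: str) -> bool:
    """Show the feature only to the selected audiences."""
    return user_audience in FLAGS.get(flag, {}).get("audiences", set())

def track(event: str, user_id: str) -> None:
    """Record an engagement event for later analysis."""
    events[event].append(user_id)

# Serve the UI conditionally, then watch what exposed users actually do
if flag_on("new_dashboard", user_audience="beta_users"):
    track("dashboard_viewed", user_id="u_123")

print(f"{len(set(events['dashboard_viewed']))} unique users engaged")
```

Because the audiences are chosen, not randomized, any comparison against other users is observational and should be read accordingly.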

3. Personalization with a holdout (optimize for certain segments)

Used by: Product, Growth

Purpose: Deliver targeted experiences and compare impact

How it works: Serve content or offers to segments while holding back a portion for comparison.

Pros:

  • Matches real-world use
  • Enables segment-level learning

 

Cons:

  • Control group may not be truly comparable
  • Hard to generalize to all users
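
In code, a holdout is often just deterministic bucketing applied within the segment. This sketch assumes hypothetical segment and campaign names:

```python
import hashlib

HOLDOUT_PERCENT = 10  # keep 10% of the segment on the default experience

def in_holdout(user_id: str, campaign: str,
               percent: int = HOLDOUT_PERCENT) -> bool:
    """Deterministically assign a slice of the segment to the holdout."""
    digest = hashlib.sha256(f"{campaign}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

def experience_for(user_id: str, segment: str) -> str:
    """Serve the personalized experience to the segment, minus the holdout."""
    if segment == "returning_buyers" and not in_holdout(user_id, "loyalty_offer"):
        return "personalized_offer"
    return "default"

# Later: compare conversion between the personalized group and the holdout
print(experience_for(user_id="u_123", segment="returning_buyers"))
```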

4. A/B testing (did this change work, and by how much?)

Used by: Growth, Analytics, Marketing, Product

Purpose: Prove impact with high confidence

How it works: Randomly assign users to test and control groups. Use statistical analysis to compare results.

Pros:

  • Clear causal insight
  • Ideal for business-critical decisions

 

Cons:

  • Slower and more resource-intensive
  • Overkill for small or obvious changes
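
Assignment is typically the same deterministic hashing shown above, split 50/50. For the analysis step, a two-proportion z-test is one common choice; this standard-library sketch uses made-up conversion numbers, and real platforms usually run this (or a Bayesian equivalent) for you:

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf; two-sided p-value
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical numbers: control converts 4.0%, variant 4.6%
p = two_proportion_ztest(conv_a=400, n_a=10_000, conv_b=460, n_b=10_000)
print(f"p-value: {p:.4f}")  # below 0.05 clears the usual 95% confidence bar
```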

The spectrum of rigor

Not every test requires statistical significance.

Insisting on it in the wrong context can block progress, especially in marketing, where timelines are short and signals are weak.

Here is how leading teams adjust their level of rigor:

 

Low rigor

  • Use for: De-risking releases
  • Goal: Catch obvious problems fast
  • Example: A new flow causes a 70% drop in conversions. That is enough to roll it back.

Medium rigor

  • Use for: Personalization or feature engagement
  • Goal: Identify trends and guide direction
  • Example: Users who received personalized messaging showed stronger repeat behavior over time.

High rigor

  • Use for: Major product or business decisions
  • Goal: Prove causal lift
  • Example: A new onboarding sequence increases trial-to-paid conversion with statistical confidence.

Why some teams use the spectrum—and others don’t

Product teams have given themselves permission to experiment with flexible levels of rigor. Frameworks like Itamar Gilad’s Confidence Meter have helped normalize the idea that not every feature needs statistical validation. Small or low-risk changes can move forward based on directional signals, adoption rates, or early telemetry.

Marketers and most CRO practitioners, on the other hand, are often told they must A/B test. They are held to a higher standard of validation, even when the changes are small and the signal is weak. If their test doesn’t hit 95% confidence, it’s considered invalid. If it’s not statistically rigorous, it’s dismissed. Teams spend hours arguing over complex statistical methods rather than considering the directional trend.

This isn’t about lowering the bar for marketing. It’s about applying the same judgment to all teams: match the rigor to the question. If product gets a spectrum, so should everyone else.

How to put this into practice across teams

The spectrum only matters if teams use it. That means understanding how to apply the right type of test with the right level of rigor, not just in theory but in your actual day-to-day.

If you’re in marketing

You can experiment without always running a full A/B test. Here's how:

  • Use AI tools or visual editors to launch changes on high-traffic pages or segments
  • Collaborate with developers when you need to go beyond basic edits
  • Compare behavior before and after, or across audience segments
  • Track directional metrics like bounce rate, scroll depth, or lead quality
  • Set thresholds for success or rollback before launch
  • Tie results to KPIs like revenue per visitor or downstream conversion

 

This is experimentation focused on outcomes, not statistical perfection.
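
For example, a before/after check against thresholds agreed before launch can be as simple as the sketch below. The metrics and numbers are illustrative, not from the report:

```python
# Snapshot of directional metrics before and after the change (illustrative)
BEFORE = {"bounce_rate": 0.52, "revenue_per_visitor": 1.84}
AFTER = {"bounce_rate": 0.47, "revenue_per_visitor": 2.01}

# Success thresholds set before launch: negative means "must fall by at least"
THRESHOLDS = {
    "bounce_rate": -0.03,          # success: bounce rate falls by >= 3 points
    "revenue_per_visitor": 0.10,   # success: RPV rises by >= $0.10
}

for metric, minimum_change in THRESHOLDS.items():
    change = AFTER[metric] - BEFORE[metric]
    hit = change <= minimum_change if minimum_change < 0 else change >= minimum_change
    print(f"{metric}: {change:+.2f} -> {'success' if hit else 'no clear win'}")
```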

If you’re in product

Don’t just ship, and don’t stop at low-rigor experiments. Now it is time to show business impact. Here's how:

  • Start with a clear KPI such as activation, retention, or monetization
  • Use holdouts, cohort tracking, or A/B tests for your biggest bets
  • Work with UX and design to validate assumptions before launch
  • Do not stop at usage metrics. Connect your work to business outcomes
  • Show how your feature changed behavior or supported company goals

 

Business alignment comes faster when your experiments are measurable.
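
A minimal sketch of tying a feature to a business KPI such as activation, assuming hypothetical cohort data with a holdout label (your platform supplies the real records):

```python
# One record per user: which group they were in and whether they activated
users = [
    {"id": "u1", "group": "feature", "activated": True},
    {"id": "u2", "group": "feature", "activated": False},
    {"id": "u3", "group": "holdout", "activated": False},
    {"id": "u4", "group": "holdout", "activated": True},
    # ... real data has thousands of rows
]

def activation_rate(group: str) -> float:
    """Share of the cohort that hit the activation KPI."""
    cohort = [u for u in users if u["group"] == group]
    return sum(u["activated"] for u in cohort) / len(cohort)

lift = activation_rate("feature") - activation_rate("holdout")
print(f"Activation lift vs. holdout: {lift:+.1%}")
```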

If you’re working together (congrats!)

Companies with aligned marketing and product teams are more than twice as likely to report significant growth. Go outperform. Here's how:

  • Use a single platform to see all product and marketing tests in one view
  • Track how experiments in one area affect results in another
  • Rely on cross-campaign analysis to interpret overlaps and interaction effects
  • Do not pause tests by default. Use data to learn across initiatives
  • Build a feedback loop between teams to scale insight, not just output

 

Growth happens faster when teams learn in sync, not in silos.
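
One way to surface overlaps is to break conversion down by the variant each user saw in each concurrent test. A sketch with hypothetical exposure records:

```python
from collections import defaultdict

# Hypothetical records: each user's variant in both experiments, plus outcome
exposures = [
    {"marketing_variant": "A", "product_variant": "control", "converted": True},
    {"marketing_variant": "A", "product_variant": "new_flow", "converted": False},
    {"marketing_variant": "B", "product_variant": "new_flow", "converted": True},
    # ... one record per user
]

cells = defaultdict(lambda: [0, 0])  # (conversions, users) per variant pair
for e in exposures:
    key = (e["marketing_variant"], e["product_variant"])
    cells[key][0] += e["converted"]
    cells[key][1] += 1

# Large rate differences across cells hint at an interaction worth investigating
for (mkt, prod), (conv, n) in sorted(cells.items()):
    print(f"marketing={mkt}, product={prod}: {conv / n:.1%} conversion ({n} users)")
```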

Experimentation is not one method.

Testing is not a rigid process only analysts can own.

It is how teams learn and improve, across goals, tools, and timelines.

Sometimes that means full statistical rigor.

Other times it means catching problems fast or observing real-world behavior.

The smartest companies know how to choose the right approach based on what they need to learn.

Your goal determines which experiment type to use

To help you adopt the practices of high-performing companies, we’ve included a simple guide that matches your experimentation goals with the right method. Whether you want to de-risk a release, understand adoption, personalize experiences, or prove impact, this guide can help your team act with confidence, not guesswork.

Experimentation decision tree

What is your primary goal?

Reduce the risk of a release or catch technical issues early?

  • Experiment type: Progressive delivery
    • Release to a small group of users
    • Watch for errors, slowdowns, or major conversion drops
    • Rigor: Low

 

See how users behave before a full rollout? 

  • Experiment type: Targeted observation
    • Show the change to a specific audience segment
    • Track usage patterns, clicks, or flows
    • Rigor: Low to Medium

 

Personalize content or features for a specific group? 

  • Experiment type: Personalization with a holdout
    • Deliver the experience to most users in the segment
    • Keep a small portion on the default for comparison
    • Rigor: Medium

 

Prove that your change is driving better results?

  • Experiment type: A/B testing
    • Randomly assign users to different versions
    • Measure clear impact on key metrics like conversions or engagement
    • Rigor: High
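
If it helps to operationalize the guide, the tree above can be expressed as a small lookup. The labels mirror the guide; the goal keys and function name are ours:

```python
# Goal -> (experiment type, rigor), straight from the decision tree above
DECISION_TREE = {
    "de-risk a release": ("Progressive delivery", "Low"),
    "observe behavior before rollout": ("Targeted observation", "Low to Medium"),
    "personalize for a segment": ("Personalization with a holdout", "Medium"),
    "prove impact": ("A/B testing", "High"),
}

def recommend(goal: str) -> str:
    method, rigor = DECISION_TREE[goal]
    return f"{method} (rigor: {rigor})"

print(recommend("prove impact"))  # A/B testing (rigor: High)
```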
[Image: Experimentation decision tree]