When experimentation becomes an anchor: 5 signs your program has stopped learning

Setting up an effective experimentation regimen is usually the hard part. Maybe that’s why businesses so often take their foot off the gas once it really gets going.
It’s a kind of quiet failure, one where the program looks perfectly healthy, with a steady cadence, full dashboards, and an animated team.
Usually those teams are filled with hardworking individuals building complex features, but none of them can actually discuss what they’ve learned this quarter because the whole process is on autopilot.
Paul Davidson, an engineer and product leader with experience at Expedia and National Instruments, calls this the anchor, a program that weighs down the teams it’s supposed to strengthen.
In his recent chat with Katie Green on Unite Voices, he noted that there are ways to check the health of your A/B testing program, and ways to solve each problem as it arises.
1. Your tests change too much at once
Running multiple tests at a time can be a strong way to increase the efficacy of an experimentation program. However, when a single test bundles multiple changes, the result is seldom informative.
“Some of those changes might be pushing your goal up while others push it down,” Davidson notes. “It’s really hard to untangle what actually happened.”
When your tests bundle multiple changes, you can win and have no idea why, and without that information, you cannot repeat the victory. If your readouts are leading to debates about what exactly caused the win, the scope of your tests is much too wide.
The solution is simple: bundle fewer changes into each test so you can isolate exactly what is moving the needle.
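To make the untangling problem concrete, here is a minimal sketch with made-up numbers. The baseline rate, the two change names, and their effects are all hypothetical, and the sketch assumes the effects simply add together, which real changes may not do.

```python
# Hypothetical numbers for illustration only; none come from Davidson or the podcast.
BASELINE_RATE = 0.050  # assumed conversion rate of the control experience

# Assumed true effect of each change, in absolute conversion-rate points.
CHANGE_EFFECTS = {
    "new_headline": +0.004,      # pushes the goal up
    "extra_form_field": -0.003,  # pushes the goal down
}


def bundled_rate(baseline, effects):
    """Rate one bundled variant would show, assuming the effects simply add."""
    return baseline + sum(effects.values())


def isolated_rates(baseline, effects):
    """Rate each change would show if it were tested on its own."""
    return {name: baseline + delta for name, delta in effects.items()}


bundled = bundled_rate(BASELINE_RATE, CHANGE_EFFECTS)
isolated = isolated_rates(BASELINE_RATE, CHANGE_EFFECTS)

print(f"bundled variant: {bundled:.3f}")
for name, rate in isolated.items():
    print(f"{name} alone: {rate:.3f}")
```

Under these assumptions the bundled variant shows a small net win of about 0.1 points, which hides both the stronger win from the headline and the loss from the extra form field. Only the isolated tests reveal which change to keep.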
2. Validation comes in too late to matter
Once a feature is built, teams often feel pressure to justify the effort. After all, by that point, you’ve sunk the cost in dollars, hours, or both, and abandoning that progress is hard to do.
So when do you test new features? Davidson says to test them early, as early as possible, even if the work is not yet perfect, because a test that fails in a couple of days can save weeks of time and effort.
Teams that test too late in the process, Davidson says, are often just “trying to save the test,” digging through results and “looking for anything that would justify a rollout, because no one wanted to go back through that process again.”
3. Your metrics list keeps growing
Your list of secondary metrics can be a good indicator of your program’s health. Over time, that list tends to balloon to ten or fifteen items that often contradict one another. When it grows that large, every readout becomes an argument.
A healthy experimentation program offers a clear read on every test by choosing the metric closest to the change, pairing it with one or two guardrail metrics, and treating everything else as informational only.
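One way to hold that line is to write the metric plan down before the test starts. The sketch below is an illustration, not a feature of any particular platform: the MetricPlan class, the decide function, and every metric name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetricPlan:
    """Hypothetical pre-registered plan: one primary metric, one or two guardrails."""
    primary: str
    guardrails: list[str]
    informational: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the "one or two guardrail metrics" rule of thumb.
        if not 1 <= len(self.guardrails) <= 2:
            raise ValueError("expected one or two guardrail metrics")


def decide(plan, significant_wins, significant_harms):
    """Read out a test using only the primary metric and the guardrails.

    Informational metrics never change the decision, even if they moved.
    """
    if any(metric in significant_harms for metric in plan.guardrails):
        return "do not ship"
    if plan.primary in significant_wins:
        return "ship"
    return "no decision"


plan = MetricPlan(
    primary="add_to_cart_rate",
    guardrails=["page_load_time", "refund_rate"],
    informational=["scroll_depth", "time_on_page"],
)

# scroll_depth dropped, but it is informational, so it does not block the rollout.
print(decide(plan, significant_wins={"add_to_cart_rate"}, significant_harms={"scroll_depth"}))
```

Because the roles are fixed up front, a surprising move in an informational metric becomes a question for the next test instead of an argument in this readout.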
4. Experimentation becomes a rubber stamp
How are decisions actually made? Which comes first, the test or the decision? Teams that run tests to confirm decisions they (or someone else) have already made are seldom learning anything from those tests.
When this happens, Davidson describes subsequent testing as “just the thing you do before you go live.”
Experimentation is meant to be a way for teams to learn. Tests that only run long enough to reach statistical significance on pre-made decisions do no good for anyone.
5. Your team thinks the platform is the problem
Blaming the platform is a comfortable framing for the above four problems. If experimentation has become an anchor for your team, the easiest, safest possible story is usually that the tool is missing something.
Sometimes it is. Davidson’s experience, however, tells him that, for “about 70 to 80% of our use cases, the platform was pretty much complete and ready to go.”
By those numbers, the constraint is usually not the software. Far more often, it lies in how teams scope their tests, when they run them, and which metrics they measure.
Lifting the anchor, one team at a time
The three levers that shape every experimentation program are platform, process, and people. Of the three, two can be adapted and managed today.
So don’t try to fix the whole program at once. Find the team that “feels the anchor” most strongly and work the fundamentals with them: fewer changes per test, fewer metrics, and more sensitive metrics.
Once you win there, the rest of the organization will want what that team has.
Want to hear more? Paul Davidson discusses testing strategies, platforms, and how to glean clearer learnings with fewer metrics on Unite Voices, Kameleoon’s podcast featuring real stories from the people behind today’s most innovative experimentation programs.