Harder to stop a moving train: Inside FOSSIL's high velocity testing program

Katie Green x Marcela Gutierrez

Published on May 21, 2026

About the episode

Stopping someone who already has a finished experiment is a lot harder than saying no to an idea on a whiteboard. Marcela Gutierrez figured that out, and it changed everything about how Fossil Group runs its experimentation program.

In this episode, Marcela shares how Fossil shifted who builds experiments and why that single change unlocked speed across the entire team. She also breaks down how her analysts think about metrics, storytelling, and why human judgment matters more in experimentation programs as AI takes on more of the build.

About our guest

Marcela Gutierrez leads digital analytics, experimentation, and consumer behavior at Fossil Group, where she has spent 13 years across the US and Switzerland. She is the kind of analyst who thinks just as hard about how data gets communicated as she does about what the data says.

Key takeaways

Building a test before asking for approval changes the conversation. Coming to a sprint review with something nearly finished is harder to stop than rejecting a rough idea, and Fossil uses this dynamic to run more experiments with less friction.
A primary KPI needs a secondary one. Marcela's team pairs each main metric with what she calls its "BFF" metric, so results tell a coherent story rather than just a flattering one.
Analysts are not data processors. The most valuable work they do is understanding organizational context, cleaning data before it goes in, and translating insights for the right audience at the right level of detail.

Welcome and Introductions

Katie Green: Marcela, welcome to Unite Voices! I'm so happy we got to meet in person at Unite Vancouver — I feel like we're already buddies coming into this podcast. Thank you so much for being a part of our show.

I want to give you a chance to introduce yourself. You had such a great presentation at Unite Summit Vancouver, and I'd love for listeners to know who you are before we dig into the details of how you're running your testing program.

Marcela Gutierrez: Thank you, Katie — it's an honor to be a guest on your podcast. I had such a pleasure meeting you in Vancouver. You have this beautiful quality that makes anybody feel at ease, and I'm really looking forward to having a great conversation.

Quick intro: my name is Marcela Gutierrez. I was born and raised in Honduras in Central America, but I've lived much of my life in the US, a little bit in the Philippines, and now I've been living in Switzerland for almost eight years.

I've been with Fossil Group for 13 years, a career that has spanned both the US and Switzerland. My role here is to oversee digital analytics, experimentation, and consumer behavior, which really comes down to understanding who Fossil's consumer is. Happy to be here.

Katie Green: I didn't realize you lived in the Philippines for some time! My dad is from Manila, so we are going to have to talk about that separately. There's some really good Filipino food in Portland, Oregon — that'll be a whole separate pod. My food pod, when I start that.

The Analytics Translator: Delivering Truth in a Beautiful Way

Katie Green: I want to dive deeper into your role at Fossil, because I saw something on your LinkedIn that really stood out to me — you describe yourself as an analytics translator. A lot of people listening to this podcast work in experimentation but aren't analytics engineers. They're still responsible for translating data into something actionable. I'd love to hear what that means in the context of a global brand like Fossil.

Marcela Gutierrez: I think if you play a technical role — or even if you have deep expertise in anything — you need to become good at communicating to whatever audience you're speaking with. My message about any given project will be delivered in a very different language depending on whether I'm speaking to analysts, marketers, salespeople, or a board of directors.

With analytics and experimentation, things can get very technical very quickly. You can come across as impressive by using complicated language, or you can come across as simply easy to understand. That's how I challenge myself — making sure my team and I are very good at taking complicated topics and making them easy to digest and easy to act on.

Funny enough, it's even in our team mantra. We have a little mantra we put together a few years back: always deliver the truth in a beautiful way.

What this means is that we are the analytics team — a neutral team — and we will always deliver the truth. People come to us with questions, with problems they want to explore, with opportunities. It's our job to look into that and deliver facts.

But "in a beautiful way" encompasses several things. It means being respectful of the other person's point of view, knowing if they might be biased, and delivering difficult news in a way that can be easily understood and acted on. And it means presenting information visually — so it can be captured and understood at a glance, not buried in formulas and walls of text that are going to make somebody go to sleep.

Katie Green: I love that. A lot of people are going to take that away from this episode — how to communicate complicated things, not just hard things. And I love that you said you're still working on it. That shows the mindset of an experimenter: constantly striving for better, constantly improving. That's classic to a T.

How PBX Transformed Fossil's Experimentation Velocity

Katie Green: At Unite Summit Vancouver, you talked about how you're leveraging PBX and AI to speed up your testing velocity. You did such a great job explaining it there — I'd love for you to dig into it here. How are you using PBX? How are you leveraging AI to increase velocity, increase learnings, and democratize how you're sharing information?

Marcela Gutierrez: We at Fossil do a lot of experimentation, and we've been at it for several years — across our different brand portfolios and across regions. PBX has truly transformed our experimentation program. If I look at where we were four years ago versus where we are today, the difference is significant, and a lot of it is down to PBX.

With PBX, we've been able to tackle an area that used to be a bottleneck for us. Ideation has never been a struggle — we come from a group that is very outspoken, and we love it when people share ideas. But the issue was always finding dev capacity to actually build and launch the tests we wanted to run.

With PBX, analysts and members of my team can now take a lot of the ideas in our pipeline and do them ourselves, which lightens the load on developers significantly.

There's also a momentum element to this. When you come to a sprint review with an idea that's already built and ready for QA, it creates a very different conversation than walking in and saying, "Hey, I have this idea — should we work on it?" Stopping progress is so much harder than starting from zero. When someone shows up with work already done, the team responds: "Let's run it. Can we tweak this? Can we adjust that?"

We still want those voices to have a say. But it has absolutely freed up the dev team to spend their time on projects we know will have major impact — rather than spinning on all the small ideas we just want to test quickly.

What PBX Can (and Can't) Do: Easy, Medium, and Hard Tests

Katie Green: A question I often get is: are these just little tests? Is PBX only helping with things that were probably going to win anyway? Can you speak to the range of complexity you're actually building with PBX, and what level of effort is involved?

Marcela Gutierrez: Let me go through different levels of effort.

The simple ones — things like hiding an element or rearranging existing features on a page — these are the bread and butter of PBX. They can be done with almost one prompt. Easy to do, easy to execute.

The medium ideas involve different pages, different flows, or creating something new — like adding an "add to cart" button in a recommendation zone. With PBX, we've found it can handle many medium ideas up to about 80% completion. Then the dev team can come in and finish the last bit. It still dramatically reduces developer workload.

For the harder ideas — those that touch different systems, like our product management system or content that lives elsewhere — PBX isn't going to go there. Those remain harder lifts and still need the full team.

But here's what's been interesting: some of those "easy" ideas had been sitting in our pipeline for a year because, even though they were easy, they were considered low priority. With PBX, because they're already ready to go, we can slip them through. "Hey, we have this idea ready — does anybody have any objection to running this test? No? Let's go." And we've run a lot of tests this way.

Katie Green: That's a really important point. In my career in CRO, the little tests can have just as big an impact on your bottom line as larger ones — especially because they compound. Changing a button color, reordering content, changing a headline — over time, you end up with a dramatically different page. And PBX lets you run faster on those ideas. It's getting better every day too, so it'll be the blink of an eye before it's handling your most complicated tests — at which point your dev team can focus entirely on implementation.

The Compounding Impact of Tests and Measuring ROI

Katie Green: I want to make sure it's not just a hamster wheel — run fast, run fast, run fast. Can you speak to the actual impact on your bottom line, and how you're measuring it as a result of the velocity increase from PBX?

Marcela Gutierrez: These tests absolutely accumulate over time. And I think about winning and losing tests differently than some people might. When we have a losing test — meaning the hypothesis we set didn't match what actually happened — we still learn an enormous amount.

We have meetings today where someone will bring up a test from three or five years ago that was a "failure," because we don't want to make that same mistake again. We learned something about our consumer. And ultimately, that's what we're trying to do with all of these tests: understand our consumer, understand their preferences. When we run many small tests, we're learning consumer behavior, and that knowledge aggregates into better decisions, better features, better UX down the road.

We do put an estimated ROI on each test. For a winner, that's the impact we'd expect if we implemented the result. For a test that failed, it's the opportunity cost: how much did we save by not launching without testing? It's more of an internal check than a hard number, but it shows the value of the program and the effort we're putting in.
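
Editor's sketch: a rough illustration of the kind of estimate described above, not Fossil's actual method. It treats a winning test's value as the projected gain from implementing it, and a losing test's value as the loss avoided by not shipping the change untested. The function name, the revenue baseline, and the lift figures are all hypothetical.

```python
def estimated_test_value(baseline_revenue: float, relative_lift: float) -> tuple[str, float]:
    """Return a label and a rough value for one finished test.

    Assumption (not from the episode): value is approximated as the
    baseline revenue of the tested experience times the measured lift.
    """
    amount = abs(baseline_revenue * relative_lift)
    if relative_lift > 0:
        return ("projected gain if implemented", amount)
    if relative_lift < 0:
        return ("loss avoided by testing first", amount)
    return ("no measurable difference", 0.0)


# Hypothetical figures for illustration only.
winner = estimated_test_value(baseline_revenue=1_000_000, relative_lift=0.02)
loser = estimated_test_value(baseline_revenue=1_000_000, relative_lift=-0.03)
print(winner)
print(loser)
```

Reporting both labels side by side keeps losing tests visible as value the program protected, in line with the "internal check" framing rather than a hard financial number.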

Katie Green: That risk assessment piece is huge: understanding not just the impact of what wins, but what could have been launched and lost. A healthy program measures both. Some teams I've worked with focus almost entirely on ROI and miss the opportunity cost and risk evaluation side, but that's where you really start to paint a picture of what your users want.

AI for Ideation: What's on the Horizon for FOSSIL

Katie Green: PBX has grown a lot since it launched. We started with PBX Build, then came PBX Ideate — which generates and prioritizes test ideas based on page data — and now PBX MCP, which removes the separation between experimentation and feature flagging. Are there other AI tools you're using for ideation or insight analysis? Are you feeding learnings into a custom GPT, for example, to generate ideas elsewhere in the workflow?

Marcela Gutierrez: For experimentation specifically, not quite yet. But I saw Drake's demo at Unite Vancouver, and I'm genuinely excited about PBX Ideate. We're actually in the process of getting access — trying to add it to our contract — because I think it's going to be a big game changer.

What excited me most about the demo, even without hands-on experience, was the behavioral economics angle: the science behind why something is being recommended. In one example, Drake showed that on a product listing page, you should display four products per row instead of three — because giving consumers more choice helps them move through the funnel. Having those concepts explained is such a powerful thing.

And if I remember correctly, it also attached an estimated ROI or expected impact to each recommendation. That's really compelling. We're hoping to get access in the next few weeks.

Katie Green: That's great context for people listening who are still figuring out where to start with AI. I also did a webinar with Drake that went deep on the details of PBX Ideate — the consumer psychology behind it, how the recommendations are built, how they're prioritized. I have a few clips on my LinkedIn and a full write-up as an article, so feel free to check those out.

KPIs Done Right: Primary Metrics and Their BFFs

Katie Green: Something you focused on at Unite Summit Vancouver was primary and secondary KPIs. I'd love to hear your best practices around metrics — and where you see the common mistakes happening, especially as teams move faster with tools like PBX. What's the next bottleneck once you've sped up the build?

Marcela Gutierrez: Our experimentation program has changed so much over the years. I challenge my team constantly: we know as much as we know today, and hopefully tomorrow we'll know more. The moment we do, we pivot, and we're transparent with the business when we do.

Metrics are a great example. When we started the program, we were session-based. I remember hearing someone speak about how session-based measurement was actually hurting experimentation programs: a successful test brings consumers back for more sessions, so if you're not measuring at the user level, the math doesn't add up correctly. We changed our approach immediately, and the difference was clear.
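
Editor's sketch: a minimal illustration of why the unit of measurement matters, assuming a made-up session log in which the variant brings users back for a second visit. The log, the user IDs, and the function names are hypothetical and are not drawn from Fossil's data or tooling.

```python
from collections import defaultdict

# Hypothetical session log: (user_id, variant, purchased_in_session).
SESSIONS = [
    ("u1", "control", True),
    ("u2", "control", False),
    ("u3", "control", False),
    ("u4", "control", False),
    ("u5", "variant", False),
    ("u5", "variant", True),   # u5 returned and bought
    ("u6", "variant", False),
    ("u6", "variant", True),   # u6 returned and bought
    ("u7", "variant", False),
    ("u7", "variant", False),  # u7 returned but did not buy
    ("u8", "variant", False),
    ("u8", "variant", False),  # u8 returned but did not buy
]


def session_rate(sessions, variant):
    """Conversions divided by sessions: every visit counts in the denominator."""
    rows = [s for s in sessions if s[1] == variant]
    return sum(1 for s in rows if s[2]) / len(rows)


def user_rate(sessions, variant):
    """Converting users divided by users: each person counts once."""
    bought = defaultdict(bool)
    for user_id, arm, purchased in sessions:
        if arm == variant:
            bought[user_id] = bought[user_id] or purchased
    return sum(bought.values()) / len(bought)


for arm in ("control", "variant"):
    print(arm, session_rate(SESSIONS, arm), user_rate(SESSIONS, arm))
```

In this made-up log both arms convert 25% of sessions, while per user the variant converts 50% against the control's 25%. The extra return visits enlarge the session denominator and hide the lift.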

On KPIs: we always set the hypothesis before the test begins. And our primary metric isn't always conversion rate. If we're testing something higher in the funnel — like filters on a category page — my primary KPI is interactions with the filter. My secondary KPI is whether users are viewing more products. I want to corroborate the story.

I think of the secondary metric as the primary metric's BFF. If I'm testing a marketing campaign, my primary might be click-through rate — are they coming to the site? My secondary might be bounce rate — are they actually staying, or just landing and leaving?

You want to make sure the story holds together. If you change something on filters, you might drive more filter interactions — great! But did those users then view products? Did they move down the funnel? And the math matters: if you look at step three divided by step two, you might be inflating your metric. But if you divide step three by step one, you get the real story — did you actually generate lift overall?
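
Editor's sketch: the denominator point above, worked through with invented funnel counts. The `Funnel` class, its field names, and all numbers are hypothetical; the scenario is one where fewer users reach step two, so the step-to-step ratio improves while the overall funnel gets worse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Funnel:
    """User counts at three consecutive funnel steps (hypothetical)."""
    step_one: int
    step_two: int
    step_three: int

    def step_to_step_rate(self) -> float:
        # Step three divided by step two: only sees users who reached step two.
        return self.step_three / self.step_two

    def overall_rate(self) -> float:
        # Step three divided by step one: measured against everyone who entered.
        return self.step_three / self.step_one


def relative_lift(control: float, variant: float) -> float:
    """Relative change of the variant rate versus the control rate."""
    return (variant - control) / control


# Invented counts for illustration only.
CONTROL = Funnel(step_one=1000, step_two=200, step_three=150)
VARIANT = Funnel(step_one=1000, step_two=100, step_three=90)

step_lift = relative_lift(CONTROL.step_to_step_rate(), VARIANT.step_to_step_rate())
overall_lift = relative_lift(CONTROL.overall_rate(), VARIANT.overall_rate())
print(f"step 3 / step 2 lift: {step_lift:+.0%}")
print(f"step 3 / step 1 lift: {overall_lift:+.0%}")
```

With these numbers the step-three-over-step-two rate rises from 75% to 90%, yet step three over step one falls from 15% to 9%. Anchoring the final metric to step one is what reveals whether the change produced lift overall.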

Katie Green: I love that framework — the closest metric to the change, then the next step, then the next. It builds a real story arc. And that's where human judgment comes in: knowing which metrics matter to your leaders, your team, your OKRs. AI can surface trends, but it takes a person to know what's meaningful.

Where Humans Still Win: Creativity, Context, and Communication

Katie Green: That brings me to something I've been wanting to ask. With all this AI acceleration — PBX for building, AI for ideation — where do humans still have the most impact in your experimentation program?

Marcela Gutierrez: In everything.

AI is a tool. I went to an AI conference about a year ago and heard someone claim they could fire 80% of their analysts right now. I couldn't believe it. Either they don't fully understand what their analysts are doing, or their analysts are spending too much time on manual tasks — in which case, they're being used incorrectly.

When I look at my team of analysts, what they work on isn't manual. We need analysts to understand concepts — to understand what's important to each team, to know what's top of mind for a given promotion or inventory situation. AI may be better at identifying trends in a large dataset, but understanding what factors weren't put into the model, or cleaning the data before it goes in? That takes human intervention.

And then there's the storytelling. At the end of the day, if you have a wonderful insight but you can't connect it to the audience, it's nothing. It's just words on a page or numbers in an Excel sheet. If you can't make a marketer, a salesperson, or an executive understand it and act on it — going back to that translation piece — all those beautiful insights are wasted.

Katie Green: You did an incredible job tying it back to where we started. That's very much the through-line of this whole conversation. The organizational context, the emotional intelligence, the ability to know what a leader actually cares about — that's still very much a human domain. At least for now.

Practical Tips: How to Get Better at Communicating Complex Ideas

Katie Green: Last question, and it's the one I ask everyone. Someone is listening — a lead, an analyst, a designer, whoever. What's one tangible thing they can do tomorrow to start getting better at translating complex ideas and speeding up velocity in their program?

Marcela Gutierrez: My best tips all come back to working on your communication — and practicing it deliberately.

One exercise we've done on my team is presentation training: present the same topic to very different audiences. Take PBX. Explain it to a technical colleague. Then explain it to a 10-year-old — make it entertaining for them. Now explain it to your grandpa. Two minutes each. What words do you use? How does the message change?

The best experts can take a complicated topic and make it look simple. When you're a young analyst, you want to show all the work — all the rabbit holes, all the issues you worked through, every intermediate step. Like in school, when the teacher said "show your work." But in practice, your audience doesn't need to see all of that. They need the insight.

The skill you develop over time is learning to simplify without losing truth. And that extends to small details — like decimal points. Does your audience really need to know 40.2%? Or does "about 40%" tell the story just as well? In most cases, they'll remember "40%" — they won't remember "40.2." In some cases, the precision matters enormously. Knowing the difference is the art.

So: practice communicating, watch other people who do it well, figure out what they have in common, and learn from them. That's the skill that will serve you for your entire career.

Katie Green: That's wonderful. And to build on that — the thing I've found most helpful for improving my own communication is recording myself. Recording your QBRs, your ideation sessions. It's uncomfortable. But hosting a podcast is a brutal and extremely effective way of getting better at communicating.

Thank you so much for your time, Marcela. This has been such a great conversation, and I know people are going to be finding you on LinkedIn for more. I can't wait to see you in London and maybe Paris for future Unite Summits. See you soon!

Marcela Gutierrez: Thank you so much, Katie. It was wonderful. See you soon!
