There’s a quote from Harvard Business School’s Stefan Thomke that’s worth sitting with: “The higher up you get, the more you get paid to make tough decisions, and you want to be the decision maker.” It sounds perfectly reasonable. It is also, Thomke argues, precisely the mindset that blocks organisations from building genuine experimentation cultures.
The tension at the heart of hypothesis-driven organisations isn’t data versus intuition. It’s something more interesting than that.
The gut isn’t wrong, it’s just incomplete
Here’s the thing that often gets lost in the “data-driven organisation” conversation: intuition isn’t the enemy of good hypotheses. It’s usually where hypotheses come from.
The observation you form when something doesn’t quite fit your mental model. The hunch that a particular customer segment is underserved. The nagging feeling that the metric you’re celebrating doesn’t reflect what’s actually happening in the field. All of these are intuition firing, and all of them are the raw material of useful experiments.
The pattern worth noting is what happens after the intuition sparks. In gut-driven organisations, the hypothesis quietly becomes a false fact. “Our customers need this feature” stops being a question and becomes a roadmap item. “This discount will increase revenue” stops being a bet and becomes a plan. Nobody checked. The experiment never ran.
McKinsey consultants apparently use the word “hypothesis” more than almost any other word, precisely because the discipline of keeping a hypothesis as a hypothesis, until there’s evidence to prove or disprove it, is harder than it sounds.
What running experiments at scale actually looks like
Universal Destinations & Experiences is a useful recent case study, not because they’re a tech company, but because they’re not. Two years ago their experimentation programme was siloed and tactical: a few tests running in isolation, no shared learning, no velocity. By this year they were running over 150 experiments annually across web and email, with a 4x increase in test velocity and a cultural shift from opinion-based decisions to what their team described as hypothesis-led collaboration.
The interesting part isn’t the number of experiments. It’s what changed structurally to make that possible: product owners seeking out the data science team rather than waiting for it, leadership tying key results to experimentation velocity, and teams treating an inconclusive result not as a failure but as a prompt to adapt the hypothesis and run again.
Barry O’Reilly captures this well in Lean Enterprise: the goal is one-piece flow, changes small enough that you can actually identify the causal relationship between what you did and what happened. Most enterprise decisions don’t work that way. They bundle too many changes at once, too infrequently, with outcomes too diffuse to learn from.
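To make one-piece flow concrete, here’s a minimal sketch of what the readout of a single-change experiment can look like: one variant, one metric, and a simple two-proportion z-test. The function and the conversion numbers are illustrative assumptions, not anything from Universal’s programme or O’Reilly’s book.

```python
# A minimal sketch of a single-change experiment readout. With exactly one
# change in flight, any movement in the metric is attributable to that change.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided test
    return z, p_value

# Hypothetical numbers: 480/10,000 control conversions vs 540/10,000 variant.
z, p = two_proportion_z_test(480, 10_000, 540, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 1.93, p = 0.054: inconclusive at 0.05
```

An inconclusive readout like this is exactly the prompt Universal’s teams treat it as: adapt the hypothesis and run again. Bundle ten changes into one release and the same number tells you nothing about which change mattered.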
The bit that’s genuinely hard
It’s worth being honest about the organisational change required here, because the usual framing of “accept failure as a learning cost” is technically true but practically understated.
Thomke’s research surfaces a more specific friction: the higher someone is in an organisation, the less comfortable they tend to be walking into a meeting and saying “I genuinely don’t know what’s going to happen, so let’s find out.” That kind of institutional humility is actually quite rare, and it’s not a personality trait; it’s a structural condition. Organisations that build it tend to do so by making the act of running a clear experiment feel safer than the act of guessing well.
The shift isn’t from intuition to data. It’s from untested opinions masquerading as facts, to tested hypotheses that either earn the status of facts or update the mental model. That’s a meaningfully different thing.
What AI changes about this
As explored in earlier posts in this series, from the Lean Loop discussion to the PMF frameworks post, AI is compressing the experiment-learn cycle in ways that matter here too.
The gap between forming a hypothesis and getting signal is narrowing fast. Behavioural analytics that once required a data science team now run automatically. A/B test setup that once took weeks takes hours. The organisations building genuine experimentation cultures today are the ones that will have the institutional muscle (the habit, the tooling, the tolerance for inconclusive results) when the cycle gets faster still.
The infrastructure for being hypothesis-driven has rarely been more accessible. The harder part, as it turns out, has always been cultural.
In your experience, what’s the biggest barrier to running more experiments: the data infrastructure, the exec comfort level, or something else entirely?
Let’s keep learning, together.