A/B testing vs multi-armed bandits: when to use which

A/B testing and multi-armed bandits get pitched as rivals. They are really tools for two different jobs: learning a clean answer, versus earning the most conversions while you learn.

What A/B testing optimizes for

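A classic A/B test holds a fixed split, say 50/50, until it reaches a sample size chosen in advance, then checks for significance and declares a winner. Stopping the moment the result first looks significant inflates false positives, so the sample size has to be fixed up front. Its job is a clean, defensible answer to "is B better than A, and by how much?" The price is regret: with a 50/50 split, half your traffic sees the worse page for the entire test. (Why tests often stall before that.)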
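
To make the readout concrete, here is a minimal sketch of the significance check behind a fixed-split test: a pooled two-proportion z-test written with only the Python standard library. The function name and the example counts are illustrative assumptions, not taken from any particular testing tool, and real platforms may use a different test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, z statistic, two-sided p-value) for B versus A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail area of the standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value
```

For example, `two_proportion_z_test(500, 10_000, 560, 10_000)` compares a 5.0% rate with a 5.6% rate. It returns a lift of 0.6 percentage points with a p-value a little above 0.05, so even 20,000 visitors do not quite settle that difference at the usual threshold.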

What a bandit optimizes for

A bandit shifts traffic toward whatever is winning as data arrives. Its job is to maximize conversions during the experiment, not to produce one tidy verdict. It still learns; it just sends fewer visitors to the losing variants while it does. (How bandits work.)
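
One common way to shift traffic by evidence is Thompson sampling. The sketch below is a minimal version, assuming each visitor either converts or does not and starting every variant from a uniform Beta(1, 1) prior. The class name, the `simulate` helper, and the conversion rates are hypothetical, chosen only to illustrate the idea.

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of variants."""

    def __init__(self, variants, rng=None):
        self.rng = rng or random.Random()
        # Beta(1, 1) prior per variant, stored as [alpha, beta].
        self.stats = {v: [1, 1] for v in variants}

    def choose(self):
        # Draw one plausible conversion rate per variant, show the highest.
        draws = {v: self.rng.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, variant, converted):
        # A conversion increments alpha, a non-conversion increments beta.
        self.stats[variant][0 if converted else 1] += 1

def simulate(true_rates, visitors, seed=0):
    """Count how many visitors each variant receives in a simulated run."""
    rng = random.Random(seed)
    bandit = ThompsonBandit(list(true_rates), rng)
    shown = {v: 0 for v in true_rates}
    for _ in range(visitors):
        variant = bandit.choose()
        shown[variant] += 1
        bandit.update(variant, rng.random() < true_rates[variant])
    return shown
```

Calling `simulate({"A": 0.05, "B": 0.20}, 5000)` should send most of the 5,000 simulated visitors to B. Early on both variants get traffic because their posteriors overlap; as B's lead becomes clear, A is drawn less and less often.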

Use an A/B test when

You need one rigorous number for a high-stakes, one-time decision, like a pricing change or a claim you will stand behind.
You have two options, or a small fixed handful, and enough traffic to resolve them.
A clean readout matters more than conversions earned during the test.
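
How much traffic counts as "enough" can be estimated before the test starts. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function name and the default 5% significance level and 80% power are assumptions for illustration, not values from this article.

```python
import math
from statistics import NormalDist

def visitors_per_variant(base_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect base_rate vs target_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = base_rate * (1 - base_rate) + target_rate * (1 - target_rate)
    effect = target_rate - base_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)
```

With these defaults, telling a 5% page from a 6% page takes on the order of 8,000 visitors per variant. Halving the lift to half a point roughly quadruples that, which is why small expected gains on low-traffic pages are hard to resolve with a fixed split.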

Use a bandit when

You are optimizing continuously, with no real end date.
You have several variants and limited traffic per variant.
Wasting traffic on a known loser is expensive.
You want the system to keep adapting as segments and seasons change.

The honest middle

Many teams want both: the conversions a bandit captures and some confidence about why. Contextual bandits get close, learning the best variant per segment while still allocating by evidence. That is the model Optimeleon runs on.
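
The simplest form of a contextual bandit keeps a separate scoreboard for each segment. The sketch below does that with Thompson sampling per (segment, variant) pair. It is an illustrative assumption, not a description of how any particular product works, and the class, segment, and variant names are hypothetical. Production systems typically share information across segments with a model instead of treating each one in isolation.

```python
import random
from collections import defaultdict

class SegmentedBandit:
    """One Beta-Bernoulli posterior per (segment, variant) pair."""

    def __init__(self, variants, seed=None):
        self.variants = list(variants)
        self.rng = random.Random(seed)
        # Every pair starts from a Beta(1, 1) prior, stored as [alpha, beta].
        self.posteriors = defaultdict(lambda: [1, 1])

    def choose(self, segment):
        # Sample a plausible rate for each variant within this segment only.
        def draw(variant):
            alpha, beta = self.posteriors[(segment, variant)]
            return self.rng.betavariate(alpha, beta)
        return max(self.variants, key=draw)

    def update(self, segment, variant, converted):
        self.posteriors[(segment, variant)][0 if converted else 1] += 1

    def best(self, segment):
        # Variant with the highest posterior mean rate for this segment.
        def mean(variant):
            alpha, beta = self.posteriors[(segment, variant)]
            return alpha / (alpha + beta)
        return max(self.variants, key=mean)
```

In use, each visit calls `choose("mobile")` or `choose("desktop")`, shows the returned variant, and reports the outcome with `update`. Over time `best` can differ by segment, which gives the per-segment explanation a single global bandit cannot.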

The decision in one line

If you need to know one thing once, A/B test it. If you need to win continuously, use a bandit.