Skip to main content

Multi Armed Bandit

Model

The base Autotune implementation uses a Thompson Sampling (Bayesian) algorithm to estimate each variant's probability of being the best variant and allocate a proportional amount of traffic.

For example, if a given variant has a 60% probability of being the best, Autotune will provide it 60% of the traffic. At a high level, the multi-armed bandit algorithm works by adding more users to a treatment as soon as it recognizes that it is clearly better in maximizing the reward (the target metric).

Throughout the process, higher performing treatments are allocated more traffic whereas underperforming treatments are allocated less traffic. When the winning treatment beats the second best treatment by a specified margin, the process terminates.

Some helpful references:

Advantages

The major advantage of the base Multi-Armed Bandit over a contextual bandit is its ability to converge and identify a best variant. In cases where there is a "one-size-fits-all" solution, this is an elegant way to efficiently allocate traffic and determine a correct long-term solution while minimizing "regret" - the outcome of a high number of users being exposed to the worse variant like in an a/b test.

Disadvantages

The major disadvantages of a Multi-Armed Bandit as compared to a contextual bandit is its inability to personalize; in cases where there's interactions between a user's attributes and variants, vanilla Autotune can identify a global maxima that is worse than serving users their individual best variant.

You can see this in a toy example; even if the "US Flag" variant had the highest overall CTR, it would be a bad choice for CA users. In this example, both groups will converge to a sub-optimal variant.

A/B/n TestMulti-Armed Bandit (Autotune)Contextual Bandit (Autotune AI)Ranking Engine
Typical # Variants2-34-84-8Arbitrary #
Personalization FactorNoneNoneModerateHigh
Input Data RequiredNoneVery Little (100+ samples)Little - generally 1000+ samplesTens of thousands to millions of samples
Model EfficacyNoneBasicModerateHigh
Identifies Best VariantYesYesNoNo
Consistent User AssignmentYesNoNoNo