Use Cases
Multi-armed bandits shine when you need to make the same decision repeatedly under uncertainty. Unlike an A/B test, which holds its traffic split fixed until it reaches statistical significance, a bandit continuously shifts traffic toward better-performing variants while still exploring the alternatives.
Below are the most common use cases, each with a working Python SDK example you can adapt.
Recommendations & Search
Optimize which items, content, or search results to show each user. Instead of manually curating rankings or waiting weeks for an A/B test to converge, a bandit continuously learns which variants drive higher engagement.
Each arm represents a different ranking algorithm, content variant, or recommendation strategy. Rewards are binary signals like clicks, add-to-carts, or purchases. The bandit shifts traffic toward the variant that produces the highest reward rate while still exploring alternatives to catch changes over time.
Recommended policies: Beta Thompson Sampling for click/no-click rewards, LinUCB when you have user features for personalization.
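To build intuition for what BetaTSPolicy does under the hood, here is a minimal self-contained simulation of Beta Thompson Sampling — plain Python, independent of the SDK, with invented click-through rates. Each arm keeps a Beta(wins+1, losses+1) posterior over its click rate; sampling from the posteriors naturally concentrates traffic on the stronger variant while the weaker one still gets occasional exploratory pulls.

```python
import random

def beta_ts_select(stats):
    # sample a plausible click rate from each arm's Beta(wins+1, losses+1) posterior
    draws = {
        arm: random.betavariate(s["wins"] + 1, s["losses"] + 1)
        for arm, s in stats.items()
    }
    return max(draws, key=draws.get)

random.seed(0)
true_ctr = {"bm25-default": 0.05, "semantic-v2": 0.12}  # hidden from the bandit
stats = {arm: {"wins": 0, "losses": 0} for arm in true_ctr}

for _ in range(5000):
    arm = beta_ts_select(stats)
    clicked = random.random() < true_ctr[arm]
    stats[arm]["wins" if clicked else "losses"] += 1

pulls = {arm: s["wins"] + s["losses"] for arm, s in stats.items()}
# pulls ends up heavily favoring semantic-v2, but bm25-default was still explored
```

The SDK example below follows the same select-then-feedback loop, with the posterior bookkeeping handled server-side.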
import qbrix

# create a pool with three ranking strategies as arms
pool = qbrix.pool.create(
    name="search-ranking",
    arms=[
        {"name": "bm25-default", "metadata": {"algorithm": "bm25"}},
        {"name": "semantic-v2", "metadata": {"algorithm": "semantic"}},
        {"name": "hybrid-rerank", "metadata": {"algorithm": "hybrid"}},
    ],
)

# run a beta thompson sampling experiment — ideal for click/no-click
exp = qbrix.experiment.create(
    name="search-ranking-optimization",
    pool_id=pool.id,
    policy="BetaTSPolicy",
)

# on each search request, select the best ranking strategy
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={"id": "user-42", "metadata": {"query_type": "product"}},
)
ranking_algorithm = result.arm.metadata["algorithm"]

# ... apply the selected algorithm to rank search results ...

# reward = 1.0 if the user clicked a result, 0.0 otherwise
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)

Dynamic Pricing
Find optimal price points per customer segment without running long, fixed-split experiments. Bandits adapt in real time — if a price point underperforms, traffic shifts away automatically.
Each arm is a price tier or discount level. Rewards can be revenue per session, conversion rate, or any continuous metric. Use contextual bandits with customer features (device, geography, tier) to learn per-segment pricing strategies rather than a single global optimum.
Recommended policies: Gaussian Thompson Sampling for continuous revenue rewards, LinTS when conditioning on customer features.
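To see why Gaussian Thompson Sampling suits continuous rewards, here is a toy simulation — plain Python, not SDK code, with invented revenue numbers. Each arm's mean revenue gets a flat-prior Normal posterior whose width shrinks as 1/√n; sampling from these posteriors shifts traffic toward the tier with the higher mean.

```python
import math
import random

def gaussian_ts_select(stats, noise_sd=5.0):
    # sample a plausible mean revenue per arm from a flat-prior Normal posterior
    best, best_draw = None, -math.inf
    for arm, s in stats.items():
        if s["n"] == 0:
            return arm  # try every tier once before trusting posteriors
        mean = s["sum"] / s["n"]
        draw = random.gauss(mean, noise_sd / math.sqrt(s["n"]))
        if draw > best_draw:
            best, best_draw = arm, draw
    return best

random.seed(3)
true_mean = {"no-discount": 42.0, "10-percent": 47.0}  # hidden from the bandit
stats = {arm: {"n": 0, "sum": 0.0} for arm in true_mean}

for _ in range(2000):
    arm = gaussian_ts_select(stats)
    revenue = random.gauss(true_mean[arm], 5.0)  # noisy continuous reward
    stats[arm]["n"] += 1
    stats[arm]["sum"] += revenue
# "10-percent" accumulates most of the traffic as its posterior separates
```

In the SDK example below the same idea applies, except the reward is the observed session revenue you report via feedback.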
import qbrix

# each arm is a discount tier
pool = qbrix.pool.create(
    name="checkout-discount",
    arms=[
        {"name": "no-discount", "metadata": {"discount_pct": 0}},
        {"name": "5-percent", "metadata": {"discount_pct": 5}},
        {"name": "10-percent", "metadata": {"discount_pct": 10}},
        {"name": "15-percent", "metadata": {"discount_pct": 15}},
    ],
)

# gaussian TS for continuous revenue rewards
exp = qbrix.experiment.create(
    name="pricing-optimization",
    pool_id=pool.id,
    policy="GaussianTSPolicy",
)

# select the best discount for this customer segment
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={
        "id": "user-7891",
        "metadata": {"country": "DE", "device": "mobile", "tier": "premium"},
    },
)
discount = result.arm.metadata["discount_pct"]

# ... apply discount to checkout ...

# reward is the revenue from this session (continuous value)
qbrix.agent.feedback(request_id=result.request_id, reward=49.90)

Content Personalization
Personalize headlines, CTAs, hero images, or page layouts without manual segmentation rules. The bandit learns which variant works best — globally or per-context if you provide user features.
Each arm is a content variant. Rewards are engagement signals: clicks, scroll depth, time on page, or conversions. Adding a context vector with user features (device type, referral source, past behavior) enables per-segment personalization without writing targeting rules by hand.
Recommended policies: Epsilon-Greedy for simple exploration, LinUCB or LinTS for contextual personalization.
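For intuition about how LinUCB uses the context vector, here is a sketch of the textbook disjoint LinUCB algorithm — plain Python/NumPy, not the SDK's implementation; the feature values and alpha are illustrative. Each arm fits a ridge regression from context features to reward and adds an upper-confidence bonus that is large for contexts the arm has rarely seen.

```python
import numpy as np

class LinUCBArm:
    """One arm's ridge regression plus an upper-confidence exploration bonus."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)    # regularized X^T X
        self.b = np.zeros(dim)  # X^T rewards
        self.alpha = alpha

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge estimate of the reward weights
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# context vector: e.g. [tech_score, price_sensitivity, engagement]
x = np.array([0.8, 0.2, 0.5])
arms = {name: LinUCBArm(3) for name in ("speed", "scale", "simplicity")}

chosen = max(arms, key=lambda name: arms[name].ucb(x))
# ... render that headline, observe whether the visitor clicked ...
arms[chosen].update(x, reward=1.0)
```

With fresh arms the estimate theta is zero, so the score is pure exploration bonus; as rewards arrive, the prediction term takes over and selection becomes context-dependent.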
import qbrix

# three headline variants for the landing page
pool = qbrix.pool.create(
    name="landing-headline",
    arms=[
        {"name": "speed", "metadata": {"headline": "Decisions at the speed of light"}},
        {"name": "scale", "metadata": {"headline": "Optimize millions of decisions per second"}},
        {"name": "simplicity", "metadata": {"headline": "Drop-in bandits for your stack"}},
    ],
)

# linUCB for contextual personalization — learns per user segment
exp = qbrix.experiment.create(
    name="headline-personalization",
    pool_id=pool.id,
    policy="LinUCBPolicy",
)

# provide user context so the bandit can learn segment preferences
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={
        "id": "visitor-abc",
        "vector": [0.8, 0.2, 0.5],  # e.g. [tech_score, price_sensitivity, engagement]
        "metadata": {"source": "google", "device": "desktop"},
    },
)
headline = result.arm.metadata["headline"]

# ... render the selected headline ...

# reward = 1.0 if the user clicked the CTA
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)

Feature Rollouts
Gradually roll out features with automatic optimization. Instead of a binary feature flag (on/off), use a bandit to find the best variant while controlling exposure. If a variant degrades metrics, the bandit naturally reduces its traffic share.
Each arm is a feature variant, including "control" as one arm. Rewards are the success metric you care about — load time, error rate, conversion, engagement. Combine with feature gates for targeting rules that restrict who enters the experiment.
Recommended policies: UCB1-Tuned for stable exploration-exploitation balance, MOSS when you have a fixed traffic budget.
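To see why UCB1-Tuned gives a stable exploration-exploitation balance, here is a self-contained simulation of its index — plain Python, not SDK code, with invented completion rates. The bonus term shrinks for arms whose empirical variance is low, so a clearly underperforming variant loses traffic quickly while never being cut off entirely.

```python
import math
import random

def ucb1_tuned(mean, mean_sq, n, t):
    # variance-aware exploration bonus: min(1/4, empirical variance + slack)
    v = mean_sq - mean ** 2 + math.sqrt(2 * math.log(t) / n)
    return mean + math.sqrt((math.log(t) / n) * min(0.25, v))

def select(stats, t):
    for arm, s in stats.items():
        if s["n"] == 0:
            return arm  # play every arm once before trusting the index
    return max(stats, key=lambda a: ucb1_tuned(
        stats[a]["sum"] / stats[a]["n"],
        stats[a]["sq_sum"] / stats[a]["n"],
        stats[a]["n"],
        t,
    ))

random.seed(2)
true_rate = {"control": 0.05, "streamlined": 0.20, "one-click": 0.12}  # invented
stats = {a: {"n": 0, "sum": 0.0, "sq_sum": 0.0} for a in true_rate}

for t in range(1, 4001):
    arm = select(stats, t)
    r = 1.0 if random.random() < true_rate[arm] else 0.0
    s = stats[arm]
    s["n"] += 1
    s["sum"] += r
    s["sq_sum"] += r * r
# most pulls go to "streamlined"; the laggards keep a small exploratory share
```

This is the same dynamic you get from UCB1TunedPolicy in the rollout example below, where a degrading variant's traffic share decays automatically.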
import qbrix

# roll out a new checkout flow — control vs two new variants
pool = qbrix.pool.create(
    name="checkout-flow",
    arms=[
        {"name": "control", "metadata": {"version": "v1"}},
        {"name": "streamlined", "metadata": {"version": "v2-streamlined"}},
        {"name": "one-click", "metadata": {"version": "v2-oneclick"}},
    ],
)

# UCB1-Tuned for stable rollout with tight confidence bounds
exp = qbrix.experiment.create(
    name="checkout-rollout",
    pool_id=pool.id,
    policy="UCB1TunedPolicy",
)

# optionally gate the experiment to 20% of traffic first
qbrix.gate.create(
    name="checkout-rollout-gate",
    experiment_id=exp.id,
    rollout_percentage=20,
)

# select which checkout variant to show
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={"id": "user-456"},
)
checkout_version = result.arm.metadata["version"]

# ... render the selected checkout flow ...

# reward = 1.0 if the user completed the purchase
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)