qbrix

Use Cases

Multi-armed bandits shine in scenarios where you need to make repeated decisions under uncertainty. Unlike A/B tests that split traffic evenly and wait for statistical significance, bandits continuously shift traffic toward better-performing variants while still exploring alternatives.
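To see the mechanism concretely, here is a minimal plain-Python sketch (not the SDK) of Beta Thompson Sampling on two variants with illustrative click rates of 5% and 8%. Each arm keeps a Beta posterior over its click rate; on every round we sample a plausible rate from each posterior and play the arm with the highest sample:

```python
import random

# toy simulation: two variants with (unknown to the learner) click rates
true_rates = {"A": 0.05, "B": 0.08}
# Beta(1, 1) prior per arm: alpha counts clicks, beta counts non-clicks
alpha = {arm: 1 for arm in true_rates}
beta = {arm: 1 for arm in true_rates}
pulls = {arm: 0 for arm in true_rates}

random.seed(0)
for _ in range(10_000):
    # sample a plausible click rate from each posterior; play the argmax
    arm = max(true_rates, key=lambda a: random.betavariate(alpha[a], beta[a]))
    pulls[arm] += 1
    reward = 1 if random.random() < true_rates[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

# traffic concentrates on the better variant instead of a fixed 50/50 split
```

After a few hundred rounds the posterior for the weaker arm rarely produces the highest sample, so its traffic share shrinks automatically rather than staying pinned at an even split.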

Below are the most common use cases, each with a working Python SDK example you can adapt.


Search &amp; Recommendations

Optimize which items, content, or search results to show each user. Instead of manually curating rankings or waiting weeks for an A/B test to converge, a bandit continuously learns which variants drive higher engagement.

Each arm represents a different ranking algorithm, content variant, or recommendation strategy. Rewards are binary signals like clicks, add-to-carts, or purchases. The bandit shifts traffic toward the variant with the highest reward rate while continuing to explore, so it can detect changes in user behavior over time.

Recommended policies: Beta Thompson Sampling for click/no-click rewards, LinUCB when you have user features for personalization.

import qbrix
 
# create a pool with three ranking strategies as arms
pool = qbrix.pool.create(
    name="search-ranking",
    arms=[
        {"name": "bm25-default", "metadata": {"algorithm": "bm25"}},
        {"name": "semantic-v2", "metadata": {"algorithm": "semantic"}},
        {"name": "hybrid-rerank", "metadata": {"algorithm": "hybrid"}},
    ],
)
 
# run a Beta Thompson Sampling experiment — ideal for click/no-click
exp = qbrix.experiment.create(
    name="search-ranking-optimization",
    pool_id=pool.id,
    policy="BetaTSPolicy",
)
 
# on each search request, select the best ranking strategy
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={"id": "user-42", "metadata": {"query_type": "product"}},
)
 
ranking_algorithm = result.arm.metadata["algorithm"]
# ... apply the selected algorithm to rank search results ...
 
# reward = 1.0 if the user clicked a result, 0.0 otherwise
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)

Dynamic Pricing

Find optimal price points per customer segment without running long, fixed-split experiments. Bandits adapt in real time — if a price point underperforms, traffic shifts away automatically.

Each arm is a price tier or discount level. Rewards can be revenue per session, conversion rate, or any continuous metric. Use contextual bandits with customer features (device, geography, tier) to learn per-segment pricing strategies rather than a single global optimum.

Recommended policies: Gaussian Thompson Sampling for continuous revenue rewards, LinTS when conditioning on customer features.

import qbrix
 
# each arm is a discount tier
pool = qbrix.pool.create(
    name="checkout-discount",
    arms=[
        {"name": "no-discount", "metadata": {"discount_pct": 0}},
        {"name": "5-percent", "metadata": {"discount_pct": 5}},
        {"name": "10-percent", "metadata": {"discount_pct": 10}},
        {"name": "15-percent", "metadata": {"discount_pct": 15}},
    ],
)
 
# Gaussian TS for continuous revenue rewards
exp = qbrix.experiment.create(
    name="pricing-optimization",
    pool_id=pool.id,
    policy="GaussianTSPolicy",
)
 
# select the best discount for this customer segment
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={
        "id": "user-7891",
        "metadata": {"country": "DE", "device": "mobile", "tier": "premium"},
    },
)
 
discount = result.arm.metadata["discount_pct"]
# ... apply discount to checkout ...
 
# reward is the revenue from this session (continuous value)
qbrix.agent.feedback(request_id=result.request_id, reward=49.90)
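The update behind Gaussian Thompson Sampling can be sketched outside the SDK. This toy simulation (arm names and revenue figures are illustrative, with unit observation noise assumed) keeps a running mean revenue per arm and samples from each arm's Gaussian posterior before picking:

```python
import random

# toy simulation: three tiers whose true mean revenue is unknown to the learner
true_means = {"no-discount": 10.0, "5-percent": 13.0, "10-percent": 11.0}
n = {arm: 0 for arm in true_means}        # sessions per arm
mean = {arm: 0.0 for arm in true_means}   # running mean revenue per arm

random.seed(1)
# initialize with one observed session per arm
for arm in true_means:
    n[arm] = 1
    mean[arm] = random.gauss(true_means[arm], 1.0)

for _ in range(5_000):
    # sample from each arm's Gaussian posterior (std shrinks as 1/sqrt(n))
    # and play the arm with the highest sampled mean
    arm = max(true_means,
              key=lambda a: random.gauss(mean[a], 1.0 / n[a] ** 0.5))
    reward = random.gauss(true_means[arm], 1.0)  # noisy session revenue
    n[arm] += 1
    mean[arm] += (reward - mean[arm]) / n[arm]

# most sessions end up on the tier with the highest mean revenue
```

Because an under-played arm keeps a wide posterior, it still gets occasional traffic even after falling behind, which is what lets the bandit recover if revenue patterns shift.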

Content Personalization

Personalize headlines, CTAs, hero images, or page layouts without manual segmentation rules. The bandit learns which variant works best — globally or per-context if you provide user features.

Each arm is a content variant. Rewards are engagement signals: clicks, scroll depth, time on page, or conversions. Adding a context vector with user features (device type, referral source, past behavior) enables per-segment personalization without writing targeting rules by hand.

Recommended policies: Epsilon-Greedy for simple exploration, LinUCB or LinTS for contextual personalization.

import qbrix
 
# three headline variants for the landing page
pool = qbrix.pool.create(
    name="landing-headline",
    arms=[
        {"name": "speed", "metadata": {"headline": "Decisions at the speed of light"}},
        {"name": "scale", "metadata": {"headline": "Optimize millions of decisions per second"}},
        {"name": "simplicity", "metadata": {"headline": "Drop-in bandits for your stack"}},
    ],
)
 
# LinUCB for contextual personalization — learns per user segment
exp = qbrix.experiment.create(
    name="headline-personalization",
    pool_id=pool.id,
    policy="LinUCBPolicy",
)
 
# provide user context so the bandit can learn segment preferences
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={
        "id": "visitor-abc",
        "vector": [0.8, 0.2, 0.5],  # e.g. [tech_score, price_sensitivity, engagement]
        "metadata": {"source": "google", "device": "desktop"},
    },
)
 
headline = result.arm.metadata["headline"]
# ... render the selected headline ...
 
# reward = 1.0 if the user clicked the CTA, 0.0 otherwise
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)
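How the context vector conditions arm selection can be sketched with the standard disjoint LinUCB algorithm (a toy implementation, not the SDK's internals): each arm keeps a design matrix A of context outer products and a reward-weighted vector b, and is scored by its ridge estimate plus an exploration bonus.

```python
import numpy as np

d, alpha = 3, 1.0  # context dimension, exploration strength
arms = ["speed", "scale", "simplicity"]
A = {a: np.eye(d) for a in arms}       # I + sum of x x^T per arm
b = {a: np.zeros(d) for a in arms}     # sum of reward * x per arm

def select(x):
    # score = theta . x + alpha * sqrt(x^T A^-1 x)
    def score(a):
        inv = np.linalg.inv(A[a])
        theta = inv @ b[a]
        return theta @ x + alpha * np.sqrt(x @ inv @ x)
    return max(arms, key=score)

def update(a, x, reward):
    A[a] += np.outer(x, x)
    b[a] += reward * x

# same context vector as the SDK example above
x = np.array([0.8, 0.2, 0.5])
chosen = select(x)
update(chosen, x, 1.0)  # user clicked the CTA under this variant
```

The bonus term shrinks as an arm accumulates observations for similar contexts, so exploration is spent where the model is still uncertain — which is what makes per-segment learning possible without hand-written targeting rules.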

Feature Rollouts

Gradually roll out features with automatic optimization. Instead of a binary feature flag (on/off), use a bandit to find the best variant while controlling exposure. If a variant degrades metrics, the bandit naturally reduces its traffic share.

Each arm is a feature variant, including "control" as one arm. Rewards are the success metric you care about — load time, error rate, conversion, engagement. Combine with feature gates for targeting rules that restrict who enters the experiment.

Recommended policies: UCB1-Tuned for stable exploration-exploitation balance, MOSS when you have a fixed traffic budget.

import qbrix
 
# roll out a new checkout flow — control vs two new variants
pool = qbrix.pool.create(
    name="checkout-flow",
    arms=[
        {"name": "control", "metadata": {"version": "v1"}},
        {"name": "streamlined", "metadata": {"version": "v2-streamlined"}},
        {"name": "one-click", "metadata": {"version": "v2-oneclick"}},
    ],
)
 
# UCB1-Tuned for stable rollout with tight confidence bounds
exp = qbrix.experiment.create(
    name="checkout-rollout",
    pool_id=pool.id,
    policy="UCB1TunedPolicy",
)
 
# optionally gate the experiment to 20% of traffic first
qbrix.gate.create(
    name="checkout-rollout-gate",
    experiment_id=exp.id,
    rollout_percentage=20,
)
 
# select which checkout variant to show
result = qbrix.agent.select(
    experiment_id=exp.id,
    context={"id": "user-456"},
)
 
checkout_version = result.arm.metadata["version"]
# ... render the selected checkout flow ...
 
# reward = 1.0 if the user completed the purchase, 0.0 otherwise
qbrix.agent.feedback(request_id=result.request_id, reward=1.0)
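What makes UCB1-Tuned "stable" is visible in its index formula. This sketch (plain Python, not the SDK; the per-arm statistics are illustrative) computes the standard UCB1-Tuned score, in which the exploration bonus is scaled by min(1/4, V_i), an upper bound on the arm's reward variance:

```python
import math

def ucb1_tuned_index(mean, sq_mean, n_i, n_total):
    # V_i: empirical variance plus a confidence inflation term
    v = sq_mean - mean ** 2 + math.sqrt(2 * math.log(n_total) / n_i)
    # bonus capped at the 1/4 maximum variance of a [0, 1] reward
    return mean + math.sqrt(math.log(n_total) / n_i * min(0.25, v))

# per variant: (mean reward, mean squared reward, pulls)
# for 0/1 rewards the mean squared reward equals the mean
stats = {
    "control": (0.50, 0.50, 40),
    "streamlined": (0.60, 0.60, 40),
    "one-click": (0.55, 0.55, 20),
}
n_total = sum(n for _, _, n in stats.values())
best = max(stats, key=lambda a: ucb1_tuned_index(*stats[a], n_total))
# here the under-explored arm wins on its wider confidence bound
```

Tying the bonus to observed variance means a variant with consistently poor rewards sheds traffic quickly, while an under-sampled variant keeps a wide enough bound to earn another look — the behavior you want during a gated rollout.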