Quickstart
Get from zero to a working bandit experiment in under 5 minutes.
Prerequisites
- A qbrix account — sign up at cloud.qbrix.io
- curl, the Python SDK, or any HTTP client
Install the Python SDK
pip install qbrix
The SDK requires Python 3.10+. It wraps all qbrix HTTP endpoints into a typed client covering pools, experiments, selection, and feedback.
Create an API Key
- Log in to cloud.qbrix.io
- Go to Settings → API Keys
- Click Create API Key and copy the key
Set your environment variables:
export QBRIX_URL="https://cloud.qbrix.io"
export QBRIX_API_KEY="your-api-key"

To use the Python SDK, create a client:
import qbrix
client = qbrix.Qbrix(
    api_key="your-api-key",
    base_url="https://cloud.qbrix.io",
)

Create a Pool
A pool is a collection of arms (variants). Let's create one with 3 homepage hero images:
pool = client.pool.create(
    name="homepage-heroes",
    arms=[
        {"name": "minimal-dark", "metadata": {"image": "hero-1.png"}},
        {"name": "gradient-bold", "metadata": {"image": "hero-2.png"}},
        {"name": "illustration", "metadata": {"image": "hero-3.png"}},
    ],
)
print(pool.id)

curl -s -X POST "$QBRIX_URL/api/v1/pools" \
  -H "X-API-Key: $QBRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "homepage-heroes",
    "arms": [
      {"name": "minimal-dark", "metadata": {"image": "hero-1.png"}},
      {"name": "gradient-bold", "metadata": {"image": "hero-2.png"}},
      {"name": "illustration", "metadata": {"image": "hero-3.png"}}
    ]
  }' | jq .

With curl, save the pool ID from the response in POOL_ID; the next request uses it.

Create an Experiment
Link the pool to a bandit policy. We'll use Beta Thompson Sampling — a good default for binary rewards (click / no click):
experiment = client.experiment.create(
    name="hero-optimization",
    pool_id=pool.id,
    policy="BetaTSPolicy",
)
print(experiment.id)

curl -s -X POST "$QBRIX_URL/api/v1/experiments" \
  -H "X-API-Key: $QBRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "hero-optimization",
    "pool_id": "'"$POOL_ID"'",
    "policy": "BetaTSPolicy",
    "policy_params": {}
  }' | jq .

With curl, save the experiment ID from the response in EXP_ID; the select requests below use it.

Select an Arm
Ask qbrix to select the best arm for a request:
result = client.agent.select(
    experiment_id=experiment.id,
    context={"id": "user-001"},
)
print(f"Selected: {result.arm.name}")
print(f"Request ID: {result.request_id}")

curl -s -X POST "$QBRIX_URL/api/v1/agent/select" \
  -H "X-API-Key: $QBRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "experiment_id": "'"$EXP_ID"'"
  }' | jq .

The response includes the selected arm and a request_id to link feedback:
{
  "request_id": "req-abc123",
  "arm": {"id": "...", "name": "gradient-bold", "index": 1},
  "is_default": false
}

Send Feedback
After observing the outcome (e.g., user clicked), send a reward signal:
client.agent.feedback(
    request_id=result.request_id,
    reward=1.0,
)

curl -s -X POST "$QBRIX_URL/api/v1/agent/feedback" \
  -H "X-API-Key: $QBRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-abc123",
    "reward": 1.0
  }'

Feedback is processed asynchronously. Policy parameters update within seconds, depending on batch settings.
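The prerequisites mention that any HTTP client works. As a rough sketch of what that could look like with only the Python standard library, the snippet below builds the same select and feedback requests as the curl commands above. The helper names build_request and send are hypothetical, the experiment ID and API key are placeholders, and the shape of the response bodies is not assumed beyond what this page shows, so send returns raw bytes.

```python
import json
import urllib.request

BASE_URL = "https://cloud.qbrix.io"  # same value as QBRIX_URL above


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a qbrix endpoint.

    Hypothetical helper: it mirrors the headers and body of the curl examples.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# Placeholder values; substitute your own experiment ID, request ID, and key.
select_req = build_request(
    "/api/v1/agent/select",
    {"experiment_id": "your-experiment-id"},
    "your-api-key",
)
feedback_req = build_request(
    "/api/v1/agent/feedback",
    {"request_id": "req-abc123", "reward": 1.0},
    "your-api-key",
)

# Nothing is sent until send(select_req) is called.
print(select_req.get_method(), select_req.full_url)
print(feedback_req.get_method(), feedback_req.full_url)
```

Separating request construction from sending keeps the payloads easy to inspect or log before any network call is made.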
Verify the Loop
Run a few more select → feedback cycles. You'll see the policy begin to favor higher-reward arms:
import random

for i in range(20):
    # select
    result = client.agent.select(
        experiment_id=experiment.id,
        context={"id": f"user-{i}"},
    )
    # simulate reward: arm index 1 has 70% click rate, others 30%
    if result.arm.index == 1:
        reward = 1.0 if random.random() < 0.7 else 0.0
    else:
        reward = 1.0 if random.random() < 0.3 else 0.0
    # feedback
    client.agent.feedback(
        request_id=result.request_id,
        reward=reward,
    )
    print(f"round {i+1}: arm={result.arm.name} reward={reward}")

for i in $(seq 1 20); do
  # select
  RESP=$(curl -s -X POST "$QBRIX_URL/api/v1/agent/select" \
    -H "X-API-Key: $QBRIX_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"experiment_id": "'"$EXP_ID"'"}')
  ARM=$(echo "$RESP" | jq -r .arm.index)
  REQ_ID=$(echo "$RESP" | jq -r .request_id)
  # simulate reward: arm 1 has 70% click rate, others 30%
  # start from 1.0 for arm 1 and 0.0 otherwise, then flip the value 30% of the time
  REWARD=$( [ "$ARM" -eq 1 ] && echo "1.0" || echo "0.0" )
  [ $(( RANDOM % 10 )) -lt 3 ] && REWARD=$( [ "$REWARD" = "1.0" ] && echo "0.0" || echo "1.0" )
  # feedback
  curl -s -X POST "$QBRIX_URL/api/v1/agent/feedback" \
    -H "X-API-Key: $QBRIX_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"request_id": "'"$REQ_ID"'", "reward": '"$REWARD"'}'
  echo "round $i: arm=$ARM reward=$REWARD"
done

View Results
Open your experiment in the qbrix console to see selection distributions and reward metrics.
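To build intuition for the selection distribution you might see, here is a minimal offline sketch of textbook Beta Thompson Sampling on the same simulated click rates used in the loop above. It does not call qbrix. How BetaTSPolicy is implemented internally is not described on this page, so the uniform Beta(1, 1) prior, the update rule, and the function name run_beta_thompson are assumptions for illustration only.

```python
import random

# Simulated click rates matching the loop above: arm index 1 at 70%, others at 30%.
TRUE_RATES = [0.3, 0.7, 0.3]


def run_beta_thompson(true_rates, rounds, rng):
    """Simulate Beta Thompson Sampling on Bernoulli arms; return per-arm pull counts."""
    # One Beta(alpha, beta) posterior per arm, starting from a uniform Beta(1, 1) prior.
    alpha = [1.0] * len(true_rates)
    beta = [1.0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Select: draw one sample per arm and play the arm with the largest draw.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = samples.index(max(samples))
        # Observe a simulated binary reward (click or no click).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        # Feedback: a click increments alpha, a non-click increments beta.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls


pulls = run_beta_thompson(TRUE_RATES, rounds=2000, rng=random.Random(0))
print(pulls)  # pull counts per arm index
```

With enough rounds, most pulls should concentrate on arm index 1, which is the pattern to look for in the console. Twenty rounds, as in the quickstart loop, is usually too few for the preference to be pronounced.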
What's Next
- Pools & Experiments — understand the data model
- Feedback & Rewards — how the learning loop works
- API Reference — explore all HTTP endpoints
- Policies — choose the right algorithm for your use case