Experiment sample planning

Use the Sample size card on draft experiments to estimate how many starters you need per variant.

Overview

Before you launch an A/B test, the Sample size card on a draft experiment helps you estimate how many people need to start the flow on each variant, including the control, so that comparisons against control are likely to be informative.

The headline number is a planning estimate, not a promise that results will be decisive on your stop date.

[Figure: Draft experiment Sample size card with headline number and four input fields]

What the headline number means

The large number is how many distinct starters each variant should receive when traffic is split evenly.

Starters are people who begin the flow — not impressions or app opens unless those coincide with a start event.
With uneven traffic weights, calendar time to reach the target differs per arm even though the per-arm count goal is the same.
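
To illustrate why calendar time differs per arm, here is a minimal sketch that estimates how many days each arm needs to reach the same per-arm target. The function name, the daily starter volume, and the weights are hypothetical values for illustration, not Rheo settings or APIs.

```python
import math

def days_to_target(per_arm_target, daily_starters, weights):
    """Estimate whole days each arm needs to reach the same per-arm target.

    weights maps arm name -> traffic weight (any positive numbers).
    All names and numbers here are illustrative assumptions.
    """
    total_weight = sum(weights.values())
    days = {}
    for arm, weight in weights.items():
        # Starters this arm receives per day under its share of traffic.
        arm_daily = daily_starters * weight / total_weight
        days[arm] = math.ceil(per_arm_target / arm_daily)
    return days

estimate = days_to_target(
    per_arm_target=1000,
    daily_starters=200,
    weights={"control": 50, "treatment_a": 25, "treatment_b": 25},
)
for arm, d in estimate.items():
    print(f"{arm}: about {d} days")
```

In this example the run length is set by the slowest arm, because every arm has to reach the same count.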

The four inputs

Control completion rate (baseline)

Question: What share of starters usually complete the flow on the control version today?

Shown as a percent (12% means about 12 completions per 100 starters on control).
When your control variant has a pinned published version, Rheo tries to fill this from the last 30 days of analytics for that pin in the experiment's environment (test or live).
If there is no history yet, enter a reasonable planning guess. You can override a prefilled value anytime.

Minimum lift to detect (MDE)

Question: What is the smallest improvement over control you want this plan to be able to spot?

Entered in percentage points, not relative percent. Example: baseline 10% and MDE 3 means you are planning around 10% vs 13% completion — not 10% vs 10.3%.
Smaller MDE → more starters required, because subtle differences are harder to separate from noise.
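
The percentage-point convention can be checked with a few lines of arithmetic. This is only an illustration of the example above; the function names are made up for this sketch.

```python
def planned_treatment_rate(baseline_pct, mde_points):
    """Treatment completion rate (in %) the plan is built around."""
    # MDE is added to the baseline in percentage points.
    return baseline_pct + mde_points

def equivalent_relative_lift(baseline_pct, mde_points):
    """The same MDE expressed as a relative lift over baseline."""
    return mde_points / baseline_pct

treatment = planned_treatment_rate(10.0, 3.0)
relative = equivalent_relative_lift(10.0, 3.0)
print(f"Planning around 10% vs {treatment:.0f}% (a {relative:.0%} relative lift)")
```

So an MDE of 3 on a 10% baseline is a large relative change, which is worth keeping in mind when choosing the value.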

False-positive rate (family α)

Question: How much false-alarm risk is acceptable for the experiment as a whole?

0.05 (5%) is a common default for the entire family of comparisons when several treatments each face the same control.
Rheo spreads that risk across treatment-vs-control comparisons. More treatment arms → stricter per-comparison bar → larger planned sample, all else equal.
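
One common way to spread family risk is an equal (Bonferroni-style) split. The sketch below assumes that rule purely for illustration; this page does not state the exact rule Rheo applies, and the function name is hypothetical.

```python
def per_comparison_alpha(family_alpha, treatment_arms):
    """Equal split of family alpha across treatment-vs-control comparisons.

    Assumption: a Bonferroni-style split. The exact rule Rheo uses
    is not specified here.
    """
    if treatment_arms < 1:
        raise ValueError("need at least one treatment arm")
    return family_alpha / treatment_arms

for arms in (1, 2, 4):
    print(f"{arms} treatment arm(s): per-comparison alpha = "
          f"{per_comparison_alpha(0.05, arms):.4f}")
```

Under this assumption, each added treatment arm lowers the per-comparison threshold, which is why the planned sample grows.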

Detection strength (power)

Question: If a variant is truly better than control by at least the MDE, how often should this plan detect it?

80% is a common default.
Higher power (for example 90%) → more people needed, because you are asking the plan to catch real lifts more reliably.
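
To show how the four inputs combine, here is a sketch using the textbook normal-approximation formula for comparing two proportions with a two-sided test. It is an assumption that this resembles what Rheo computes; the card's headline number may differ, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def starters_per_variant(baseline_pct, mde_points, alpha=0.05, power=0.80):
    """Approximate starters needed per variant for one treatment vs control.

    Assumptions: two-sided test, equal allocation, normal approximation.
    Not necessarily the formula behind the Sample size card.
    """
    p1 = baseline_pct / 100.0                  # control completion rate
    p2 = (baseline_pct + mde_points) / 100.0   # rate at baseline + MDE
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print("power 80%:", starters_per_variant(10, 3, power=0.80))
print("power 90%:", starters_per_variant(10, 3, power=0.90))
print("MDE 1 pt :", starters_per_variant(10, 1, power=0.80))
```

Running it shows the two trends described above: raising power or shrinking the MDE both increase the per-variant count.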

Saving your changes

Values save when you leave a field (click or tab away). Invalid numbers show a short error and are not saved.

Multiple variants

With more than one treatment vs the same control, Rheo plans each treatment-vs-control pair using standard two-arm math, then applies a conservative split of your family α across those comparisons so overall false-positive risk stays near what you set.

That approach is safer than ignoring multiple comparisons but is not the same as specialized multi-arm procedures some statisticians prefer. The chance of detecting at least one true winner can differ from the per-comparison power in the inputs.
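
As a rough illustration of why the chance of detecting at least one true winner can differ from per-comparison power, the sketch below computes two reference points: one where all comparisons succeed or fail together, and one where they are fully independent. Neither is Rheo's calculation. Because every treatment shares the same control, the real figure typically falls somewhere between the two. The function name is hypothetical.

```python
def any_winner_reference_points(power, true_winners):
    """Return (together, independent) chances of detecting >= 1 true winner.

    together:    comparisons perfectly correlated, so the chance equals
                 the per-comparison power.
    independent: comparisons treated as independent (an idealization,
                 since all treatments share one control).
    """
    if true_winners < 1:
        raise ValueError("need at least one truly better treatment")
    together = power
    independent = 1 - (1 - power) ** true_winners
    return together, independent

low, high = any_winner_reference_points(power=0.80, true_winners=2)
print(f"Chance of detecting at least one winner: between {low:.0%} and {high:.0%}")
```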

How to use the plan in practice

Set baseline, MDE, α, and power to match how consequential a wrong call would be.
Note the per-variant starter target before you launch.
During the run, compare actual starters on the experiment dashboard to the plan.
When the experiment reaches pending decision, if starter counts are below the plan, extend the run rather than forcing a winner.
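
The comparison of actual starters against the plan can be sketched as a simple shortfall check. The variant names, the counts, and the function are hypothetical; in practice you would read the numbers from the experiment dashboard.

```python
def shortfall_by_variant(per_variant_target, actual_starters):
    """Return {variant: starters still needed} for variants below target."""
    return {
        variant: per_variant_target - count
        for variant, count in actual_starters.items()
        if count < per_variant_target
    }

shortfalls = shortfall_by_variant(
    per_variant_target=1772,
    actual_starters={"control": 1800, "treatment_a": 1500},
)
if shortfalls:
    print("Consider extending the run; still short:", shortfalls)
else:
    print("All variants reached the planned starter count.")
```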

See Experiments for the full lifecycle.
