Advanced·Edge Development

Sample Size: Why 30 Trades Aren't Enough Evidence (And What Is)

Most trading conclusions are drawn from sample sizes too small to be meaningful. Understanding what sample size actually buys you protects against false confidence in both directions.

7 min read · Updated 2025-07-15

A trader takes 10 trades; 6 win. Conclusion: 60% win rate strategy. They size up.

But 10 trades is barely above noise. With a true 50% strategy, getting 6 or more winners in a sample of 10 happens about 38% of the time by pure chance. The "60% win rate" was an illusion. The trader who sizes up based on that conclusion is making decisions on noise.
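The 38% figure falls straight out of the binomial distribution. A quick check using only the standard library:

```python
from math import comb

# P(X >= 6) for X ~ Binomial(n=10, p=0.5): the chance that a true 50%
# strategy shows 6 or more winners in a 10-trade sample by luck alone.
n, p = 10, 0.5
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6, n + 1))
print(f"{prob:.1%}")  # 37.7%
```

Nearly 4 in 10 flat-EV strategies will look like "60% win rate strategies" over any given 10-trade window.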

Understanding sample size (what it actually buys you and what it doesn't) is one of the most underrated disciplines in trading. Most retail conclusions are drawn from samples too small to be meaningful, in both directions.

Why sample size matters

Outcomes from probabilistic processes have variance. Across many trials, the average converges to the true probability. Across few trials, the observed rate can deviate substantially from the true rate.

The technical version: the standard error of a measured rate shrinks in proportion to 1/√n, where n is the sample size. Doubling n only reduces noise by ~30%; to halve the noise, you need 4x the sample size.
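The 1/√n scaling is easy to verify directly. A minimal sketch for a win rate near 50%:

```python
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """Standard error of a measured rate p over n independent trades."""
    return sqrt(p * (1 - p) / n)

se_100 = standard_error(0.5, 100)
se_200 = standard_error(0.5, 200)  # doubling n
se_400 = standard_error(0.5, 400)  # quadrupling n

print(f"n=100: {se_100:.4f}")  # 0.0500
print(f"n=200: {se_200:.4f}")  # ~0.0354: doubling n cut noise only ~29%
print(f"n=400: {se_400:.4f}")  # 0.0250: quadrupling n halved the noise
```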

Implications:

  • 10-trade samples are dominated by noise
  • 30-trade samples are noisy but starting to inform
  • 100-trade samples give reasonable estimates with meaningful confidence intervals
  • 300+ trade samples give relatively reliable estimates
  • 1000+ trade samples are statistically strong

Most retail conclusions are drawn from samples in the 10-30 range, where noise dominates the signal. The conclusions are therefore mostly wrong.

What sample size buys you, concretely

Here's roughly what confidence intervals look like for measured win rates at different sample sizes:

Sample size     95% CI on the measured rate
10 trades       ±31%
30 trades       ±18%
100 trades      ±10%
300 trades      ±5.7%
1000 trades     ±3.1%
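These half-widths come from the normal (Wald) approximation, z × √(p(1−p)/n) with the rate near 50%, where the interval is widest. A sketch that reproduces the table (note the approximation is rough at very small n):

```python
from math import sqrt

def ci_half_width(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% confidence-interval half-width for a measured rate (Wald approximation)."""
    return z * sqrt(p * (1 - p) / n)

for n in (10, 30, 100, 300, 1000):
    print(f"{n:>5} trades: ±{ci_half_width(n):.1%}")
```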

So a "60% win rate over 30 trades" is actually somewhere between 42% and 78% with 95% confidence. That's a huge range: the strategy could easily have a 50% true rate, with the observed 60% being noise.

To say "I'm confident this is a 55%+ true win rate strategy" requires roughly 100+ trades, and an observed rate comfortably above 55%. To say "I'm confident this is a 60%+ true win rate strategy" with the same statistical strength requires several hundred.

Most retail traders are far below these thresholds when they make conclusions about their strategies.

The implications

Several specific implications for trading:

1. Don't size up on short streaks. A 5-trade winning streak doesn't prove the strategy got better. Random variation produces 5-trade streaks regularly even in flat-EV strategies. Sizing up on streaks captures noise, not signal.

2. Don't abandon strategies on short losing streaks. A 5-trade losing streak doesn't prove the strategy stopped working. Random variation produces these too. Abandoning strategies on noise is one of the main causes of strategy hopping.

3. Backtests need substantial samples. A backtest with 50 trades is barely informative. 100+ is starting to be useful. 500+ across multiple regimes is what gives reasonable confidence in the historical edge.

4. Live performance evaluation needs time. 30 live trades is enough to detect catastrophic strategy failure (e.g., the strategy has clearly stopped working) but not enough to confirm it's working. 100+ trades is where you can have moderate confidence.

5. Sub-categories need their own sample sizes. "My breakout trades" might have 100 trades. "My breakout trades on alts during bull markets" might have 12 trades. The sub-category conclusion is on the smaller sample; treat it accordingly.
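Points 1 and 2 can be checked with a quick Monte Carlo: how often does a flat 50% strategy produce at least one 5-trade winning streak over 100 trades? (An illustrative simulation; the trade count and streak length are arbitrary choices.)

```python
import random

def has_streak(outcomes: list, length: int) -> bool:
    """True if the outcome sequence contains a winning run of at least `length`."""
    run = 0
    for won in outcomes:
        run = run + 1 if won else 0
        if run >= length:
            return True
    return False

random.seed(0)  # fixed seed for reproducibility
trials = 20_000
hits = sum(
    has_streak([random.random() < 0.5 for _ in range(100)], 5)
    for _ in range(trials)
)
print(f"P(5-win streak in 100 trades of a 50% strategy) ~= {hits / trials:.0%}")
```

The answer lands around 80%: a zero-edge strategy produces a 5-trade winning streak in most 100-trade stretches. Streaks in either direction are routine, not evidence.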

A common mistake: cherry-picking from small samples

A trader looks at 30 trades; 18 won. They cherry-pick the 18 winning trades to find what they had in common. "All winners had volume above 2x average!" They rebuild the strategy around that filter.

The problem: with 30 trades, some characteristic will be more present in winners just by chance. The volume filter is curve-fit to noise. Out-of-sample, it doesn't help.

The fix: any "discovered pattern" in a small sample needs out-of-sample testing. The pattern might be real; it might be noise. The only way to tell is to test on data you haven't seen.

A common mistake: comparing strategies with mismatched samples

"Strategy A has 65% win rate; strategy B has 55%. Strategy A is better."

But strategy A has 20 trades; strategy B has 200. The 65% on 20 trades has a confidence interval of ±21% (so the true rate could be 44-86%). The 55% on 200 trades has a CI of ±7% (true rate 48-62%). Given the data, strategy B's estimate is far more reliable, and strategy A's apparent advantage could easily be sampling noise.
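Those two intervals can be computed with the same Wald approximation used above (illustrative numbers from the example):

```python
from math import sqrt

def wald_ci(wins: int, n: int, z: float = 1.96):
    """95% Wald confidence interval for a win rate of wins/n."""
    p = wins / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo_a, hi_a = wald_ci(13, 20)    # strategy A: 65% over 20 trades
lo_b, hi_b = wald_ci(110, 200)  # strategy B: 55% over 200 trades
print(f"A: {lo_a:.0%}-{hi_a:.0%}")  # roughly 44%-86%
print(f"B: {lo_b:.0%}-{hi_b:.0%}")  # roughly 48%-62%
```

A's interval swallows B's entirely; the comparison "65% vs 55%" carries far less information than the point estimates suggest.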

The fix: when comparing strategies, the one with the larger sample has a more reliable estimate. Don't compare estimates without acknowledging the sample sizes behind them.

A common mistake: drawing regime conclusions from small samples

"Last 10 trades in this regime have been losers." OK, but 10 trades in any regime is insufficient. The losing streak might be the regime change everyone's worried about; it might be noise.

The fix: regime-shift conclusions require larger samples. A 30+ trade window of consistent underperformance is more informative than a 10-trade losing streak. Don't make regime calls from short windows.

Bayesian thinking and sample size

The Bayesian frame: you start with a prior (the historical base rate). Each new trade is evidence that updates your belief. Strong priors require strong evidence to move much; weak priors update faster.

In practice:

  • If you have a strong prior (200-trade backtest with +0.3R expectancy), short live underperformance shouldn't shift your belief much
  • If you have a weak prior (just started a new strategy with no history), the first 50 trades have a larger effect on your confidence

This means: the more historical data you have on a strategy, the less you should react to short-term performance changes. New strategies need closer monitoring because you have less prior to anchor against.
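A minimal sketch of this update using a Beta-Binomial model, where the prior is encoded as pseudo-trades. The prior strengths and the live record below are illustrative assumptions, not recommendations:

```python
def update_beta(prior_wins: float, prior_losses: float,
                new_wins: int, new_losses: int) -> float:
    """Posterior mean win rate under a Beta-Binomial model."""
    total = prior_wins + prior_losses + new_wins + new_losses
    return (prior_wins + new_wins) / total

# Strong prior: a 200-trade backtest at 55% (110 wins / 90 losses).
# Weak prior: essentially no history (1 pseudo-win / 1 pseudo-loss).
live_wins, live_losses = 8, 12  # a rough 20-trade live stretch at 40%

strong = update_beta(110, 90, live_wins, live_losses)
weak = update_beta(1, 1, live_wins, live_losses)
print(f"strong prior: {strong:.1%}")  # 53.6%: barely moves off 55%
print(f"weak prior:   {weak:.1%}")    # 40.9%: dragged well below 50%
```

The same 20 losing-heavy trades barely dent a well-validated strategy's estimate but dominate a new strategy's, which is exactly the asymmetry described above.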

A common mistake: confusing noise for signal

A trader's strategy has had 4 winners in a row. They think the strategy has "really clicked." They size up. The next trade, a 50% probability outcome, loses. The trader's larger size produces a larger loss. Now they're roughly back to break-even on the 5 trades, but with an amplified emotional response.

The fix: streaks are noise unless they're long enough to exceed plausible random variation. With a 50% strategy, 4-streaks happen ~6% of the time per 4-trade window. Routine. Don't change behavior because of routine variation.

A common mistake: confusing signal for noise

The mirror of the above: an actual edge exists, but the trader can't see it through the noise of small samples. After 50 trades at +0.1R per trade, they conclude "no edge." But over 50 trades, the noise around a +0.1R/trade average is large enough to make it indistinguishable from 0R/trade, even though the true edge might well be +0.1R.

The fix: small edges require large samples to detect. If your strategy's expected edge is small (~+0.1R), you need hundreds of trades to confirm it. Don't abandon small-edge strategies after small samples.
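The required sample size follows from the same standard-error logic. Assuming per-trade R outcomes with a standard deviation around 1R (an assumption; measure your own distribution), a rough back-of-envelope calculation, not a formal power analysis:

```python
from math import sqrt

def trades_to_detect(edge_r: float, sigma_r: float = 1.0, z: float = 1.96) -> int:
    """Trades needed before the 95% CI on mean R excludes zero,
    assuming the observed mean equals the true edge."""
    return int((z * sigma_r / edge_r) ** 2) + 1

print(trades_to_detect(0.1))  # several hundred trades for a +0.1R edge
print(trades_to_detect(0.3))  # a few dozen for a larger edge

# At 50 trades, the noise on the mean R is ~1/sqrt(50) ~= 0.14R,
# larger than a +0.1R edge itself.
print(f"{1 / sqrt(50):.2f}R")
```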

How to think about your own data

The practical version:

  • Less than 30 trades: Whatever you observe is mostly noise. Don't draw strong conclusions. Use this period for checking that the process is being followed.
  • 30-100 trades: Tentative conclusions possible. Strong direction in the data (e.g., -0.5R per trade) is signal; smaller deviations from expectation are likely noise.
  • 100-300 trades: Reasonable confidence in observed performance. Evaluate strategy at this sample size.
  • 300+ trades: Strong confidence. Performance numbers reliable.

When you don't have enough data, the honest answer is "I don't know yet." Most retail traders prefer premature confidence over honest uncertainty. The premature confidence costs real money.

Mental model: sample size as the resolution of your microscope

A microscope at low resolution shows you a blurry image. You can see general shapes but can't make out details. Conclusions drawn from low-resolution images are approximate at best, often wrong.

A microscope at high resolution shows you sharp details. Conclusions are reliable because you can see what's actually there.

Sample size is your statistical resolution. Small samples show you a blurry version of the truth: you can see direction but not detail. Large samples show you the truth sharply. Drawing detailed conclusions from blurry images is the cognitive error that small samples produce.

Why this matters for trading

Sample size discipline is what protects you from the overconfidence that small samples generate. Hex37's journal accumulates the trade history that becomes your sample over months and years; the discipline of waiting for sufficient sample sizes before drawing strong conclusions is what makes the data actionable rather than misleading.

Takeaway

Sample size determines how much you can trust your observations. 10-30 trade samples are mostly noise; 100+ is where reasonable conclusions emerge; 300+ gives strong confidence. Standard error shrinks as 1/√n; doubling the sample only cuts noise by ~30%. Don't size up on short streaks; don't abandon strategies on short losing streaks. Compare strategies with awareness of sample size differences. New strategies need closer monitoring than well-validated ones. The honest answer to "is my strategy working?" with 25 trades is usually "I don't know yet."
