No A/B test is 100% reliable, but some are more reliable than others. A good A/B testing tool will tell you how *Statistically Significant* your results are, and how long you will have to run a test. But, to get the most out of your A/B tests, here are some quick points.

*NB. This is a brief version of what you need to know. For more details, you should take a look at this free Practical Guide to A/B testing Statistics.*

## How long should an A/B test last for reliable results?

There are two important factors for increasing the reliability of your tests:

- Statistical Significance
- Representativeness

Results are said to be "**Statistically Significant**" when they are very unlikely to have occurred due to chance and random variation. In other words, you are unlikely to have produced the different conversion rates for page A and page B unless something concrete has changed.

As we've mentioned, no test is 100% reliable. Even for Statistically Significant results, there is always a chance of producing a False Positive or a False Negative. However, making sure you achieve Statistical Significance will reduce the chance of producing a statistical error.

**"Representativeness"** varies according to the methods you use to test and the quality of your sample. If your traffic is seasonal, or changes a lot depending on the source, that will affect the Representativeness of your results.

### How Do You Achieve Statistical Significance?

The Statistical Significance of an A/B test depends on three things:

- Your **Sample Size**
- What **Confidence Level** you decide on
- The size of your **Uplift**

*Your Confidence Level sets how much evidence you require before you call a result significant. If you set your Confidence Level at 95%, then a test with no genuine difference between page A and page B will produce a falsely "significant" result only about 1 time in 20.*
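One way to build an intuition for the Confidence Level is to simulate "A/A tests", where both pages convert at exactly the same rate, and count how often a result still looks significant. The sketch below is only an illustration: the 10% conversion rate, the 500 visitors per page, the number of trials, and the z-test used to judge significance are all assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(visitors, conv_a, conv_b, confidence=0.95):
    """Two-sided z-test for two pages with the same number of visitors."""
    pooled = (conv_a + conv_b) / (2 * visitors)
    if pooled in (0, 1):
        return False  # no variation at all, so nothing to detect
    std_err = sqrt(pooled * (1 - pooled) * 2 / visitors)
    z = (conv_b - conv_a) / visitors / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < 1 - confidence


def false_positive_rate(trials=1000, visitors=500, true_rate=0.10, seed=1):
    """Share of A/A tests (no real difference) that look significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both pages share the same true conversion rate (assumed value).
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        false_positives += is_significant(visitors, conv_a, conv_b)
    return false_positives / trials


print(f"False positive rate: {false_positive_rate():.1%}")
```

At a 95% Confidence Level, the printed rate should land somewhere in the region of 5%, which is the "1 time in 20" risk you accept.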

Your A/B testing tool should tell you how reliable your results are. Alternatively, entering your test data (including page visits and conversion rates) into a significance calculator will show you if the results are Statistically Significant. You can find some great free calculators online (try, for example, https://abtestguide.com/calc/ or https://www.convertize.com/ab-test-significance/).
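If you are curious about what a significance calculator does with your numbers, here is a minimal Python sketch using a two-proportion z-test. This is one common approach, not necessarily the method used by the calculators linked above, and the visitor and conversion figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the assumption that A and B convert equally well.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 10,000 visitors per page, 200 vs 250 conversions.
z, p_value = two_proportion_z_test(10_000, 200, 10_000, 250)
confidence_level = 0.95
significant = p_value < 1 - confidence_level
print(f"z = {z:.2f}, p = {p_value:.4f}, significant: {significant}")
```

The result counts as Statistically Significant when the p-value falls below 1 minus your Confidence Level (0.05 for a 95% Confidence Level).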

To achieve Statistical Significance, your sample size must be large enough. For smaller effects, and lower base rates (your original Conversion Rate), the required sample is larger. So, for a website with a base conversion rate of 2%, these are the sample sizes you would require to achieve Statistical Significance after 30 days...
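To get a rough idea of the numbers involved, you can estimate the required sample with the standard normal-approximation formula for comparing two conversion rates. The sketch below is not a reproduction of the article's own figures: it assumes a two-sided test, a 95% Confidence Level and 80% statistical power (the power value is an assumption that the text does not specify), and the uplift values are examples.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(base_rate, relative_uplift, confidence=0.95, power=0.80):
    """Approximate visitors needed per page for a two-sided test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Base conversion rate of 2%, with a range of example uplifts.
for uplift in (0.05, 0.10, 0.20, 0.50):
    n = sample_size_per_variant(0.02, uplift)
    print(f"uplift {uplift:.0%}: about {n:,} visitors per page")
```

Running it shows the pattern described above: halving the uplift you want to detect roughly quadruples the visitors you need, and a lower base rate pushes the requirement up further.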

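Most people underestimate the number of visitors required to provide a reliable sample. For Multivariate Testing, the required sample size increases exponentially, because every extra element you test multiplies the number of page combinations, and each combination needs its own sample.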
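The growth for Multivariate Testing is easy to see with a little arithmetic. This sketch assumes a full-factorial test (every combination of every element is shown) and that each combination needs the same sample as one page in a simple A/B test. The figure of 20,000 visitors per combination is hypothetical.

```python
from math import prod


def multivariate_total_sample(versions_per_element, visitors_per_combination):
    """Return (number of combinations, total visitors required)."""
    combinations = prod(versions_per_element)
    return combinations, combinations * visitors_per_combination


# Each tested element (headline, image, button...) has 2 versions.
for elements in range(1, 5):
    combos, total = multivariate_total_sample([2] * elements, 20_000)
    print(f"{elements} element(s): {combos} combinations, {total:,} visitors")
```

Each additional two-version element doubles the total, so four elements already need eight times the traffic of a single A/B test.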

### How Do You Make Sure Your Test Is Representative?

To make sure your tests are Representative, you need to avoid a number of classic A/B testing errors.

**Biasing your traffic**

You need a large enough sample size, but you also need your sample to be genuine. Redirecting visitors to a test page or using paid advertising to bring them there will make your test less representative.

**Not testing for long enough**

Aside from the need to collect a large enough sample, you need to account for weekly variations. Running an A/B test over the weekend will not work, because online behaviour changes a lot between Friday and Monday.

**Testing for too long**

A/B testing tools use cookies to sort visitors into two groups (page A and page B). That way, one person only sees one version of the page, so you know that any later conversion can be attributed to that design. However, studies show that up to 30% of internet users delete their cookies more than once a month, and the average rate of cookie deletion is about 4 times a month. So, running an experiment for months on end is likely to cause problems.
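To see why deleted cookies matter, it helps to picture how a visitor might be sorted into a group. The sketch below is an assumption about how such a mechanism could work, not a description of any particular tool: it hashes a visitor ID (which would normally live in a cookie) to pick page A or B. The function name and the experiment label are hypothetical.

```python
import hashlib
import uuid


def assign_variant(visitor_id, experiment="homepage-test"):
    """Deterministically map a visitor ID to page 'A' or 'B'."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


# The visitor ID would be stored in a cookie on the first visit.
visitor_id = str(uuid.uuid4())
first_visit = assign_variant(visitor_id)
return_visit = assign_variant(visitor_id)  # same cookie, so same page
print(first_visit == return_visit)

# If the cookie is deleted, the visitor receives a new ID and may be
# counted as a new person who sees the other page.
new_visitor_id = str(uuid.uuid4())
after_deletion = assign_variant(new_visitor_id)
print(first_visit, after_deletion)
```

As long as the cookie survives, the same person always lands on the same page. Once it is deleted, that guarantee is gone, and the longer a test runs, the more visitors this affects.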

### How long should an A/B test last for reliable results?

There is no definitive or optimal duration for an A/B test, and no A/B test is 100% reliable. However, you can improve the quality of your results by sticking to these rule-of-thumb principles:

- Always run your experiments until they reach Statistical Significance
- Avoid biasing your traffic
- Do not run a test for less than two weeks or more than four weeks.
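These rules of thumb can be combined into a quick planning check. The sketch below is only an illustration: the function name and the traffic figures are hypothetical, and it assumes you already know the total sample you need. It rounds the duration up to whole weeks, so that every day of the week is covered equally, and flags tests that cannot reach their sample within four weeks.

```python
from math import ceil


def recommended_test_days(required_visitors, daily_visitors, min_weeks=2, max_weeks=4):
    """Return (days to run, whether the sample is reachable in that time)."""
    days_needed = ceil(required_visitors / daily_visitors)
    # Round up to whole weeks and never go below the two-week minimum.
    weeks_needed = max(min_weeks, ceil(days_needed / 7))
    feasible = weeks_needed <= max_weeks
    return min(weeks_needed, max_weeks) * 7, feasible


# Hypothetical site: 2,500 visitors a day, 42,000 visitors required in total.
days, feasible = recommended_test_days(42_000, 2_500)
print(f"Run for {days} days (sample reachable: {feasible})")
```

If the check reports that the sample is not reachable, the options are to test a bolder change (a larger expected uplift needs fewer visitors) or to test on a page with more traffic.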