The pain point of every test: did it reach statistical significance? As data-driven marketers, we are not only supposed to collect and analyze data; most importantly, we must demonstrate the validity of the data we collect.
Everybody involved in conversion rate optimization has probably experienced the following: you start a test, after 5 days you have 1,000 sessions and a few conversions, the variant's conversion rate is 10% higher than the control's, the probability that the variant is better than the control is 40%, and someone asks you to stop the test and call a winner. Ok guys, those requests really piss me off. They are a clear sign that whoever makes them has no idea about one fundamental pillar of every testing methodology: statistical significance.
When you conduct an experiment, statistical significance lets you rule out that the results of your test occurred by chance rather than being attributable to a specific cause. When you analyze a data set, strong statistical significance gives you confidence that the results are not down to chance or luck.
Not significant means that you can't say that there is an improvement.
Not significant means that differences might have happened due to randomness.
Not significant means that you cannot claim there is any real difference between the control and the variant.
Looking at statistical significance means reducing the risk of claiming an improvement where there is none. 95% statistical significance means you have a 5% probability of declaring a false improvement (a false positive). When you run an experiment, I recommend accepting results only if the statistical significance is at least 90%. Please don't settle for a lower threshold, otherwise you risk drawing wrong conclusions from your tests.
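To make the false-positive idea concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the 3% conversion rate and sample sizes are made-up numbers for illustration). It simulates thousands of A/A tests, where control and variant are truly identical, and shows that roughly 5% of them still look "significant" at the 95% level. That 5% is exactly the risk you accept when you set the threshold.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for the difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled conversion rate
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * norm.sf(abs(z))                            # two-sided p-value

# Simulate 10,000 A/A tests: both arms share the same true 3% conversion rate,
# so every "significant" result is by definition a false positive.
n_tests, n_per_arm, true_rate, alpha = 10_000, 5_000, 0.03, 0.05
false_positives = 0
for _ in range(n_tests):
    conv_a = rng.binomial(n_per_arm, true_rate)
    conv_b = rng.binomial(n_per_arm, true_rate)
    if two_proportion_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
        false_positives += 1

print(f"False positive rate: {false_positives / n_tests:.1%}")  # roughly 5%, as expected
```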
Sample size massively influences significance: larger samples are less prone to random fluctuation. This means that the number of visitors to the page on which you want to run the experiment will impact statistical significance. If the page has very low traffic, your test will most likely never reach significance. Before starting a test, check upfront how many visitors the page gets. If traffic is too low, give up on running the test on that page and go for a page with higher traffic instead. Note that the smaller the sample, the longer you have to wait to reach significance.
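To get a feel for how traffic constrains testing, here is a hedged sketch (plain Python with SciPy; the baseline rate and expected lift are invented for illustration) of the standard sample-size approximation for comparing two proportions. It shows how quickly the required number of visitors per variant grows when the expected uplift is small.

```python
from scipy.stats import norm

def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # e.g. 1.96 for 95% significance
    z_beta = norm.ppf(power)            # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Example: 3% baseline conversion rate, hoping to detect a 10% relative lift.
print(visitors_per_variant(0.03, 0.10))   # on the order of 50,000 visitors per variant
```

With those (hypothetical) numbers you would need tens of thousands of visitors per variant, which is why low-traffic pages rarely produce conclusive tests.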
Now that we have talked about what statistical significance is, how do you calculate it? As a marketer, you probably won't need to calculate it yourself (luckily!). Calculating statistical significance is not trivial, and most of us use calculators rather than solving the equations by hand. In the coming weeks I am planning to go deeper into some fundamentals of statistics, which will be the occasion to talk about statistical significance in a purely mathematical way. In the meantime, I highly recommend using this calculator, one of the best I've found and used so far.
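If you do want to check the numbers yourself rather than trusting a calculator blindly, here is a minimal sketch (assuming Python with statsmodels installed; the visitor and conversion counts are hypothetical) that runs a two-proportion z-test and reports the confidence level reached, computed as 1 minus the p-value, which is one common way A/B testing calculators report "significance".

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: [control, variant] visitors and conversions.
visitors = [5000, 5000]
conversions = [150, 180]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
significance = 1 - p_value

print(f"z = {z_stat:.2f}, p-value = {p_value:.3f}")
print(f"Significance reached: {significance:.1%}")
print("Accept the result" if significance >= 0.90 else "Keep the test running")
```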
I hope that after reading this article you will question the data you collect more and become an ambassador of statistical significance!