The world today is drowning in data.
That may sound like hyperbole, but consider this: in 2018, humans around the world produced more than 2.5 quintillion bytes of data—each day. According to some estimates, every minute people conduct almost 4.5 million Google searches, post 511,200 tweets, watch 4.5 million YouTube videos, swipe 1.4 million times on Tinder, and order 8,683 meals from GrubHub. These numbers—and the world’s total data—are expected to continue growing exponentially in the coming years.
For behavioral researchers and businesses, this data represents a valuable opportunity. However, using data to learn about human behavior or make decisions about consumer behavior often requires an understanding of statistics and statistical significance.
Statistical significance is a measure of how likely it is that an observed difference between two groups, models, or statistics occurred by chance rather than because two variables are actually related to each other. This means that a “statistically significant” finding is one that is unlikely to be due to chance and is therefore more likely to be real and reliable.
To evaluate whether a finding is statistically significant, researchers engage in a process known as null hypothesis significance testing. Null hypothesis significance testing is less of a mathematical formula and more of a logical process for thinking about the strength and legitimacy of a finding.
Imagine a Vice President of Marketing asks her team to test a new layout for the company website. The new layout streamlines the user experience by making it easier for people to place orders and suggesting additional items to go along with each customer’s purchase. After testing the new website, the VP finds that visitors to the site spend an average of $12.63. Under the old layout, visitors spent an average of $12.32, meaning the new layout increases average spending by $0.31 per person. The question the VP must answer is whether the difference of $0.31 per person is significant or something that likely occurred by chance.
To answer this question with statistical analysis, the VP begins by adopting a skeptical stance toward her data known as the null hypothesis. The null hypothesis assumes that whatever researchers are studying does not actually exist in the population of interest. So, in this case the VP assumes that the change in website layout does not influence how much people spend on purchases.
With the null hypothesis in mind, the VP asks how likely it is that she would obtain the results observed in her study—the average difference of $0.31 per visitor—if the change in website layout actually causes no difference in people’s spending (i.e., if the null hypothesis is true). If the probability of obtaining the observed results is low, the VP will reject the null hypothesis and conclude that her finding is statistically significant.
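This logic can be sketched in a few lines of Python. The two means come from the example above; the sample sizes, the $4 standard deviation, and the use of a large-sample z-test on the difference in means are illustrative assumptions:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-visitor spending under each layout. The means come from the
# example ($12.32 old, $12.63 new); the sample sizes and the $4 standard
# deviation are illustrative assumptions.
old_layout = rng.normal(loc=12.32, scale=4.0, size=2000)
new_layout = rng.normal(loc=12.63, scale=4.0, size=2000)

# How likely is a difference this large if the null hypothesis (the layout
# makes no difference) were true? With samples this big, a z-test on the
# difference in means is a reasonable approximation to a t-test.
diff = new_layout.mean() - old_layout.mean()
se = math.sqrt(old_layout.var(ddof=1) / 2000 + new_layout.var(ddof=1) / 2000)
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"observed difference: ${diff:.2f}, p = {p_value:.4f}")

# Reject the null hypothesis only if p falls below the significance level.
if p_value < 0.05:
    print("Statistically significant: reject the null hypothesis.")
else:
    print("Not statistically significant: fail to reject the null hypothesis.")
```

In practice the VP would run a test on the real visitor data rather than simulated numbers, but the decision rule is the same: compute the probability of the observed difference under the null hypothesis and compare it to the chosen significance level.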
Statistically significant findings indicate not only that the researchers’ results are unlikely to be the result of chance, but also that there is likely an effect or relationship between the variables being studied in the larger population. However, because researchers want to ensure they do not falsely conclude there is a meaningful difference between groups when the difference is in fact due to chance, they often set a stringent criterion for their statistical tests. This criterion is known as the significance level.
Within the social sciences, researchers often adopt a significance level of 5%. This means researchers are only willing to conclude that the results of their study are statistically significant if the probability of obtaining those results if the null hypothesis were true—known as the p value—is less than 5%.
Five percent represents a stringent criterion, but there is nothing magical about it. In medical research, significance levels are often set at 1%. In cognitive neuroscience, researchers often adopt significance levels well below 1%. And when astronomers seek to explain aspects of the universe or physicists study new particles like the Higgs boson, they set significance levels several orders of magnitude below 0.05.
In other research contexts like business or industry, researchers may set more lenient significance levels depending on the aim of their research. However, in all research, the more stringently a researcher sets their significance level, the more confident they can be that their results are not due to chance.
Determining whether a given set of results is statistically significant is only one half of the hypothesis-testing equation. The other half is ensuring that the statistical tests a researcher conducts are powerful enough to detect an effect if one really exists. That is, when a researcher concludes their hypothesis was incorrect and there is no relationship between the variables being studied, that conclusion is meaningful only if the study was powerful enough to detect an effect had one existed.
The power of a hypothesis test is influenced by several factors.
Sample size—or, the number of participants the researcher collects data from—affects the power of a hypothesis test. Larger samples with more observations generally lead to higher-powered tests than smaller samples. In addition, large samples are more likely to produce replicable results because extreme scores that occur by chance are more likely to balance out in a large sample than in a small one.
Although setting a low significance level helps researchers ensure their results are not due to chance, it also lowers their power to detect an effect because it makes rejecting the null hypothesis harder. In this respect, the significance level a researcher selects is often in competition with power.
Standard deviation reflects unexplained variability within data, also known as error. Generally speaking, the more unexplained variability within a dataset, the less power researchers have to detect an effect. Unexplained variability can be the result of measurement error, individual differences among participants, or situational noise.
A final factor that influences power is the size of the effect a researcher is studying. As you might expect, big changes in behavior are easier to detect than small ones.
Sometimes researchers do not know the strength of an effect before conducting a study. Even though this makes it harder to conduct a well-powered study, it is important to keep in mind that phenomena that produce large effects will yield higher-powered studies than phenomena that produce only small effects.
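The interplay among these factors can be sketched with a quick Monte Carlo simulation: repeatedly run a simulated study with a known true effect and count how often the test detects it. The effect size, standard deviation, sample sizes, and significance levels below are illustrative assumptions (the $0.31 effect and $4 spread echo the website example), and a large-sample z-test stands in for a full t-test:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def estimated_power(effect, sd, n, alpha, n_sims=1000):
    """Monte Carlo estimate of power: the fraction of simulated studies
    that detect a true effect of the given size at significance level alpha."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, n)
        treatment = rng.normal(effect, sd, n)
        diff = treatment.mean() - control.mean()
        se = math.sqrt(control.var(ddof=1) / n + treatment.var(ddof=1) / n)
        p = math.erfc(abs(diff / se) / math.sqrt(2))  # two-sided z-test
        if p < alpha:
            hits += 1
    return hits / n_sims

# Illustrative numbers: a true effect of 0.31 with a standard deviation
# of 4, at two sample sizes and two significance levels.
for n in (200, 2000):
    for alpha in (0.05, 0.01):
        power = estimated_power(0.31, 4.0, n, alpha)
        print(f"n = {n:4d}, alpha = {alpha:.2f}: power ≈ {power:.2f}")
```

Running the sketch shows the pattern described above: power rises with sample size and falls as the significance level becomes more stringent; raising the noise (`sd`) or shrinking the effect would lower it further.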
Statistical significance is important because it allows researchers to hold a degree of confidence that their findings are real, reliable, and not due to chance. But statistical significance is not equally important to all researchers in all situations. The importance of obtaining statistically significant results depends on what a researcher studies and within what context.
Within academic research, statistical significance is often critical because academic researchers study theoretical relationships between different variables and behavior. Furthermore, the goal of academic research is often to publish research reports in scientific journals. The threshold for publishing in academic journals is often a series of statistically significant results.
Outside of academia, statistical significance is often less important. Researchers, managers, and decision makers in business may use statistical significance to understand how strongly the results of a study should inform the decisions they make. But, because statistical significance is simply a way of quantifying how much confidence to hold in a research finding, people in industry are often more interested in a finding’s practical significance than statistical significance.
To demonstrate the difference between practical and statistical significance, imagine you’re a candidate for political office. Maybe you have decided to run for local or state-wide office, or, if you’re feeling bold, imagine you’re running for President.
During your campaign, your team comes to you with data on messages intended to mobilize voters. These messages have been market tested and now you and your team must decide which ones to adopt.
If you go with Message A, 41% of registered voters say they are likely to turn out at the polls and cast a ballot. If you go with Message B, this number drops to 37%. As a candidate, should you care whether this difference is statistically significant at a p value below .05?
The answer, of course, is no. What you likely care about more than statistical significance is practical significance—the likelihood that the difference between groups is large enough to be meaningful in real life.
You should ensure there is some rigor behind the difference in messages before you spend money on a marketing campaign, but when elections are sometimes decided by as little as one vote, you should adopt the message that brings more people out to vote. Within business and industry, the practical significance of a research finding is often as important as, if not more important than, its statistical significance. In addition, findings with large practical significance are almost always statistically significant too.
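To see where the statistics actually land, here is a rough two-proportion z-test on the turnout figures. The 41% and 37% come from the example above, but the 1,000 respondents per message is an assumed sample size:

```python
import math

# Hypothetical market-test counts behind the 41% vs. 37% figures; the
# 1,000 respondents per message is an assumption for illustration.
n_a, turnout_a = 1000, 410   # Message A: 41% say they will vote
n_b, turnout_b = 1000, 370   # Message B: 37% say they will vote

p_a, p_b = turnout_a / n_a, turnout_b / n_b

# Two-proportion z-test under the null hypothesis that both messages
# mobilize voters at the same rate.
pooled = (turnout_a + turnout_b) / (n_a + n_b)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"difference: {p_a - p_b:.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

With these assumed counts, the four-point gap falls just short of significance at the 5% level, yet it could still swing an election. That tension is exactly why a candidate weighs practical significance alongside the p-value.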
Conducting statistically significant research is a challenge, but it’s a challenge worth tackling. Flawed data and faulty analyses only lead to poor decisions. Start taking steps to ensure your surveys and experiments produce valid results by using CloudResearch. If you have the team to conduct your own studies, CloudResearch can help you find large samples of online participants quickly and easily. Regardless of your demographic criteria or sample size, we can help you get the participants you need. If your team doesn’t have the resources to run a study, we can run it for you. Our team of expert social scientists, computer scientists, and software engineers can design any study, collect the data, and analyze the results for you. Let us show you how conducting statistically significant research can improve your decision-making today.