By Leib Litman, PhD, Shalom Jaffe, Cheskie Rosenzweig, MS, Aaron Moss, PhD & Jonathan Robinson, PhD
As in 2016, the 2020 Presidential election was much closer than polls forecast, particularly in battleground states. The disconnect between how the media covered Presidential polls in both 2016 and 2020 and the actual outcomes of those elections left many people feeling misled. As a result, some people have understandably begun to question the accuracy and even the legitimacy of polls. Here, we aim to demystify some of the common problems polls face by presenting five of the most common misconceptions people have about polls and mistakes pollsters make.
In the ‘good old days’, polling was simple. A pollster would start with a list of phone numbers of everyone in the United States. Then, they would randomly select several thousand numbers and conduct the poll. This is what’s referred to as probability sampling because every person in the population has an equal probability of being selected for the poll.
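To make the mechanics concrete, here is a minimal sketch, in Python, of what probability sampling amounts to. The phone list is entirely hypothetical; the key property is that every number has the same chance of being drawn.

```python
import random

# Hypothetical sampling frame: one phone number per adult in the population.
phone_frame = [f"555-{i:07d}" for i in range(1_000_000)]

# Simple random sample: every number has an equal chance of being selected,
# which is what makes the resulting poll a probability sample.
poll_sample = random.sample(phone_frame, k=2_000)
print(len(poll_sample))  # 2,000 randomly chosen numbers to call
```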
For several decades, probability sampling worked well. But then, people became increasingly unwilling to answer the phone. Furthermore, some groups of people were less likely to answer the phone than others (a problem referred to as non-response bias). People in minority groups, people with less education and lower socio-economic status, and younger people are all harder to reach in phone polls than other groups.
Different response rates across groups present a serious issue for polls because they violate the most important requirement of a probability poll. When people participate in polls at different rates, it is no longer the case that every person has an equal chance of being in the poll. People who are White, college-educated, and engaged in politics are overrepresented in polls today.
The takeaway from this problem is simple: no poll produces a representative sample on its own. Rather than becoming discouraged, pollsters use methodological and statistical techniques to correct for non-response bias. However, those techniques are not perfect and can introduce other types of bias. Most of the points below stem from the bias created when these corrective techniques are applied to the probability sampling process.
In theory, correcting for non-response bias is simple. Let’s say people with a college degree are twice as likely to answer a poll as those without a college degree. Pollsters correct for this problem by counting each person without a college degree twice, a process called weighting. All polls weight samples by multiple variables so that, in the end, the sample approximates the US population demographically. But as Yogi Berra once said: “In theory, practice should look like theory. But in practice, it does not.” At least not always. And this is where polling runs into multiple problems.
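As a rough sketch of how such a correction works (the shares below are hypothetical, chosen so that college graduates respond at twice the rate of non-graduates), each group’s weight is simply its share of the population divided by its share of the sample:

```python
# Hypothetical shares: the population is 35% college graduates, but because
# graduates answer at twice the rate, they make up about 52% of the raw sample.
population_share = {"college": 0.35, "no_college": 0.65}
sample_share = {"college": 0.52, "no_college": 0.48}

# Post-stratification weight = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # {'college': ~0.67, 'no_college': ~1.35} -> non-graduates count about twice as much
```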
Weighting for non-response only works when non-response is within reasonable limits. Today, it is not uncommon for polls to encounter astronomical levels of non-response. For example, as reported by David Hill in the Washington Post, in Florida pre-election polls he conducted in late October of 2020, only four-tenths of one percent (0.4%) of the people contacted actually answered the phone. That is a non-response rate of 99.6%!
When non-response rates are this high, it’s likely that the 0.4% of people who participate in the poll are systematically different from the 99.6% who do not. There is simply no way to make statistical corrections with any confidence when non-response is this extreme.
The take-home message here is also simple. When reading a poll, keep an eye out for non-response rates. Non-response rates are often reported in final sample dispositions (keep in mind that they may not be reported in the publicly available method sections of polls). If non-response rates are over 99%, proceed with extreme caution!
To address some of the limitations of phone polling, pollsters have shifted to recruiting respondents online. According to a recent report by the Pew Research Center, 80% of polls recruit at least some respondents over the internet. This is a big change from just a few years ago when polls were almost exclusively conducted over the phone. One of the reasons for the shift from phone to online polling is the increasing rate of non-response on phone polls. Online polls allow pollsters to reach respondents in a more efficient way. But online polling has its own sources of error.
The biggest limitation to online polls conducted with opt-in panels concerns representation. While traditional polling methods rely on probability sampling, online panels rely on quota sampling instead. By creating bins of the important demographic groups in a population—gender, race, ethnicity, education, income—and filling those bins, researchers can arrive at an approximation of the population.
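A minimal sketch of that quota logic (the bins and targets here are hypothetical): opt-in respondents are accepted only until their demographic bin is full.

```python
# Hypothetical quota targets for a 1,000-person online poll.
quotas = {("female", "college"): 180, ("female", "no_college"): 330,
          ("male", "college"): 160, ("male", "no_college"): 330}
filled = {bin_: 0 for bin_ in quotas}
accepted = []

def maybe_accept(respondent):
    """Keep an opt-in respondent only if their demographic bin still has room."""
    bin_ = (respondent["gender"], respondent["education"])
    if bin_ in quotas and filled[bin_] < quotas[bin_]:
        filled[bin_] += 1
        accepted.append(respondent)
        return True
    return False  # bin is full (or unrecognized), so the respondent is screened out

print(maybe_accept({"gender": "female", "education": "college"}))  # True until that bin fills
```

Whoever opts in first fills the bins, which is why the result can approximate the population’s demographics without giving every member of the population a known chance of selection.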
But if the 20th century taught pollsters anything, it is that quota sampling is not a substitute for probability sampling. While phone polls also use quota sampling, there it is a second-level adjustment, something to improve the outcomes of the probability sampling process. For online polls, on the other hand, quota sampling is the primary way of achieving representation.
For this reason, online polls are not probability polls, and the statistical calculations used to create margins of error do not apply to them. With many online polls, the actual error may be even larger than the margin of error would be for a probability sample of the same size, because the statistical theory used to generate margins of error simply does not apply to non-probability samples.
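For reference, this is a minimal sketch of the textbook margin-of-error calculation; it presumes a simple random (probability) sample, which is exactly the assumption an opt-in online panel does not satisfy.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion; valid only for probability samples."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person simple random sample with a 50/50 split on some question:
print(round(margin_of_error(0.5, 1000) * 100, 1))  # ~3.1 percentage points
```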
Online polls can produce misleading results when pollsters do not check to ensure people are paying attention. While all polls can suffer from inattentiveness, the percentage of participants in online polls who provide inattentive or otherwise fraudulent answers can be very high depending on the sample source. Nearly all online surveys suffer from data quality issues to some extent, and when researchers do not screen for inattentive participants, the result can be substantial polling error.
This issue is especially important when polls examine uncommon or “low-incidence” behaviors. Because it is easy for people to click through online surveys, even a small number of bogus respondents can inflate estimates of uncommon behaviors. One of our recent studies showed the consequences of such problematic respondents: in a survey conducted by the CDC, extremely dangerous COVID-19-related cleaning practices, such as gargling bleach, were portrayed as more common than they actually were. Readers should always check how a given online poll deals with data quality issues.
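The arithmetic behind that inflation is straightforward. In the hypothetical sketch below, a behavior with a true incidence of 1% appears nearly three times as common once a small share of bogus respondents click “yes” at random:

```python
def observed_rate(true_rate: float, bogus_share: float, bogus_yes_rate: float = 0.5) -> float:
    """Blend genuine answers with random clicks from bogus respondents."""
    return (1 - bogus_share) * true_rate + bogus_share * bogus_yes_rate

# True incidence of 1%; 4% of respondents are bogus and answer "yes" half the time.
print(round(observed_rate(0.01, 0.04) * 100, 2))  # ~2.96%, nearly triple the true rate
```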
There are two primary issues to consider when looking at subgroups in polls. The first is oversampling of minority groups. Consider a traditional political poll that aims to recruit respondents in proportion to their share of the US population. Only about 1% of people in the 2010 US Census identified as American Indian or Alaska Native. To represent this group in an online poll of 1,000 respondents, a pollster would need to recruit about ten people.
To sample these ten people or the members of other ethnic minority groups, a pollster will typically set up a quota. If the quota is not met, however, the researcher may rely on weighting to further adjust the sample. When this happens, the key question becomes: how much weighting is too much?
Consider an extreme case. Say a sample needed 100 people from demographic group A but was only able to get one. This person would have to be weighted by a factor of 100, which of course is not valid. So what weight is considered too high? While there is no standard cutoff used across polls, weights above a factor of 5 should be considered a red flag. As a general rule, heavy weighting of very small samples makes the reported margin of error even less accurate, as the sketch below illustrates.
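One common way to quantify “how much weighting is too much” is Kish’s approximate design effect, which converts the spread of the weights into an effective sample size. The weights below are hypothetical and mirror the extreme case above: one respondent weighted by a factor of 100 in an otherwise unweighted sample of 1,000.

```python
def effective_sample_size(weights):
    """Kish's approximation: the more unequal the weights, the smaller the effective n."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# 999 respondents with weight 1, plus one respondent weighted up by a factor of 100.
weights = [1.0] * 999 + [100.0]
print(round(effective_sample_size(weights)))  # ~110 -> the nominal n of 1,000 shrinks dramatically
```

The margin of error should then be computed from the effective sample size, not the nominal one, which is why heavy weights make a poll far less precise than its headline sample size suggests.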
When evaluating the outcome of a poll, it is good to ask pollsters: 1) for a histogram of the distribution of weights in their sample, and 2) whether they have a process for dealing with outliers. When sample weights are very high, proceed with caution!
The second problem with weighting is that subgroups usually consist of further subgroups. For example, when a poll says it is “weighted on education,” it is likely that weights are based on whether or not the respondent has a college degree. Thus, all people without a college degree get lumped into a single category, and the poll misses much of the variability in levels of educational attainment. According to US Census estimates, about 33% of the population has a four-year college degree or higher. Looking more closely at the 67% who do not: 3.5% have not gone to high school at all, 7.1% went to high school but never received a diploma, 18% took some college courses but did not graduate, and 10% have a two-year college degree. All of this variability is concealed when everyone without a four-year degree is lumped together.
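A short sketch of how this lumping happens in practice (the category shares are the Census figures cited above): a detailed attainment variable gets collapsed into a single yes/no flag before any weighting is done.

```python
# Approximate shares of the US population by educational attainment, as cited above
# (the remaining share holds a high school diploma as their highest credential).
detailed = {
    "no_high_school": 0.035,
    "high_school_no_diploma": 0.071,
    "some_college_no_degree": 0.18,
    "two_year_degree": 0.10,
    "four_year_degree_or_higher": 0.33,
}

# Weighting "on education" typically reduces all of this to one binary flag,
# so the differences among the non-degree groups vanish from the adjustment.
collapsed = {
    "college_degree": detailed["four_year_degree_or_higher"],
    "no_college_degree": round(1 - detailed["four_year_degree_or_higher"], 2),
}
print(collapsed)  # {'college_degree': 0.33, 'no_college_degree': 0.67}
```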
Finding participants within small subgroups is expensive and time-consuming, and most online polls simply don’t accurately represent those groups. How that affects the accuracy of a poll depends on the specific research question, and how the demographic that is being weighted correlates with the outcome of interest.
Collectively, the problems outlined above can create substantial difficulties for polls. In the last two elections, presidential horse-race polls suffered from differential non-response, with Republicans less likely to participate than Democrats. This likely explains much of the inaccuracy of polls in the 2016 and 2020 elections (see here). Medical and public health polls can also misrepresent what is happening in the population. For example, a recent CDC study claimed many people were ingesting bleach to prevent COVID-19, which we showed was not accurate, as we discuss here. As polls become more and more ubiquitous in our culture, keeping the five points outlined above in mind when reading polling results can help in developing a more critical eye for interpreting polling data.