The use and abuse of polls in US elections
One of the most frequent questions I am asked every election cycle, but especially this one, is “Are the polls accurate?” The answer is: they can be, but not for what most people think.
This question is generally preceded by the statement “The polls were entirely wrong in 2016; they said Clinton would win and she did not.” Are the polls accurate, or is there a problem with them? Are the polls in 2020, as in 2016, missing hidden Trump voters? To answer these questions one needs to understand some basic points about polling.
Good polls are sort of accurate: But know what you are surveying
First, when it comes to 2016, the good national polls were entirely accurate. They said that Hillary Clinton would win the national popular vote by about two to three percentage points, with a margin of error of about three points. These polls were dead on. The problem was not the polling but its relevance.
We do not elect the president by the national popular vote. Instead we use the electoral college, which is essentially 50 separate state elections (plus the District of Columbia). National polls are entirely irrelevant for the purposes of predicting presidential winners. Ignore them all because they are looking at the wrong unit of analysis.
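To see why the unit of analysis matters, here is a minimal sketch with three made-up states. The state names, electoral votes, and vote counts are all hypothetical, and the sketch assumes winner-take-all allocation in every state. It only shows that a candidate can win the combined popular vote and still lose on electoral votes.

```python
# Hypothetical states: (electoral votes, votes for A, votes for B). Not real data.
STATES = {
    "State X": (10, 900_000, 300_000),
    "State Y": (8, 480_000, 520_000),
    "State Z": (8, 490_000, 510_000),
}

# National popular vote: add up raw votes across all states.
popular_a = sum(a for _, a, _ in STATES.values())
popular_b = sum(b for _, _, b in STATES.values())
popular_winner = "A" if popular_a > popular_b else "B"

# Electoral vote: each state is its own winner-take-all contest (an assumption).
electoral_a = sum(ev for ev, a, b in STATES.values() if a > b)
electoral_b = sum(ev for ev, a, b in STATES.values() if b > a)
electoral_winner = "A" if electoral_a > electoral_b else "B"

print(f"Popular vote:   A {popular_a:,} vs B {popular_b:,} -> {popular_winner}")
print(f"Electoral vote: A {electoral_a} vs B {electoral_b} -> {electoral_winner}")
```

A national poll in this toy country could measure the popular vote perfectly and still point to the wrong winner.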
Second, the state polls were largely accurate too. If one tracked what was happening in states such as Pennsylvania days before the election one could see the polls tightening and Trump narrowing the lead on Clinton as undecided voters made up their mind. On the Monday before the election in 2016 in states such as Pennsylvania the polls had the race dead even.
Third, there were no hidden Trump voters. Nationally and in the critical states Trump did not receive many more votes than Romney did in 2012. The issue was not a Republican voter surge for Trump but Democrats staying home and not voting for Clinton.
Polls are not predictors but statistical snapshots in time
Additionally, remember that polls are supposed to be statistical profiles of a population. This means that a good poll is a small sample of a larger population that resembles the latter in all relevant characteristics. Polls are only as good as the assumptions that go into them. Good pollsters accurately reflect who is likely to vote and the partisan, geographic, or other makeup of the electorate. If you make bad assumptions, you get bad results. This is the old “garbage in, garbage out” theory.
Polls also are not predictors – they are snapshots in time. Lots of things can happen between the time a poll is done and an election occurs. Candidate strategies matter, as do messaging, and other intervening variables. Thinking that polls are predictors is the root of many problems.
The flaws in FiveThirtyEight
Consider Nate Silver and FiveThirtyEight. Four years ago they predicted an 80%+ chance Clinton would win. As of October 26, 2020 the prediction is an 88% chance of a Biden victory. The model used here is based on polls – using them as predictors of what will happen on election day.
If the polls on which the model is based are wrong, the predictions will be wrong, and that is before we even get to the point that polls are not predictors. FiveThirtyEight’s predictive model is premised on a way of thinking about polls that is simply wrong.
An example of bad polling: Minnesota US Senate race
It is possible that Biden will win, but the polls are very close in the critical swing states such as Pennsylvania, Michigan, and Wisconsin. But accepting everything I said in this essay, there is also a difference between good and bad polls.
Let me use as an example a recent poll conducted in Minnesota and released last week declaring the US Senate race between Tina Smith and Jason Lewis to be a dead heat. It reported Smith with 43% and Lewis with 42%, down from an 11-point lead just a few weeks ago. It is possible that the race has tightened, but more is going on here that calls the validity of the poll into question.
First, the poll had a margin of error of +/- five points. Smith could actually be at 48% and Lewis at 37%, an 11-point difference. This margin of error was driven by the fact that only 625 registered voters in Minnesota were surveyed. This is a pitifully small sample.
Second, it defined likely voters as registered voters, yet traditionally 10-15% of voters in Minnesota register on election day. Those voters could not have been in the sample.
Third, the sample contained 38% Republicans and 35% Democrats. Unless there has been a major shift in partisan alignment in Minnesota, no credible survey shows more people identifying as Republican than Democrat. If anything, one can make the argument that a good sample should be 38% Democrat and 35% Republican, especially keeping in mind that those who do register on election day tend to be younger voters who tend to vote for Democrats. Effectively, this survey may be skewed six or more points in favor of a Republican.
Fourth, the survey was done online. Not all surveys done online are bad, but there is a significant self-selection bias in such surveys that warrants correction. There is no indication this survey did that.
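The first and third points above come down to simple arithmetic, which the sketch below works through. The 43%, 42%, and five-point figures and the two party mixes come from the poll as described here. Everything in `SUPPORT` is a made-up assumption about how each group splits between the candidates, as is the 27% "Other" share, which is simply what remains after the two parties. The first function uses the same simplification as the text: it lets each candidate's share move by the full margin of error in opposite directions.

```python
def lead_range(leader_pct, trailer_pct, moe_pct):
    """Smallest and largest lead when each share moves by the full margin of error."""
    narrowest = (leader_pct - moe_pct) - (trailer_pct + moe_pct)
    widest = (leader_pct + moe_pct) - (trailer_pct - moe_pct)
    return narrowest, widest

# Hypothetical within-group support as (Smith, Lewis). NOT taken from the poll.
SUPPORT = {
    "Democrat": (0.90, 0.05),
    "Republican": (0.05, 0.90),
    "Other": (0.40, 0.35),
}

def margin(party_mix):
    """Smith minus Lewis after weighting each group by its share of the sample."""
    if abs(sum(party_mix.values()) - 1.0) > 1e-9:
        raise ValueError("party shares must sum to 1")
    smith = sum(share * SUPPORT[group][0] for group, share in party_mix.items())
    lewis = sum(share * SUPPORT[group][1] for group, share in party_mix.items())
    return smith - lewis

# Point one: reported toplines of 43% and 42% with a +/- 5 point margin of error.
low, high = lead_range(43.0, 42.0, 5.0)
print(f"Smith's lead could range from {low:+.0f} to {high:+.0f} points")

# Point three: the party mix as sampled versus the mix suggested in the text.
as_sampled = {"Democrat": 0.35, "Republican": 0.38, "Other": 0.27}
reweighted = {"Democrat": 0.38, "Republican": 0.35, "Other": 0.27}
shift = margin(reweighted) - margin(as_sampled)
print(f"Margin as sampled: {margin(as_sampled):+.1%}")
print(f"Margin reweighted: {margin(reweighted):+.1%}")
print(f"Shift toward Smith: {shift:+.1%}")
```

Under these made-up support rates, swapping three points of party identification moves the margin by about five points. If partisans were assumed to vote entirely for their own party's candidate, the shift would reach the six points mentioned above.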
Nerd warning: Confidence levels versus credibility intervals
Finally, there is one last problem that only nerds like me can appreciate. The survey did not employ confidence levels but instead a credibility interval to determine the accuracy of the poll. Why is this important?
When polls are done, the question to be asked is: what is the probability that the sample is a good representation of the entire relevant population? The smaller the allowed chance of error, statistically the better the chances it is a good survey. The gold standard for survey research is an error threshold of .05, usually reported as a 95% confidence level. This means there is a 95% chance that the sample is an accurate representation of the entire population, within the margin of error. This .05 also means there is still a 5% chance the sample is skewed and therefore the poll is bad.
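One way to make the 95% figure concrete is to simulate many polls of the same population and count how often the reported interval captures the true value. The sketch below is illustrative only. It assumes simple random sampling, a hypothetical true support of 45%, the standard normal-approximation interval, and a sample of 625 to match the Minnesota poll discussed above.

```python
import math
import random

def interval_covers(true_share, n, rng, z=1.96):
    """Poll n random voters once; report whether the 95% interval contains true_share."""
    hits = sum(rng.random() < true_share for _ in range(n))
    p_hat = hits / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)  # normal-approximation margin of error
    return p_hat - moe <= true_share <= p_hat + moe

rng = random.Random(2020)  # fixed seed so the run is repeatable
trials = 2000
covered = sum(interval_covers(0.45, 625, rng) for _ in range(trials))
coverage = covered / trials
print(f"{coverage:.1%} of simulated polls captured the true value")
```

Roughly 19 in 20 simulated polls should land within their margin of error, and about 1 in 20 should miss, even though every one of them was conducted correctly under these assumptions.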
A credibility interval is something different. It is based on Bayesian statistics and it asks what are the chances that a given sample is an accurate representation of a prediction that you have made.
A confidence level does not predict what the sample should look like but instead asks whether the sample is probably a good mini-version of the entire population, whatever its relevant characteristics are. A credibility interval asks what are the chances a sample mirrors the pre-existing assumptions one has made about the entire population.
A credibility interval, in my opinion, is the wrong way to do a survey. Effectively you make your assumptions about the composition of the electorate and test to see if you have a sample that mirrors it. Your initial assumptions are held constant and tested. With a confidence level, you are not holding constant your initial electorate assumptions and instead are asking if the results you get are probabilistically correct. In effect, credibility intervals test garbage in, confidence levels test garbage out. Many do not see a difference in these two statistical methods but they can yield differences in results and potentially skew results.
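For readers who want to see how prior assumptions can move a result, here is a minimal sketch using a textbook Beta-binomial model. This is an assumption on my part about one common way such intervals are computed, not a reconstruction of what this pollster did. The count of 269 out of 625 (about 43%) and both priors are hypothetical. The strong prior is centered on 60% and carries the weight of 500 imagined respondents.

```python
import math
import random

def confidence_interval(k, n, z=1.96):
    """Normal-approximation 95% confidence interval for a sample proportion."""
    p = k / n
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe

def credible_interval(k, n, prior_a, prior_b, rng, draws=20_000):
    """Equal-tailed 95% interval from the Beta posterior, estimated by sampling."""
    samples = sorted(
        rng.betavariate(prior_a + k, prior_b + n - k) for _ in range(draws)
    )
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

rng = random.Random(0)
n, k = 625, 269  # hypothetical: 269 of 625 respondents back the candidate

freq = confidence_interval(k, n)
flat = credible_interval(k, n, 1, 1, rng)        # flat prior: no strong assumptions
strong = credible_interval(k, n, 300, 200, rng)  # hypothetical prior centered on 60%

for label, (lo, hi) in [("confidence", freq), ("flat prior", flat), ("strong prior", strong)]:
    print(f"{label:>12}: {lo:.1%} to {hi:.1%}")
```

With a flat prior the two methods give nearly the same interval. With a strong prior the interval is pulled toward the assumption, which is how the two methods can yield different results from identical data.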
This poll was a bad one. It made a lot of mistakes. The only benefit to it is for Smith and Lewis, who can both now say the race is very tight and therefore ask supporters to send money and votes. Beyond that, it is an example of a bad poll, the kind that can also skew presidential polls, which in turn can skew predictive models such as FiveThirtyEight.
Conclusion: Polls can be useful, but analysis based on them often isn’t
The moral of the story is that polls done well can be good and accurate snapshots in time. But there is a lot of bad polling. Even worse, there is a lot of bad analysis based on polling. Four years ago analysts got it wrong when they let their disbelief in a Trump victory cloud their thinking. They also failed to understand the proper level of analysis for presidential polling and how to judge whether a poll is valid or reliable.
This article was first published on Prof. Schultz’s blog.