Data Scientist: Should You Trust the 2020 Election Polls? Yes, but …

Election forecasts have come a long way since the Chicago Daily Tribune’s mistaken “Dewey Defeats Truman” headline in 1948—but improved methodology doesn’t mean that unlikely events are impossible. Credit: Photo courtesy of the Harry S. Truman Presidential Library

Mansueto Institute data scientist Nicholas Marchio breaks down election forecasting.

Polling data is an essential factor in every election cycle. Pundits use it to forecast the outcome, while campaigns use it to decide where to allocate resources and shore up support. However, polls never provide a perfect crystal ball—in 2016, Donald Trump’s victory shocked almost everyone who had been closely following the race.

Nicholas Marchio knows how to analyze polling data, but he wants to “push back on the idea that data is all-powerful.” A lead data scientist at the University of Chicago’s Mansueto Institute for Urban Innovation, Marchio previously applied modeling techniques to target voters for Civis Analytics and Sen. Bernie Sanders’ presidential campaign. In the following Q&A, he pulls back the curtain on election forecasting and campaign data, explaining how good polling works and why it’s useful.

Nicholas Marchio. Credit: University of Chicago

However, Marchio also has a word of advice for obsessive poll watchers: Don’t overthink it.

Nearly all the polls were wrong about the 2016 election. Is there any reason to have higher confidence this time around?

There have been a lot of improvements in methodology. But of course, nobody really knows until the election, because only elections validate results. And polls are always going to be a snapshot of a single point in time, so the polling that’s closest to the election is the most predictive. No matter what, there’s going to be a distribution of forecasts and a margin of error in surveys, and we need to remain cognizant of that. Another way to think about it is basic probability: When you hear there’s a 25% chance that Trump will win, it might seem unlikely. But that’s the probability of flipping heads twice in a row—it happens all the time, and was about Trump’s chance of winning in 2016.
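The two numerical ideas in that answer, the coin-flip comparison and the margin of error, can be checked with a few lines of Python. This is only an illustrative sketch: the function names are invented for this example, and the margin-of-error formula assumes a simple random sample, which real polls only approximate.

```python
import math


def prob_all_heads(flips: int, p_heads: float = 0.5) -> float:
    """Probability that every one of `flips` independent coin tosses lands heads."""
    return p_heads ** flips


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample; weighting and design effects
    in real polls usually make the true uncertainty larger.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Two fair coin flips both landing heads: 0.5 * 0.5
two_heads = prob_all_heads(2)

# A hypothetical poll of 1,000 respondents with a candidate at 50%
moe = margin_of_error(0.5, 1000)

print(f"Two heads in a row: {two_heads:.0%}")
print(f"Margin of error: +/- {moe * 100:.1f} points")
```

Under these assumptions, a 1,000-person poll carries a margin of error of roughly three percentage points, which is why a lead of a point or two in a single survey says little on its own.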

Many news outlets continuously update models that forecast election outcomes. What polling methods make for a good forecast?

It’s easy to take survey results at face value, but the methodology behind the numbers is important. That’s because there are important differences in how polls are conducted, and those details can change the poll results. Who is contacted and how responses are weighted can make a big difference. Landline telephone interviews are a classic method of polling, but more pollsters call mobile phones today. Other methods are web panel surveys—which people take online for incentives—and text-to-web surveys, in which people are texted a link to a survey. All these modes have various pros and cons.

Some of the best quality polls use voter files to improve weighting. That’s because a voter file allows the pollster to learn more about the people who are taking the survey, so they can correct for sampling errors that might make the group of survey respondents different from the electorate as a whole. For instance, voter files contain information about demographics, voting history, party registration and where people live. All that information can help determine how much “weight” to put on each survey taker’s response so that we can ensure they are representative of the people who actually vote on Election Day.
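The weighting idea described above can be sketched in Python. Everything in this example is hypothetical: the education groups, the electorate shares (standing in for what a voter file might show), and the survey responses are made up, and real pollsters weight on many variables at once rather than a single one.

```python
from collections import Counter

# Hypothetical electorate shares by education, standing in for voter-file data.
electorate_share = {"college": 0.40, "non_college": 0.60}

# Hypothetical survey: (education group, supports candidate A?).
# College graduates are overrepresented: 60 of 100 respondents.
responses = (
    [("college", True)] * 36
    + [("college", False)] * 24
    + [("non_college", True)] * 16
    + [("non_college", False)] * 24
)


def group_weights(responses, electorate_share):
    """Weight per group = share of the electorate / share of the sample."""
    counts = Counter(group for group, _ in responses)
    total = len(responses)
    return {g: electorate_share[g] / (counts[g] / total) for g in counts}


def weighted_support(responses, weights):
    """Weighted share of respondents who support candidate A."""
    supporters = sum(weights[g] for g, supports in responses if supports)
    everyone = sum(weights[g] for g, _ in responses)
    return supporters / everyone


weights = group_weights(responses, electorate_share)
raw_support = sum(supports for _, supports in responses) / len(responses)
adjusted_support = weighted_support(responses, weights)

print(f"Unweighted support: {raw_support:.0%}")
print(f"Weighted support:   {adjusted_support:.0%}")
```

In this made-up sample the unweighted result shows candidate A ahead, while the weighted result shows the candidate behind, because the overrepresented group happens to be the more supportive one.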

A good general rule when evaluating forecasts is to look for basic things: Do they publish a detailed methodology and have good ratings from reputable polling aggregators such as the Economist or 538? One through line that you’ll see across high-quality polls is that a lot of them weight on demographics like education, race, age, gender, party registration, and other features that are predictive of the outcome that they’re trying to forecast—in this case, support for a particular candidate.

Let’s talk about an issue that’s important this year. How will mail-in ballots change the speed with which the winner is called on election night itself?

Often in the past, races would be called soon after polls closed, with seemingly low numbers of ballots counted (say, 30%). Usually, that’s possible because “decision desks” have access to the Associated Press data feed, a service that allows them to look at precinct-level or county-level returns as they flow in. Typically, the outlets match baseline predictions against the early results. If they observe a swing across a number of areas with similar demography, they’ll issue a projection that one candidate is performing above expectations and likely to win. Or, they might see that the early results closely match the pre-election polls and expected turnout levels, and say that an upset is unlikely and the forecasted outcome will play out. It’s a pretty well-validated approach.
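A toy version of that comparison, written in Python, might look like the following. It is a loose illustration rather than how any actual decision desk works: the counties, margins, statewide baseline, and two-point threshold are all invented, and real models account for turnout, uncounted ballots, and statistical uncertainty.

```python
from statistics import mean

# Hypothetical early-reporting counties. Margins are candidate A minus
# candidate B, in percentage points.
early_returns = [
    {"county": "A", "baseline_margin": 4.0, "observed_margin": 7.0},
    {"county": "B", "baseline_margin": -10.0, "observed_margin": -6.5},
    {"county": "C", "baseline_margin": 1.0, "observed_margin": 4.5},
]


def average_swing(returns):
    """Mean difference between observed and baseline margins."""
    return mean(r["observed_margin"] - r["baseline_margin"] for r in returns)


def assess(returns, statewide_baseline, threshold=2.0):
    """Shift the statewide baseline by the average swing and describe it.

    `threshold` is an arbitrary cutoff chosen for this sketch.
    """
    swing = average_swing(returns)
    projected = statewide_baseline + swing
    if abs(swing) < threshold:
        note = "early results roughly match the baseline"
    elif swing > 0:
        note = "candidate A is running ahead of the baseline"
    else:
        note = "candidate A is running behind the baseline"
    return swing, projected, note


swing, projected, note = assess(early_returns, statewide_baseline=-1.0)
print(f"Average swing: {swing:+.1f} points")
print(f"Adjusted statewide margin: {projected:+.1f} points ({note})")
```

The point of the sketch is the logic, not the numbers: a consistent shift across similar areas carries information about places that have not reported yet.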

But this year is different. It could take longer to count thousands of mail-in ballots, which could have a big effect in some states with razor-thin margins. So, we should be prepared to wait up to a few days for the results, especially for states that begin counting ballots on Election Day, and possibly longer if there are legal challenges. That said, election night margins will still matter. For example, if it looks like Joe Biden is in the lead early or if there is an unexpected surge in turnout or shifts in the composition of the electorate, that might actually make it easier to call the race. Given that more Democrats seem to be voting by mail this year, an early lead from ballots cast in-person would suggest that many more ballots were still to be counted, and that those ballots would be likely to favor Biden—increasing his chances of victory. But again, that’s just one potential scenario and it’s important to refrain from speculating too much.

What is the biggest misconception that people have about campaign data use?

Data from voter files, polling, phone banking, text banking and other sources is key for campaigns to identify their supporters, donors, and volunteers. But I also want to push back on the idea that data is all-powerful: Most of what campaigns do is actually based on broad, standard demographic information and patterns—not intense psychographic models that target and mobilize voters through digital advertising at a granular level, even though it’s tempting to think that might be the case.

That said, Democrats especially have a really heterogeneous coalition that they need to hold together every cycle. That makes it harder for them to come up with a unifying message because not all messages speak to the life experience of everyone in their constituency. Because of that, microtargeting models do come into play at times, and campaigns fine-tune their messaging depending on the platform and audience. But the bottom line is that it’s important not to overthink: Ultimately, the most effective campaigns distill their messaging down to simple overarching messages that will resonate across constituencies, rather than risk alienating potential supporters.

What advice do you have for people who have been glued to forecasts this cycle?

One thing that can be tricky about making sense of the polls is that in any race, things often regress to the mean. One candidate might have a really good week, and then the advantage fades. For the next few weeks, everyone should take a deep breath, because there’s going to be more volatility in the polls right up to Election Day. Don’t let October surprises, last-minute spending sprees and timed information releases change your state of mind: Instead, channel energy toward engagement, volunteering for a campaign or participating in get-out-the-vote efforts. There are all sorts of really great ways to get involved, and being involved is the best use of your time right now if you’re heavily invested in the race. It will certainly do more good than stressing out over polls!