
Why polls with more respondents from one party aren’t necessarily wrong

September 9, 2023

When CNN released a poll this week considering views of the 2024 presidential election, it triggered a lot of annoyance from supporters of President Biden. It wasn’t just that the race was close, something that shouldn’t come as a surprise, given the state of American politics. Nor did the consternation concern the utility of polling general election support 14 months early, something that offers fewer insights about projected winners than it does about the strength of candidates at the moment of polling.

Instead, what annoyed many on the left was the pool of people who responded to the poll, a group that they said was far too heavily Republican and therefore not an accurate representation of the electorate.

But this criticism is wrong.

If you’ve read this far, I’ll trust that you have at least some confidence in public opinion polling, which you certainly should. Regardless, we should dispel some lingering myths just to get them out of the way: It is not the case that pollsters these days only call landlines (a common misunderstanding) or that the poll results in recent years have been consistently wrong.

Today, though, we’re going to talk about something else: weighting.

At the top of CNN’s poll (conducted by SSRS) is an explanation of the composition of the respondent pool.

“Surveys were obtained Aug. 25-31, 2023 with a representative sample of n=1,503 respondents, including an oversample of Republicans and Republican-leaning independents to reach a total of 898 Republicans and Republican-leaning independents.”

In other words, 898 of the 1,503 respondents, about 60 percent, were Republicans or Republican-leaning independents. Wouldn’t that skew the results against Democrats?

Not necessarily — and certainly not in this poll. Even before we get to the math, it’s useful to break down the allegation here. The argument, it seems, is that CNN is intentionally misrepresenting the voter pool for some reason. But why? Why toss credibility in the dumpster? To write a story about how Biden is doing worse than expected? You can write a story like that without spending the money to contact registered voters and ask them questions.

So let’s do an experiment to see how this works.

Let’s stipulate that there are objective numbers capturing support in both the general election and — very important here! — the Republican primary. It doesn’t really matter what they are, so let’s use CNN’s results. In the general election polling, Biden trails Donald Trump 46 percent to 47 percent. (“Trails” here implies there’s significance to that 1-point difference, which there isn’t, but that’s an argument for another day.) In the Republican primary, Trump leads Florida Gov. Ron DeSantis, 52 percent to 18 percent, with former ambassador Nikki Haley and former vice president Mike Pence both at 7 percent.

A key part of conducting a poll is figuring out whose opinion you’re trying to measure. Let’s say it’s the voter pool in November 2024. You need to come up with an estimate of what that pool of voters will look like. When polls end up being off the mark, often the problem was with the expected voter pool rather than the results. (The New York Times did an interesting experiment in 2016 showing how different assumptions about the electorate can change poll results.) I used CNN’s results to reverse-engineer the pool the surveyors expected to vote next year.

In this exercise, you will play the role of the pollster. Start by clicking the buttons below, adding a random number of Democrats, Republicans and independents into your voter pool. Each will support Biden or Trump depending on the levels of support indicated in CNN’s poll.

Hit either of the buttons a few times but don’t go much higher than 50 total respondents yet.

Let’s evaluate the general election.

The result is an unweighted assessment of support in the general election. That is, we have a pool of respondents that doesn’t necessarily look like the expected electorate and hasn’t been corrected so that it does. But there’s a catch: When we are randomly adding people to the respondent pool in this experiment, that randomness is informed by the pool we want. So, over time, it should trend toward the actual levels of support seen in CNN’s poll.
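For readers who can’t click the buttons, here is a minimal sketch of the same idea in Python. The party shares and per-party support levels are invented for illustration, chosen only so that overall support lands near Biden 46, Trump 47; they are not taken from CNN’s crosstabs, and the function names are hypothetical.

```python
import random

# Hypothetical electorate, invented for illustration (not CNN's actual figures).
# Chosen so that overall support works out to roughly Biden 46, Trump 47.
PARTY_SHARE = {"Democrat": 0.35, "Republican": 0.30, "Independent": 0.35}
SUPPORT = {
    "Democrat": {"Biden": 0.90, "Trump": 0.05},
    "Republican": {"Biden": 0.05, "Trump": 0.93},
    "Independent": {"Biden": 0.37, "Trump": 0.50},
}


def draw_respondent(rng):
    """Pick a party by its share of the electorate, then a candidate choice."""
    parties = list(PARTY_SHARE)
    party = rng.choices(parties, weights=[PARTY_SHARE[p] for p in parties])[0]
    roll = rng.random()
    if roll < SUPPORT[party]["Biden"]:
        return party, "Biden"
    if roll < SUPPORT[party]["Biden"] + SUPPORT[party]["Trump"]:
        return party, "Trump"
    return party, "Other/undecided"


def unweighted_support(n, seed=0):
    """Poll n random respondents and return each choice's raw share."""
    rng = random.Random(seed)
    counts = {"Biden": 0, "Trump": 0, "Other/undecided": 0}
    for _ in range(n):
        _, choice = draw_respondent(rng)
        counts[choice] += 1
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    for n in (50, 500, 5000, 50000):
        result = unweighted_support(n)
        print(n, round(result["Biden"], 3), round(result["Trump"], 3))
```

With a small pool, the shares can be off by several points in either direction. As the pool grows, they should settle near the levels built into the assumed electorate.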

This is how polls work! Add more people, get more-accurate results. There are diminishing returns after a while; there’s a reason many national polls include samples of about 600 people.
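The diminishing returns can be sketched with the textbook margin-of-error formula for a simple random sample. This is an approximation under stated assumptions: it uses a 95 percent confidence level and the worst case of 50 percent support, and it ignores the extra uncertainty that weighting adds in a real poll. The function name is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    p is the assumed level of support (0.5 is the worst case);
    z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (50, 150, 600, 2400, 6000):
        print(n, f"+/- {margin_of_error(n) * 100:.1f} points")
```

Because the margin shrinks with the square root of the sample size, quadrupling the number of respondents only cuts it in half. The same formula explains the problem with small subgroups: a few hundred Republicans inside a general sample carry a much wider margin than the sample as a whole.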

But while we might have a nice chunk of respondents overall, only some of them are Republicans. The unweighted primary results drawn from that smaller group might look a lot like the final result. But they probably don’t.

So let’s start throwing more Republicans — and only Republicans — into the mix. We start getting better primary results, but the unweighted general election results start shifting unfairly toward Trump.

Let’s evaluate the primary election.

This is what those critics of CNN’s poll think is happening: All those Republicans in the mix are tainting the general election results. But you see why it’s useful to have more Republicans, given that it provides more accurate results in the Republican primaries.

So how do you poll more Republicans (to improve your primary accuracy) without ruining your overall results? You give each Republican less weight in the overall numbers. In other words, if you have twice as many Republicans in the overall pool as you wanted based on your targeted electorate, you simply make each Republican count for half as much.

This is not a complicated bit of math. Count how many respondents from each partisan group are in the pool, weight each group so that it matches the target, and then see how the general election results change once the weighting has been applied.
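Here is a minimal sketch of that arithmetic in Python. Every number in it is invented for illustration: the target shares, the respondent counts and the candidate support are assumptions, not CNN’s data. Real pollsters, SSRS included, typically weight on many characteristics at once, so treat this as the simplest possible version of the idea rather than a description of their method. The function names are hypothetical.

```python
# Hypothetical target electorate, invented for illustration.
TARGET_SHARE = {"Democrat": 0.35, "Republican": 0.30, "Independent": 0.35}

# Hypothetical respondent pool:
# party -> (respondents, Biden supporters, Trump supporters)
POOL = {
    "Democrat": (100, 90, 5),
    "Republican": (300, 15, 279),  # oversampled: 60% of the pool vs. a 30% target
    "Independent": (100, 37, 50),
}


def party_weights(pool, target_share):
    """Weight for each party = its target share / its share of the pool."""
    total = sum(n for n, _, _ in pool.values())
    return {
        party: target_share[party] / (pool[party][0] / total)
        for party in pool
    }


def support(pool, weights=None):
    """Return (Biden share, Trump share); unweighted if no weights are given."""
    if weights is None:
        weights = {party: 1.0 for party in pool}
    weighted_total = sum(weights[p] * n for p, (n, _, _) in pool.items())
    biden = sum(weights[p] * b for p, (_, b, _) in pool.items())
    trump = sum(weights[p] * t for p, (_, _, t) in pool.items())
    return biden / weighted_total, trump / weighted_total


if __name__ == "__main__":
    weights = party_weights(POOL, TARGET_SHARE)
    print("weights:", {p: round(w, 2) for p, w in weights.items()})
    print("unweighted (Biden, Trump):", support(POOL))
    print("weighted (Biden, Trump):", support(POOL, weights))
```

In this made-up pool, Republicans are twice as common as the target calls for, so each one counts for half as much, and each Democrat and independent counts for 1.75 respondents. Unweighted, Trump leads by roughly 67 percent to 28 percent. Weighted, the race comes out at roughly Biden 46, Trump 47.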


The weighted result should be much closer to the Biden 46, Trump 47 numbers that we’re assuming are accurate. Perhaps it isn’t, because you have too few Democrats or independents. If so, go hit the “ADD 5 RESPONDENTS” button a few more times. More respondents, more accuracy.

This is how it works. And that’s what CNN explained in its text at the top of the poll: They’re oversampling Republicans, adding more Republicans to the mix, to get better primary results. But the CNN pollsters know how to correct for that oversample when they’re looking at the electorate overall.

All of this can be put more simply: It is not CNN’s fault that Biden is essentially tied with Trump at this point in the cycle.

This post appeared first on The Washington Post