
Strength in Numbers

Political polling, democracy, and data-driven journalism

G. Elliott Morris

Mar 21, 2023 | Charlottesville, VA

1 / 70

2 / 70

"The first principle of republicanism is that the lex majoris partis is the fundamental law of every society of individuals of equal rights."

— Thomas Jefferson


"[Bear] always in mind that a nation ceases to be republican only when the will of the majority ceases to be the law."

— Thomas Jefferson


"This corporeal globe, and everything upon it, belong to its present corporeal inhabitants during their generation. They alone have a right to direct what is the concern of themselves alone, and to declare the law of that direction; and this declaration can only be made by their majority. That majority, then, has a right to depute representatives to a convention, and to make the constitution what they think will be the best for themselves."

— Thomas Jefferson

3 / 70

4 / 70

5 / 70

6 / 70

7 / 70

The "soup principle"

8 / 70

The first polls

9 / 70

"Straw" polls

10 / 70

11 / 70

12 / 70

The first ("scientific") polls

- Conducted face-to-face

- Used demographic quotas for representativeness

  • Race, gender, age, geography

- Beat straw polls in accuracy (1936)

  • By shrinking bias from demographic nonresponse
13 / 70

The first ("scientific") polls

- But fell short of true survey science (1948)

14 / 70

Polls 2.0

- SSRC says: area sampling

15 / 70

Polls 2.0

- SSRC says: area sampling

- Gallup implements some partisan controls

  • Strata are groups of precincts by 1948 vote choice

- Use rough quotas within geography

- But interviewer bias remains

16 / 70

Polls 3.0

Technological change -> better methods

17 / 70

Polls 3.0

- 1970s: true random sampling (for people with phones)

- Response rates above 70-80%

- Rarer instances of severe nonresponse bias

- Cheaper to conduct = many news orgs poll (CBS, NYT)

18 / 70

Source: American Association for Public Opinion Research

19 / 70

The soup principle: satisfied?

Source: Pew Research Center

20 / 70

The soup principle: satisfied?

1. RDD polls are representative (at high response)

2. Availability of many different surveys allows for an extra layer of aggregation to control for choices made by individual researchers

21 / 70

= perfect polls forever,

...right?

22 / 70

Technological change -> worse methods?

Source: Pew Research Center

23 / 70

Polarized voting -> harder sampling

Source: Webster & Abramowitz 2017

24 / 70

But what if the people you sample don't represent the population?

- People could be very dissimilar by group, meaning small deviations in sample demographics cause big errors (sampling error)

- Or the people who respond to the poll could be systematically different from the people who don't (nonresponse error)

- Or your list of potential respondents could be missing people (coverage error)

*Polls can also go wrong because of bad question wording, a fourth type of survey error called "measurement error"
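
The first bullet above can be made concrete with a little arithmetic. This is a minimal sketch with made-up numbers (two equally sized groups, a sample that over-represents one of them by two points); none of the figures come from the slides.

```python
def poll_estimate(sample_shares, group_support):
    """Topline support implied by a sample's group mix and each group's support."""
    return sum(s * p for s, p in zip(sample_shares, group_support))

population_shares = [0.50, 0.50]   # hypothetical two-group electorate
sample_shares = [0.52, 0.48]       # sample over-represents group 1 by 2 points

polarized = [0.90, 0.10]           # groups vote very differently
similar = [0.55, 0.45]             # groups vote almost alike

for label, support in [("polarized", polarized), ("similar", similar)]:
    truth = poll_estimate(population_shares, support)
    estimate = poll_estimate(sample_shares, support)
    print(f"{label}: truth={truth:.3f} estimate={estimate:.3f} error={estimate - truth:+.3f}")
```

The same two-point slip in sample composition moves the topline much further when the groups are polarized than when they vote alike.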

25 / 70

The soup principle in theory

Source: Pew Research Center

26 / 70

The soup principle in practice

27 / 70

Polls today...

- Declining response rates + Internet = innovations in polling online, but they don't use random sampling

- Traditional RDD and even RBS polls don't have a true random sample (since response rates are too low)

- And because of nonresponse

28 / 70

So, to satisfy the soup principle...

Pollsters use statistical algorithms to ensure their samples match the population on different demographic targets

  • Race, age, gender, and region are most common

  • Can use weighting (raking) or modeling (MRP), with various tradeoffs
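
As an illustration of the weighting option, here is a minimal raking (iterative proportional fitting) sketch in Python. The ten respondents, the two variables, and the population targets are all invented for the example, and the code assumes every category has at least one respondent.

```python
import numpy as np

def rake(categories, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting.

    categories: dict of variable name -> integer category code per respondent
    targets:    dict of variable name -> population share per category
    Returns weights that sum to the number of respondents.
    """
    n = len(next(iter(categories.values())))
    weights = np.ones(n)
    for _ in range(max_iter):
        max_gap = 0.0
        for name, codes in categories.items():
            codes = np.asarray(codes)
            target = np.asarray(targets[name], dtype=float)
            # Current weighted share of each category for this variable
            current = np.bincount(codes, weights=weights, minlength=len(target)) / weights.sum()
            max_gap = max(max_gap, np.abs(current - target).max())
            # Scale each respondent's weight by target / current for their category
            weights = weights * (target / current)[codes]
        if max_gap < tol:
            break
    return weights * n / weights.sum()

# Hypothetical sample of 10 respondents
categories = {
    "sex": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],    # 0 = men, 1 = women
    "educ": [0, 0, 1, 1, 1, 1, 0, 1, 1, 1],   # 0 = non-college, 1 = college
}
# Hypothetical population targets
targets = {"sex": [0.48, 0.52], "educ": [0.60, 0.40]}

weights = rake(categories, targets)
print(np.round(weights, 3))
```

Raking matches each margin separately, not the joint distribution, which is one of the tradeoffs relative to model-based approaches such as MRP.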

29 / 70

These adjustments make polls pretty good!

30 / 70

But they aren't representative, per the theory of sampling

...and in close races, the adjustments aren't enough:

31 / 70

Two examples:

32 / 70

2016: Education weighting

33 / 70

2020: Partisan nonresponse

34 / 70

2020: Partisan nonresponse

  • Problem reaching Trump voters overall

  • And within demographic groups

  • Something you cannot fix with weighting

    • Pollsters can adjust for past vote, but the electorate changes, and certain types of voters may not respond to surveys
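
To see why demographic weighting cannot fix this, consider a small expected-value calculation. It is a sketch with invented numbers: the group sizes, vote shares, and response rates below are assumptions, not estimates from 2020.

```python
# Hypothetical electorate: two education groups, each split between two candidates.
groups = {
    # group: (population share, Trump share within group)
    "college": (0.40, 0.42),
    "non_college": (0.60, 0.55),
}
# Assumed response rates by vote choice, identical in every group.
response_rate = {"trump": 0.008, "biden": 0.010}

def respondent_trump_share(trump_share):
    """Trump share among a group's respondents, given differential response."""
    trump = trump_share * response_rate["trump"]
    biden = (1 - trump_share) * response_rate["biden"]
    return trump / (trump + biden)

true_share = sum(pop * trump for pop, trump in groups.values())
# Weighting restores each group's population share, but not the vote mix inside it.
weighted_estimate = sum(pop * respondent_trump_share(trump) for pop, trump in groups.values())

print(f"true Trump share:     {true_share:.3f}")
print(f"weighted poll result: {weighted_estimate:.3f}")
```

Even with every education group weighted to its exact population share, the estimate stays too low for Trump because the bias lives inside the groups.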

35 / 70

So what are we left with?

36 / 70

So what are we left with?

1. Traditional polls that oscillate wildly due to intensive weighting

2. New "model-based" methods which trade lower variance for higher (potential) bias

3. Lower response rates increase chance of big misses across firms
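
One way to quantify the cost of intensive weighting in point 1 is Kish's approximate effective sample size. The sketch below uses hypothetical weights; the formula is the standard approximation, not something specific to any pollster.

```python
import math

def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

equal = [1.0] * 1000
uneven = [0.5] * 800 + [3.0] * 200   # hypothetical: 20% of respondents carry large weights

for label, w in [("equal weights", equal), ("uneven weights", uneven)]:
    n_eff = kish_effective_n(w)
    print(f"{label}: effective n = {n_eff:.0f}, margin of error = {margin_of_error(n_eff):.3f}")
```

The more the weights vary, the smaller the effective sample and the more a poll bounces from one release to the next.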

37 / 70

38 / 70

Making polls work again

1. More weighting variables (NYT)

2. More online and off-phone data collection (SMS, mail)

3. Mixed samples (private pollsters)

In the pursuit of getting representative (and politically balanced) samples before and after the adjustment stage

39 / 70

In the pursuit of getting representative (and politically balanced) samples before and after the adjustment stage

To satisfy the soup principle

40 / 70

What about aggregation?

Forecasters have a few tricks up our sleeves:

41 / 70

What goes into the model?

1. National economic + political fundamentals

2. Decompose into state-level priors

3. Add the (average of) polls

42 / 70

1. National fundamentals?

i) Index of economic growth (1940 - 2016)

  • eight different variables, each scaled to measure standard deviations from its average annual growth

ii) Presidential approval (1948 - 2016)

iii) Polarization (1948 - 2016)

  • measured as the share of swing voters in the electorate, per the ANES --- and interacted with economic growth

iv) Whether an incumbent is on the ballot
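
A rough sketch of how such an index and interaction term could be built. The data are random placeholders, and averaging the eight standardized indicators is an assumption made for illustration; the slides do not say how the variables are combined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 20 years x 8 indicators of annual growth (not the real series).
growth = rng.normal(loc=2.0, scale=1.5, size=(20, 8))

# Express each indicator as standard deviations from its own average growth...
z_scores = (growth - growth.mean(axis=0)) / growth.std(axis=0, ddof=1)
# ...then average across indicators to get one index value per year (an assumption).
economic_index = z_scores.mean(axis=1)

# Hypothetical swing-voter share per year, used for the interaction term.
swing_share = rng.uniform(0.05, 0.20, size=20)
interaction = economic_index * swing_share

print(np.round(economic_index[:5], 2))
```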

43 / 70

44 / 70

45 / 70

46 / 70

2. The model is a federalist

i) Train a model to predict the Democratic share of the vote in a state relative to the national vote, 1948-2016

  • Variables are: lean in the last election, lean two elections ago, home state effects * state size, conditional on the national vote in the state

ii) Use the covariates to make predictions for 2020, conditional on the national fundamentals prediction for every day

iii) Simulate state-level outcomes to extract a mean and standard deviation

  • Propagates uncertainty both from the LOOCV RMSE of the national model and the state-level model
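
The simulation step might look roughly like the following. Every number here (the national prediction, both RMSEs, the three state leans) is a hypothetical placeholder, and the sketch works on the raw vote-share scale for simplicity.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 20_000

national_pred = 0.52        # hypothetical national Democratic two-party share
national_rmse = 0.03        # hypothetical LOOCV RMSE of the national model
state_lean = {"A": 0.04, "B": -0.01, "C": -0.06}   # hypothetical predicted leans
state_rmse = 0.025          # hypothetical LOOCV RMSE of the state-level model

# One national error per simulation, shared by all states...
national_draws = national_pred + rng.normal(0, national_rmse, n_sims)
summary = {}
for state, lean in state_lean.items():
    # ...plus an independent state-specific error.
    draws = national_draws + lean + rng.normal(0, state_rmse, n_sims)
    summary[state] = (draws.mean(), draws.std(ddof=1))

for state, (mean, sd) in summary.items():
    print(f"state {state}: mean={mean:.3f} sd={sd:.3f}")
```

Because the national error is shared, the simulated state outcomes are correlated with one another, and each state's spread reflects both sources of uncertainty.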
47 / 70

That's the baseline

Now, we add the polls

48 / 70

3. Add the (average of) polls

  • Just a trend through points...
  • Can be done with any number of packages for R or other statistical languages
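
For instance, a basic "trend through points" can be sketched in a few lines of Python as a Gaussian-kernel weighted average. The polls and the 7-day bandwidth are invented, and this smoother is just one simple choice among many.

```python
import numpy as np

def poll_trend(days, shares, grid, bandwidth=7.0):
    """Gaussian-kernel weighted average of poll results at each day in `grid`."""
    days = np.asarray(days, dtype=float)
    shares = np.asarray(shares, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # weights[i, j]: influence of poll j on the estimate for grid day i
    weights = np.exp(-0.5 * ((grid[:, None] - days[None, :]) / bandwidth) ** 2)
    return (weights * shares).sum(axis=1) / weights.sum(axis=1)

# Made-up polls: day of the campaign and Democratic share
poll_days = [1, 3, 8, 10, 15, 21, 22, 30]
poll_shares = [0.51, 0.53, 0.52, 0.50, 0.54, 0.53, 0.55, 0.54]

trend = poll_trend(poll_days, poll_shares, grid=np.arange(1, 31))
print(np.round(trend, 3))
```

A smaller bandwidth makes the trend chase individual polls; a larger one smooths over real movement.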

49 / 70

3. Add the (average of) polls

(...but with some fancy extra stuff)

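// Latent state-level vote intention (logit scale) on election day T
mu_b[:,T] = cholesky_ss_cov_mu_b_T * raw_mu_b_T + mu_b_prior;
// Reverse random walk: step backward one day at a time from election day
for (i in 1:(T-1)) mu_b[:, T - i] = cholesky_ss_cov_mu_b_walk * raw_mu_b[:, T - i] + mu_b[:, T + 1 - i];
// National series as a weighted average of the states
national_mu_b_average = transpose(mu_b) * state_weights;
// Scaled effects for pollster, poll mode, and poll population
mu_c = raw_mu_c * sigma_c;
mu_m = raw_mu_m * sigma_m;
mu_pop = raw_mu_pop * sigma_pop;
// AR(1) bias term applied to polls flagged as unadjusted
e_bias[1] = raw_e_bias[1] * sigma_e_bias;
sigma_rho = sqrt(1-square(rho_e_bias)) * sigma_e_bias;
for (t in 2:T) e_bias[t] = mu_e_bias + rho_e_bias * (e_bias[t - 1] - mu_e_bias) + raw_e_bias[t] * sigma_rho;
//*** fill pi_democrat
for (i in 1:N_state_polls){
  logit_pi_democrat_state[i] =
    mu_b[state[i], day_state[i]] +                            // latent vote intention in that state on that day
    mu_c[poll_state[i]] +                                     // pollster (house) effect
    mu_m[poll_mode_state[i]] +                                // poll mode effect
    mu_pop[poll_pop_state[i]] +                               // poll population effect
    unadjusted_state[i] * e_bias[day_state[i]] +              // extra bias for unadjusted polls
    raw_measure_noise_state[i] * sigma_measure_noise_state +  // poll-specific measurement noise
    polling_bias[state[i]];                                   // state-level polling bias
}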
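
The last loop above is an additive model on the logit scale. As a reading aid, here is a Python sketch of that sum for a single poll; the argument names and example values are hypothetical, and the inverse-logit step at the end is an assumption about how `logit_pi_democrat_state` is used downstream.

```python
import math

def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def expected_poll_share(latent, house, mode, population, unadjusted, e_bias, noise, state_bias):
    """Democratic share a poll is expected to report, on the probability scale."""
    logit_pi = (latent + house + mode + population
                + unadjusted * e_bias + noise + state_bias)
    return inv_logit(logit_pi)

latent = math.log(0.52 / 0.48)   # latent Democratic share of 52%, on the logit scale

# A poll with no house, mode, or population effects, from an adjusting pollster
baseline = expected_poll_share(latent, 0.0, 0.0, 0.0, unadjusted=0, e_bias=0.1, noise=0.0, state_bias=0.0)
# The same poll from a pollster flagged as unadjusted, with a small house effect
shifted = expected_poll_share(latent, 0.04, 0.0, 0.0, unadjusted=1, e_bias=0.1, noise=0.0, state_bias=0.0)

print(f"baseline: {baseline:.3f}, shifted: {shifted:.3f}")
```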
50 / 70

3. Add the (average of) polls

i. Latent state-level vote shares evolve as a random walk over time

  • The estimate "walks" toward the state-level fundamentals more the further we are from election day

ii. Polls are observations with measurement error that are debiased on the basis of:

  • Pollster firm (so-called "house effects")
  • Poll mode
  • Poll population
  • Bias in previous elections

iii. Correcting for partisan non-response

  • Whether a pollster weights by party registration or past vote
  • Adjusts for biases that remain AFTER removing the other biases
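
The pull toward the fundamentals in point i can be illustrated with a precision-weighted average. This is a deliberately simplified sketch, not the full model: it treats today's poll average as a single noisy reading of election day whose variance grows with each remaining day of the random walk, and every number in it is hypothetical.

```python
def election_day_estimate(prior_mean, prior_sd, poll_mean, poll_sd, days_out, walk_sd_per_day):
    """Precision-weighted blend of a fundamentals prior and today's poll average.

    Under a random walk, today's polls are a noisier guide to election day the
    further away it is: their variance grows by walk_sd_per_day**2 per day.
    """
    poll_var = poll_sd ** 2 + days_out * walk_sd_per_day ** 2
    prior_var = prior_sd ** 2
    prior_weight = poll_var / (poll_var + prior_var)
    estimate = prior_weight * prior_mean + (1 - prior_weight) * poll_mean
    return estimate, prior_weight

for days_out in (150, 30, 0):
    estimate, prior_weight = election_day_estimate(
        prior_mean=0.53, prior_sd=0.04,      # hypothetical fundamentals prior
        poll_mean=0.50, poll_sd=0.01,        # hypothetical poll average today
        days_out=days_out, walk_sd_per_day=0.003,
    )
    print(f"{days_out:>3} days out: estimate={estimate:.3f}, weight on fundamentals={prior_weight:.2f}")
```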
51 / 70

3. Add the (average of) polls

Notable improvements from correcting for partisan non-response (and other?) issues

52 / 70

In 2016...

... But not 2020

53 / 70

One more lesson:

1. Traditional polls that oscillate wildly due to intensive weighting

2. New "model-based" methods which trade lower variance for higher (potential) bias

3. Lower response rates increase chance of big misses across firms

4. Aggregation is not a magic bullet
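
Point 4 follows from the variance of an average of correlated errors. The sketch below assumes equally noisy polls with a common pairwise error correlation; the 3-point poll error and the 0.25 correlation are illustrative values only.

```python
import math

def sd_of_poll_average(n_polls, poll_sd, error_correlation):
    """Standard deviation of the mean of n equally noisy polls
    whose errors share a common pairwise correlation."""
    variance = poll_sd ** 2 * ((1 - error_correlation) / n_polls + error_correlation)
    return math.sqrt(variance)

for n in (1, 10, 100):
    independent = sd_of_poll_average(n, poll_sd=0.03, error_correlation=0.0)
    correlated = sd_of_poll_average(n, poll_sd=0.03, error_correlation=0.25)
    print(f"{n:>3} polls: independent errors sd={independent:.4f}, correlated errors sd={correlated:.4f}")
```

With independent errors the average keeps improving as polls are added; with shared errors it hits a floor that no number of additional polls can lower.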

54 / 70

4. Aggregation is not a magic bullet

...But luck is

55 / 70

The polls in 2022

56 / 70

The polls in 2022

57 / 70

58 / 70

Polling: fixed?

What does all of this mean for the pollsters?

59 / 70

Polling: fixed?

1. Death of polling is greatly exaggerated

2. Luck plays a big role in getting elections "right"

3. High nonresponse is not necessarily directional

60 / 70

And what does it mean for democracy?

The truth is that polls are good!

61 / 70

Polls -> better representation

62 / 70

Polls -> more focused campaigns

63 / 70

Polls = important, even when done improperly

64 / 70

Political polls and the general will

65 / 70

Political polls and the general will

66 / 70

67 / 70

We don't know what the people really want...

68 / 70

We don't know what the people really want...

...until we take a poll

68 / 70

There is Strength in Numbers

69 / 70

Thank you!

Website: gelliottmorris.com

Twitter: @gelliottmorris

Questions?


These slides were made using the xaringan package for R. They are available online at https://www.gelliottmorris.com/slides/

70 / 70
