class: left, top, title-slide

# Strength in Numbers

## Political polling, democracy, and data-driven journalism

### G. Elliott Morris

### Mar 21, 2023 | Charlottesville, VA

---

<img src="figures/cover.jpg" width="50%" />

---

> "The first principle of republicanism is that the lex majoris partis is the fundamental law of every society of individuals of equal rights."
>
> — Thomas Jefferson

<br />

> "[Bear] always in mind that a nation ceases to be republican only when the will of the majority ceases to be the law."
>
> — Thomas Jefferson

<br />

> "This corporeal globe, and everything upon it, belong to its present corporeal inhabitants during their generation. They alone have a right to direct what is the concern of themselves alone, and to declare the law of that direction; and this declaration can only be made by their majority. That majority, then, has a right to depute representatives to a convention, and to make the constitution what they think will be the best for themselves."
>
> — Thomas Jefferson

---

<img src="figures/couzin_2011.png" width="90%" />

---

<img src="figures/couzin_2011_f1.png" width="100%" />

---

<img src="figures/couzin_2011_f2.png" width="50%" />

<img src="figures/cover.jpg" width="50%" />

---

<img src="figures/tomato.jpeg" width="80%" />

---

# The "soup principle"

<img src="figures/tomato.jpeg" width="60%" />

---

class: center, inverse, middle

# The first polls

---

# "Straw" polls

<img src="figures/polls_street.jpg" width="80%" />

---

<img src="figures/digest_poll.jpeg" width="70%" />

---

<img src="figures/digest_1936.jpg" width="60%" />

---

# The first ("scientific") polls

### - Conducted face-to-face

--

### - Used demographic quotas for representativeness

- Race, gender, age, geography

--

### - Beat straw polls in accuracy (1936)

- By shrinking bias from demographic nonresponse

---

# The first ("scientific") polls

### - But fell short of true survey science (1948)

<img src="figures/dewey_truman.jpeg" width="60%" />

---

# Polls 2.0

### - SSRC says: area sampling

--

<img src="figures/houston.png" width="60%" />

---

# Polls 2.0

### - SSRC says: area sampling

### - Gallup implements some partisan controls

- Strata are groups of precincts by 1948 vote choice

--

### - Use rough quotas within geography

--

### - But, preserve interviewer bias

---

# Polls 3.0

<img src="figures/phone.jpeg" width="70%" />

--

### Technological change -> better methods

---

# Polls 3.0

### - 1970s: true random sampling (for people with phones)

### - Response rates above 70-80%

### - Rarer instances of severe nonresponse bias

### - Cheaper to conduct = many news orgs poll (CBS, NYT)

---

<img src="figures/aapor.png" width="90%" />

_Source: American Association for Public Opinion Research_

---

# The soup principle: satisfied?

<img src="figures/pew_soup.png" width="80%" />

_Source: Pew Research Center_

---

# The soup principle: satisfied?

### 1. RDD polls are representative (at high response rates)

### 2. The availability of many different surveys allows for an extra layer of aggregation to control for choices made by individual researchers

---

class: center, inverse, middle

# = perfect polls forever, <br><br>

--

# ...right?

---

### Technological change -> worse methods?

<img src="figures/pew_response_rate.jpg" width="60%" />

_Source: Pew Research Center_

---

### Polarized voting -> harder sampling

<img src="figures/affpol.png" width="75%" />

_Source: Webster & Abramowitz 2017_

---

.center[
## But what if the people you sample don't represent the population?
]

--

#### - People could be very dissimilar by group, meaning small deviations in sample demographics cause big errors (sampling error)

--

#### - Or the people who respond to the poll could be systematically different from the people who don't (nonresponse error)

--

#### - Or your list of potential respondents could be missing people (coverage error)

--

*Polls can also go wrong if they have bad question wording, a fourth type of survey error called "measurement error"

---

## The soup principle in theory

<img src="figures/pew_soup.png" width="90%" />

_Source: Pew Research Center_

---

## The soup principle in practice

<img src="figures/minestrone.jpg" width="60%" />

---

class: center, middle

# Polls today...

--

#### - Declining response rates + Internet = innovations in polling online, but they don't use random sampling

--

#### - Traditional RDD and even RBS polls don't have a true random sample (since response rates are too low)

--

#### - And because of nonresponse

---

## So, to satisfy the soup principle...

### Pollsters use statistical algorithms to ensure their samples match the population on different demographic targets

- Race, age, gender, and region are most common
- Can use weighting (raking) or modeling (MRP), with various tradeoffs

.pull-left[
<img src="figures/raking.jpg" width="100%" />
]

.pull-right[
<img src="figures/mrp.jpg" width="100%" />
]

---

# These adjustments make polls pretty good!

<img src="figures/aapor.png" width="75%" />

---

class: center, middle

# But they aren't _representative_, per the theory of sampling

--

# ...and in close races, the adjustments aren't enough:

---

class: inverse, center, middle

# Two examples:

---

# 2016: Education weighting

<img src="figures/weighting_education.jpg" width="100%" />

---

# 2020: Partisan nonresponse

<img src="figures/gq_rs_polls.jpg" width="90%" />

---

# 2020: Partisan nonresponse

<img src="figures/gq_rs_polls.jpg" width="40%" />

--

- ### Problem reaching Trump voters overall

--

- ### And _within_ demographic groups

--

- ### Something you cannot fix with weighting

--

- #### Pollsters can adjust for past vote, but the electorate changes, and certain _types_ of voters may not respond to surveys

---

class: center, middle

# So what are we left with?

---

# So what are we left with?

--

### 1. Traditional polls that oscillate wildly due to intensive weighting

--

### 2. New "model-based" methods which trade lower variance for higher (potential) bias

--

### 3. Lower response rates increase chance of big misses across firms

---

<img src="figures/polls_2020_538.png" width="80%" />

---

# Making polls work again

--

### 1. More weighting variables (NYT)

--

### 2. More online and off-phone data collection (SMS, mail)

--

### 3. Mixed samples (private pollsters)

--

### In the pursuit of getting representative (and politically balanced) samples _before and after_ the adjustment stage

---

class: center, middle

### In the pursuit of getting representative (and politically balanced) samples _before and after_ the adjustment stage

--

### To satisfy the soup principle

---

class: center, middle, inverse

# What about aggregation?

### Forecasters have a few tricks up our sleeves:

---

# What goes into the model?

### 1. National economic + political fundamentals

### 2. Decompose into state-level priors

### 3. Add the (average of) polls

---

# 1. National fundamentals?
### i) Index of economic growth (1940 - 2016)

- eight different variables, scaled to measure standard deviations from average annual growth

### ii) Presidential approval (1948 - 2016)

### iii) Polarization (1948 - 2016)

- measured as the share of swing voters in the electorate, per the ANES, and interacted with economic growth

### iv) Whether an incumbent is on the ballot

---

<img src="figures/fundamentals_economy.png" width="80%" />

---

<img src="figures/fundamental_approval.png" width="80%" />

---

<img src="figures/fundamentals_with_incumbency.png" width="100%" />

---

# 2. The model is a federalist

#### i) Train a model to predict the Democratic share of the vote in a state relative to the national vote, 1948-2016

* Variables are: lean in the last election, lean two elections ago, home state effects * state size, conditional on the national vote in the state

#### ii) Use the covariates to make predictions for 2020, _conditional on the national fundamentals prediction for every day_

#### iii) Simulate state-level outcomes to extract a mean and standard deviation

* Propagates uncertainty both from the LOOCV RMSE of the national model and the state-level model

---

class: center, inverse, middle

# That's the baseline

--

# Now, we add the polls

---

# 3. Add the (average of) polls

- Just a trend through points...
- Can be done with any number of packages for R or other statistical languages

--

<img src="figures/loess.png" width="50%" />

---

# 3. Add the (average of) polls

### (...but with some fancy extra stuff)

```stan
// Latent state-level vote intention (logit scale): anchored at day T by the
// prior, then walked backward one day at a time
mu_b[:,T] = cholesky_ss_cov_mu_b_T * raw_mu_b_T + mu_b_prior;
for (i in 1:(T-1)) mu_b[:, T - i] = cholesky_ss_cov_mu_b_walk * raw_mu_b[:, T - i] + mu_b[:, T + 1 - i];
national_mu_b_average = transpose(mu_b) * state_weights;

// Scaled poll-level adjustments (indexed below by pollster, mode, population)
mu_c = raw_mu_c * sigma_c;
mu_m = raw_mu_m * sigma_m;
mu_pop = raw_mu_pop * sigma_pop;

// AR(1) bias term applied to polls flagged as unadjusted
e_bias[1] = raw_e_bias[1] * sigma_e_bias;
sigma_rho = sqrt(1-square(rho_e_bias)) * sigma_e_bias;
for (t in 2:T) e_bias[t] = mu_e_bias + rho_e_bias * (e_bias[t - 1] - mu_e_bias) + raw_e_bias[t] * sigma_rho;

//*** fill pi_democrat
for (i in 1:N_state_polls){
  logit_pi_democrat_state[i] =
    mu_b[state[i], day_state[i]] +
    mu_c[poll_state[i]] +
    mu_m[poll_mode_state[i]] +
    mu_pop[poll_pop_state[i]] +
    unadjusted_state[i] * e_bias[day_state[i]] +
    raw_measure_noise_state[i] * sigma_measure_noise_state +
    polling_bias[state[i]];
}
```

---

# 3. Add the (average of) polls

--

#### i. Latent state-level vote shares evolve as a random walk over time

* "Walks" toward the state-level fundamentals more as we are further out from election day

--

#### ii. Polls are observations with measurement error that are debiased on the basis of:

* Pollster firm (so-called "house effects")
* Poll mode
* Poll population
* Bias in previous elections

--

#### iii. Correcting for partisan non-response

* Whether a pollster weights by party registration or past vote
* Adjusts for biases that remain AFTER removing the other biases

---

# 3. Add the (average of) polls

#### Notable improvements from correcting for partisan non-response (and other?) issues

<img src="figures/states-vs-results.png" width="80%" />

---

class: center, middle

# In 2016...

--

## ... But not 2020

--

<img src="figures/2020-economist-histogram.png" width="80%" />

---

# One more lesson:

### 1. Traditional polls that oscillate wildly due to intensive weighting

### 2. New "model-based" methods which trade lower variance for higher (potential) bias

### 3. Lower response rates increase chance of big misses across firms

--

### 4. Aggregation is not a magic bullet

---

# 4. Aggregation is not a magic bullet

--

### .....But luck is

---

## The polls in 2022

<img src="figures/cohn_2022.png" width="95%" />

---

class: center, middle

## The polls in 2022

<img src="figures/economist_2022.jpeg" width="95%" />

---

class: center, middle

<img src="figures/ekins_2022.jpeg" width="95%" />

---

class: center, middle, inverse

# Polling: fixed?

### What does all of this mean for the pollsters?

---

## Polling: fixed?

--

### 1. Death of polling is greatly exaggerated

--

### 2. Luck plays a big role in getting elections "right"

--

### 3. High nonresponse is not necessarily directional

---

class: center, middle, inverse

## And what does it mean for democracy?

### The truth is that polls are good!

---

class: center, middle

## Polls -> better representation

<img src="figures/butler_nickerson.png" width="95%" />

---

class: center, middle

## Polls -> more focused campaigns

<img src="figures/old_comp.jpg" width="95%" />

---

class: center, middle

## Polls = important, even when done improperly

<img src="figures/iraq_polls.png" width="95%" />

---

class: center, middle

## Political polls and the general will

<img src="figures/democracy.jpg" width="95%" />

---

class: center, middle

## Political polls and the general will

<img src="figures/elections.jpg" width="85%" />

---

class: center, middle

## Elections do not always translate to *popular* sovereignty

<img src="figures/electoral_college.png" width="40%" />

---

class: center, middle

## We don't know what the people _really_ want...

--

## ...until we take a poll

---

class: center, middle

## There is Strength in Numbers

<img src="figures/medicare.jpeg" width="100%" />

---

# Thank you!

**Website: [gelliottmorris.com](https://www.gelliottmorris.com)**

**Twitter: [@gelliottmorris](http://www.twitter.com/gelliottmorris)**

### Questions?
---

.bottom[
_These slides were made using the `xaringan` package for R. They are available online at https://www.gelliottmorris.com/slides/_
]
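---

# Appendix: a raking sketch

A minimal sketch of the weighting (raking) idea from the "soup principle" slides, assuming a toy sample and made-up population targets. The `rake` and `weighted_share` helpers, the respondent records, and the target margins are all hypothetical and written in Python for illustration; this is not the procedure any particular pollster uses.

```python
# Toy raking (iterative proportional fitting). The sample and the targets
# below are invented for illustration; they are not from any real poll.

def rake(respondents, targets, n_iter=50):
    """Return one weight per respondent so weighted shares approach each target margin."""
    weights = [1.0] * len(respondents)
    for _ in range(n_iter):
        for variable, margin in targets.items():
            total = sum(weights)
            # Current weighted share of each category of this variable
            share = {category: 0.0 for category in margin}
            for w, r in zip(weights, respondents):
                share[r[variable]] += w / total
            # Rescale so this variable's weighted margin matches its target
            weights = [
                w * margin[r[variable]] / share[r[variable]]
                for w, r in zip(weights, respondents)
            ]
    return weights


def weighted_share(respondents, weights, variable, category):
    """Weighted share of respondents falling in one category."""
    total = sum(weights)
    matching = sum(w for w, r in zip(weights, respondents) if r[variable] == category)
    return matching / total


# Hypothetical sample that over-represents college graduates and older people
sample = (
    [{"educ": "college", "age": "65+"}] * 40
    + [{"educ": "college", "age": "under 65"}] * 20
    + [{"educ": "no college", "age": "65+"}] * 15
    + [{"educ": "no college", "age": "under 65"}] * 25
)

# Hypothetical population targets (each margin sums to 1)
targets = {
    "educ": {"college": 0.35, "no college": 0.65},
    "age": {"65+": 0.25, "under 65": 0.75},
}

weights = rake(sample, targets)
college_share = weighted_share(sample, weights, "educ", "college")
senior_share = weighted_share(sample, weights, "age", "65+")
print(round(college_share, 3), round(senior_share, 3))
```

Raking only matches the margins it is given, which is why it cannot repair nonresponse that occurs _within_ those demographic groups.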