But before that, let's talk about our goals. There are three:
Goals of data journalism are not so different from the goals of social science
Adapted here from the book Data Analysis for Social Science by Elena Llaudet and Kosuke Imai
So I hope the presentation is helpful for students deciding whether to go into journalism, which has the distinction of being one of the industries that probably offers only marginal returns over becoming an academic.
Story: Outlier poll getting a lot of attention. Fishy results, shady firm.
Novelty: Asked pollster for their data, re-weighted it to generate results.
Explanation: Pollsters repeating mistakes of 2016, not weighting data by education
Explanation is visually simple
Explain process for story
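The re-weighting step can be sketched as follows. This is a minimal illustration, not the code behind the story: it assumes a single weighting variable (education), and the field names, sample, and population shares are hypothetical toy values chosen only to show how the topline moves once an over-represented group is weighted down.

```python
from collections import Counter

def weight_by_education(respondents, population_shares):
    """Return one weight per respondent so that the weighted education
    mix of the sample matches population_shares."""
    n = len(respondents)
    counts = Counter(r["education"] for r in respondents)
    return [
        population_shares[r["education"]] / (counts[r["education"]] / n)
        for r in respondents
    ]

def weighted_share(respondents, weights, candidate):
    """Weighted share of respondents who back the given candidate."""
    support = sum(w for r, w in zip(respondents, weights) if r["vote"] == candidate)
    return support / sum(weights)

# Hypothetical toy sample: college graduates are 70% of respondents
# but (by assumption) only 40% of the target population.
sample = (
    [{"education": "college", "vote": "A"}] * 45
    + [{"education": "college", "vote": "B"}] * 25
    + [{"education": "non_college", "vote": "A"}] * 10
    + [{"education": "non_college", "vote": "B"}] * 20
)
targets = {"college": 0.4, "non_college": 0.6}

weights = weight_by_education(sample, targets)
raw = weighted_share(sample, [1.0] * len(sample), "A")  # unweighted topline
adj = weighted_share(sample, weights, "A")              # education-weighted topline
print(f"unweighted: {raw:.3f}, weighted by education: {adj:.3f}")
```

In this toy sample, candidate A leads before weighting and trails after it, which is the kind of shift the story describes.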
While not a story where we needed Bayesian analysis, it could have taught us more:
Answer questions like:
Given data constraints, we're really asking: How many Clinton and Trump voters are there?
The answer lets us assign Electoral College votes.
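As a rough sketch of that last step, the snippet below turns state-level voter counts into Electoral College totals. It assumes winner-take-all allocation in every state (Maine and Nebraska, which split votes by district, would need special handling), and the state names, counts, and elector numbers are hypothetical placeholders.

```python
def assign_electoral_votes(state_counts, electoral_votes):
    """Give each state's electors to the candidate with the most estimated voters.

    Assumes winner-take-all; a tie goes to whichever candidate is listed first.
    """
    totals = {}
    for state, counts in state_counts.items():
        winner = max(counts, key=counts.get)
        totals[winner] = totals.get(winner, 0) + electoral_votes[state]
    return totals

# Hypothetical estimated voter counts (in thousands) and elector numbers.
state_counts = {
    "State A": {"Clinton": 520, "Trump": 480},
    "State B": {"Clinton": 300, "Trump": 700},
    "State C": {"Clinton": 610, "Trump": 590},
}
electoral_votes = {"State A": 10, "State B": 6, "State C": 4}

result = assign_electoral_votes(state_counts, electoral_votes)
print(result)
```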
Includes demographic data and 2016 vote choice for 40,000+ validated voters
Includes the same demographic data as the CCES
380,000 “cells”
Via “post-stratification” on the ACS
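Post-stratification itself is just a population-weighted average of cell-level predictions. The sketch below assumes each cell already has a modeled probability of voting for Clinton and a population count from the census data; the field names (`p_clinton`, `n`) and all numbers are hypothetical.

```python
def poststratify(cells):
    """Average cell-level predictions within each state, weighting each
    cell by its population count."""
    sums = {}
    for cell in cells:
        num, den = sums.get(cell["state"], (0.0, 0))
        sums[cell["state"]] = (num + cell["p_clinton"] * cell["n"], den + cell["n"])
    return {state: num / den for state, (num, den) in sums.items()}

# Hypothetical cells: a modeled probability plus a population count.
cells = [
    {"state": "X", "group": "college",     "p_clinton": 0.60, "n": 1000},
    {"state": "X", "group": "non_college", "p_clinton": 0.40, "n": 3000},
    {"state": "Y", "group": "college",     "p_clinton": 0.70, "n": 2000},
    {"state": "Y", "group": "non_college", "p_clinton": 0.50, "n": 2000},
]

estimates = poststratify(cells)
print(estimates)
```

A real application would have one cell per combination of demographic variables and geography rather than two per state.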
Source: Groves et al., 2009
brms syntax:
Sampling, via the Bayesian logit updater (and survey weights)
Non-response, via adjustment back to survey frame
And adjustment error, via varying parameter estimates and partial pooling
Uncertainty is propagated throughout the models, incorporated via MCMC sampling in step 3.
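One way to picture this propagation: run the post-stratification once per posterior draw instead of once on a point estimate, then summarize the resulting distribution. The sketch below is an assumption-laden illustration, not the pipeline from the talk. The "draws" are simulated from a normal distribution as a stand-in for real MCMC output, and the cell counts and probabilities are hypothetical.

```python
import random

def poststratify_draw(cell_probs, cell_counts):
    """Population-weighted average of cell probabilities for a single draw."""
    weighted = sum(p * n for p, n in zip(cell_probs, cell_counts))
    return weighted / sum(cell_counts)

def summarize(estimates, threshold=0.5):
    """Median, central 95% interval, and share of draws above the threshold."""
    ordered = sorted(estimates)
    n = len(ordered)
    return {
        "median": ordered[n // 2],
        "lower": ordered[int(0.025 * n)],
        "upper": ordered[int(0.975 * n)],
        "p_win": sum(e > threshold for e in ordered) / n,
    }

# Stand-in for MCMC output: 2,000 simulated draws of two cell probabilities.
rng = random.Random(1)
cell_counts = [1000, 3000]      # hypothetical population counts
cell_means = [0.60, 0.45]       # hypothetical posterior means
draws = [
    [min(1.0, max(0.0, rng.gauss(m, 0.03))) for m in cell_means]
    for _ in range(2000)
]

# Each draw is carried through post-stratification, so the final summary
# reflects the uncertainty in every cell-level estimate.
state_estimates = [poststratify_draw(d, cell_counts) for d in draws]
summary = summarize(state_estimates)
print(summary)
```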
mu_b[:,T] = cholesky_ss_cov_mu_b_T * raw_mu_b_T + mu_b_prior;
for (i in 1:(T-1)) mu_b[:, T - i] = cholesky_ss_cov_mu_b_walk * raw_mu_b[:, T - i] + mu_b[:, T + 1 - i];
national_mu_b_average = transpose(mu_b) * state_weights;
mu_c = raw_mu_c * sigma_c;
mu_m = raw_mu_m * sigma_m;
mu_pop = raw_mu_pop * sigma_pop;
e_bias[1] = raw_e_bias[1] * sigma_e_bias;
sigma_rho = sqrt(1-square(rho_e_bias)) * sigma_e_bias;
for (t in 2:T) e_bias[t] = mu_e_bias + rho_e_bias * (e_bias[t - 1] - mu_e_bias) + raw_e_bias[t] * sigma_rho;
//*** fill pi_democrat
for (i in 1:N_state_polls){
  logit_pi_democrat_state[i] =
    mu_b[state[i], day_state[i]] +
    mu_c[poll_state[i]] +
    mu_m[poll_mode_state[i]] +
    mu_pop[poll_pop_state[i]] +
    unadjusted_state[i] * e_bias[day_state[i]] +
    raw_measure_noise_state[i] * sigma_measure_noise_state +
    polling_bias[state[i]];
}
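The `e_bias` lines in the Stan fragment build an autoregressive, AR(1), series from unscaled "raw" innovations. A Python sketch of just that recursion is below. It mirrors the three `e_bias` statements; the assumption that the raw innovations are standard normal is ours (the fragment does not show their prior), and the parameter values are hypothetical.

```python
import math
import random

def ar1_bias(raw, mu, rho, sigma):
    """Rebuild the e_bias series from raw innovations.

    Mirrors the Stan fragment: the first value is raw[0] * sigma, and later
    values revert toward mu with persistence rho. Scaling innovations by
    sqrt(1 - rho^2) * sigma keeps the marginal standard deviation near sigma.
    """
    sigma_rho = math.sqrt(1 - rho ** 2) * sigma
    e_bias = [raw[0] * sigma]
    for t in range(1, len(raw)):
        e_bias.append(mu + rho * (e_bias[t - 1] - mu) + raw[t] * sigma_rho)
    return e_bias

# Hypothetical parameter values; raw draws assumed standard normal.
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(10)]
series = ar1_bias(raw, mu=0.0, rho=0.7, sigma=0.02)
print([round(x, 4) for x in series])
```

Writing the series in terms of raw innovations, rather than sampling `e_bias` directly, is a common reparameterization in Stan models.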
Website: gelliottmorris.com
Twitter: @gelliottmorris
These slides were made using the xaringan package for R. They are available online at https://www.gelliottmorris.com/slides/