class: left, top, title-slide

# How journalists tell stories with data

### G. Elliott Morris, Data journalist and US correspondent, The Economist

### March 1, 2022
Prepared for a talk to the Data Analytics Symposium at Queens University, Charlotte, North Carolina

---

<img src="figures/cover.jpg" width="50%" />

---

## Background...

- From a small town on a barrier island in south Texas
  - Population 4,123 (2020)
  - 86% white, 7% Hispanic, 4% "two or more"
  - Median household income of $57,910
  - 11% poverty rate
- I went to college at the University of Texas at Austin
  - Studied government and history
  - With some statistics and computer science
  - Graduated in 2018, hired by _The Economist_ in the fall of my senior year
- I've always been interested in politics and policy, but only really became fascinated with data in the second semester of college
  - Read a book called _Big Data_ by Kenn Cukier and Viktor Mayer-Schönberger
  - In 2016, built a forecasting model for the US presidential primaries and general election
  - The rest, as they say, is history

---

class: middle, center, inverse

# [ Data ] journalism

---

class: middle, center

# BIG DATA

.footnote[
<sup>_small data_</sup>
]

---

class: middle, center

<img src="figures/faang.png" width="100%" />

---

class: center

<img src="figures/faang.png" width="100%" />

#### AT&T, Verizon, Comcast

#### Fitbit, iRobot, Philips, Sony

#### Tesla, Kia

#### Indeed

---

class: middle, center

<img src="figures/cloud.jpg" width="100%" />

---

class: middle, center

# 90%

### Of all data worldwide was created in the last 2 years

Google estimates there will be roughly 175 ZB of data by 2025. That's about 175 billion times the storage on a laptop with a 1 TB drive.

---

class: center

<img src="figures/machine_learning.jpg" width="90%" />

--

### Use #1. Actionable business insights

--

### Use #2. Telling stories?

---

class: center, middle, inverse

# [ Data journalism ]

---

class: center

# Data journalism today (perceptions):

--

## 1. Complex modeling

--

## 2. Large datasets "big(ger) data"

--

## 3. Resource-intensive learners and pipelines

---

<img src="figures/economist_excess_deaths.png" width="80%" />

.footnote[
_https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates_
]

---

<img src="figures/economist_house_prices.png" width="100%" />

.footnote[
_https://www.economist.com/graphic-detail/2021/04/10/our-house-price-forecast-expects-the-global-rally-to-lose-steam_
]

---

<img src="figures/economist_2020_potus.png" width="100%" />

.footnote[
_https://projects.economist.com/us-2020-forecast/president_
]

---

class: center, middle

# Data journalism today (for real):

--

## Most stories use small data

--

Datasets from academic studies

Single time series from other companies or websites

Polls or social surveys

(Extracts of) Census data

---

## Data journalism today (for real):

### The bread and butter of our daily work is improving stories with original data analysis and visualization.

---

<img src="figures/gd_homepage.png" width="100%" />

---

<img src="figures/ukraine_jets.png" width="100%" />

---

<img src="figures/ukraine_internet.png" width="100%" />

---

<img src="figures/ukraine_russia_banks.png" width="100%" />

---

<img src="figures/vaccine_mandates.png" width="100%" />

---

<img src="figures/inflation_polls.png" width="100%" />

---

class: center

## What do all of these examples have in common?

--

### 1. Examples of empirical journalism, or "data in service of stories"<sup>1</sup>

--

### 2. Not "data science" or "computational journalism"

--

#### Lesson: Data journalism does not need to be complex to tell a story. In fact, most of the time, simpler is better.

.footnote[
[1] _See: Sarah Cohen, "Numbers in the Newsroom"_
]

---

class: middle, center

# "In fact, most of the time, simpler is better."

--

### Lower signal-to-noise ratio

### The "curse of dimensionality"

---

class: middle, center

<img src="figures/signal-noise-scatter.png" width="100%" />

---

class: middle, center

<img src="figures/dimensionality_accuracy.png" width="100%" />

---

class: middle, center

# Data *analytics* is not data *science*

## Neither is data *journalism*

---

## Data *analytics* is not data *science*

**Data analysts** examine large data sets to identify trends, develop charts, and create visual presentations to help businesses make more strategic decisions.

**Data scientists**, on the other hand, design and construct new processes for data modeling and production using prototypes, algorithms, predictive models, and custom analysis.

Data (or "empirical") journalists do both of these things, but usually proceed from the former to the latter *only when necessary*, and only when the data and modeling processes can supply enough precision.

.footnote[
_Source: Northwestern_
]

---

## Data journalism in a nutshell

Most stories fall into one of 3 categories:

### 1. Small data that add context to a story

### 2. Visualizations explaining findings, meeting readers where they are

### 3. Complex statistical models or big(-ish) data that help us decipher relationships between variables and quantify uncertainty

---

class: middle, center

### 80% of stories fall into buckets 1 and 2

---

class: inverse, middle, center

# Case studies

---

## Example 1: House redistricting

### Narrative:

<img src="figures/wasserman_redistricting.png" width="50%" />

---

## Example 1: House redistricting

### Data:

<img src="figures/538_redistricting.png" width="100%" />

.footnote[
_https://projects.fivethirtyeight.com/redistricting-2022-maps/_
]

---

## Example 1: House redistricting

### Original analysis:

<img src="figures/economist_redistricting_a.png" width="40%" />

.footnote[
_https://www.economist.com/united-states/democrats-have-fared-surprisingly-well-in-congress-new-maps/21807593_
]

---

## Example 1: House redistricting

### Original analysis:

.pull-left[
<img src="figures/economist_redistricting_b.png" width="100%" />
]

.pull-right[
<img src="figures/economist_redistricting_c.png" width="100%" />
]

.footnote[
_https://www.economist.com/united-states/democrats-have-fared-surprisingly-well-in-congress-new-maps/21807593_
]

---

## Example 2: News in 2021

<img src="figures/economist_2021_news.png" width="70%" />

.footnote[
_https://www.economist.com/graphic-detail/2021/12/18/2021s-biggest-stories-were-covid-19-and-americas-presidential-transition_
]

---

## Example 3: Covid-19 forecasts

.pull-left[
Predictions for December 2021

<img src="figures/cdc_covid_nov21.png" width="100%" />
]

.pull-right[
Actual values for December 2021

<img src="figures/cdc_covid_dec21.png" width="100%" />
]

.footnote[
_https://covid19forecasthub.org/reports/single_page.html?state=US&week=2022-01-04_
]

---

## Example 3: Covid-19 forecasts

.pull-left[
Predictions for January 2022

<img src="figures/cdc_covid_dec21.png" width="100%" />
]

.pull-right[
Actual values for January 2022

<img src="figures/cdc_covid_jan22.png" width="100%" />
]

Unpredictable = bad use of modeling for storytelling

.footnote[
_https://covid19forecasthub.org/reports/single_page.html?state=US&week=2022-01-04_
]

---

## Example 3: Covid-19 forecasts

Think of alternative questions: not "what's going to happen next?" but "what's happening now?"

- Who isn't vaccinated? Where are hospitals overrun? Which counties are experiencing worse outbreaks?

<img src="figures/538_covid.png" width="80%" />

---

class: center, middle, inverse

# Three principles

---

.pull-left[
## #1. Explore, but don't blindly follow the data
]

.pull-right[
<img src="figures/data_detective.jpg" width="100%" />
]

---

.pull-left[
## #2. Visualization *can* be the whole story!
]

.pull-right[
<img src="figures/data_viz.jpg" width="100%" />
]

---

.pull-left[
## #3. Big data is not a silver bullet. Don't ask questions the data cannot answer
]

.pull-right[
<img src="figures/framers.jpg" width="100%" />
]

---

class: center

## Three principles

### 1. Explore, but don't blindly follow the data

### 2. Visualization *can* be the whole story!

### 3. Big data is not a silver bullet. Don't ask questions the data cannot answer.

If you keep these three things in mind, you will be a successful data analyst/journalist/scientist/whatever.

---

# Thank you!

### Questions?

.footnote[
**Website: [gelliottmorris.com](https://www.gelliottmorris.com)**

**Twitter: [@gelliottmorris](http://www.twitter.com/gelliottmorris)**

_These slides were made using the `xaringan` package for R. They are available online at https://www.gelliottmorris.com/slides/_
]
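
---

## Appendix: the "curse of dimensionality", sketched

The "simpler is better" slide names the curse of dimensionality without showing it. One common way to see it, offered here as a minimal sketch and not as anything used in the stories above, is distance concentration: as the number of variables grows, the nearest and farthest points from any given point end up almost equally far away. The function name `nearest_to_farthest_ratio`, the 200 uniformly random points, and the fixed seed are all illustrative assumptions.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest Euclidean distance from one
    random query point to n_points random points in the unit cube [0, 1]^dim.

    Hypothetical helper for illustration only: a ratio near 0 means some
    points are much closer than others; a ratio near 1 means every point
    is roughly the same distance away.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return min(distances) / max(distances)


if __name__ == "__main__":
    # Print the ratio for an increasing number of dimensions (variables).
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

With two variables the ratio should be small, because a few points sit very close to the query. With a thousand variables it should approach 1, which is one reason methods that rely on "similar" observations tend to need far more data as variables are added.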