library(tidyverse)
AE 08: Data science ethics - Misrepresentation
Suggested answers
Packages
Part 1 - People’s Poll
GB News tweeted the following on Aug 26, 2022.
- Question: What is wrong with the visualization above?
The bars are not drawn to scale, hence are misleading
Your turn (5 minutes): The data from this poll are at
data/gbpoll.csv
. First, load the data and confirm the number of responses match those mentioned in the tweet.<- read_csv("data/gbpoll.csv") gbpoll
Rows: 1235 Columns: 1 ── Column specification ──────────────────────────────────────────────────────── Delimiter: "," chr (1): party ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nrow(gbpoll)
[1] 1235
Then, confirm that the proportions of intended votes match those mentioned in the tweet.
|> gbpoll count(party) |> mutate(prop = n / sum(n))
# A tibble: 5 × 3 party n prop <chr> <int> <dbl> 1 Conservative 321 0.260 2 Labour 494 0.4 3 Liberal Democrats 136 0.110 4 Other 210 0.170 5 SNP 74 0.0599
Demo: Recreate the visualization from the tweet. You do not need to worry about matching the colors precisely and your bars should be correctly scaled.
|> gbpoll count(party) |> mutate( prop = round(n / sum(n), 2), prop_char = paste0(prop*100, "%") |> ) filter(party != "Other") |> mutate(party = fct_reorder(party, prop)) |> ggplot( aes( y = party, x = prop, fill = party )+ ) geom_col(show.legend = FALSE) + scale_y_discrete(drop = TRUE) + theme_void() + geom_text( aes(label = str_wrap(party, 10)), x = 0.005, hjust = 0, color = "white", fontface = "bold" + ) geom_text( aes(label = prop_char), nudge_x = -0.005, hjust = 1, color = "white", fontface = "bold" + ) scale_fill_manual(values = c("#D0C123", "#F99C1A","#00AEEF", "#E4013B")) + labs( title = str_to_upper("If there were to be a general election tomorrow,\nwhich party would you vote for?") )
Your turn (10 minutes): Improve the visualization. State the improvements you made and why you made them. Discuss how these improvements help make the plot less misleading.
|> gbpoll count(party) |> mutate( prop = round(n / sum(n), 2), prop_char = paste0(prop*100, "%") |> ) mutate(party = fct_reorder(party, prop)) |> ggplot( aes( y = party, x = prop, fill = party )+ ) geom_col(show.legend = FALSE) + scale_y_discrete(drop = TRUE) + theme_void() + geom_text( aes(label = str_wrap(party, 10)), x = 0.005, hjust = 0, color = "white", fontface = "bold" + ) geom_text( aes(label = prop_char), nudge_x = -0.005, hjust = 1, color = "white", fontface = "bold" + ) scale_fill_viridis_d(option = "E", end = 0.8) + labs( title = "If there were to be a general election tomorrow,\nwhich party would you vote for?", caption = "Source: GBNews Poll, Aug 26, 2022" )
Part 2 - Private sector
The following chart was shared by @GraphCrimes on Twitter on September 3, 2022.
- Question: What is misleading about this graph?
- Your turn (6 minutes): If you needed to recreate this plot, with improvements to avoid its misleading pitfalls, what data do you need? How many variables? How many observations? Can you find the data online? Try looking for it for at least 3 minutes with a partner.
- Demo: Load the data for this survey from
data/survation.csv
. First confirm that the data match the percentages from the visualization. Then, recreate the visualization, and improve it. Does the improved visualization look different than the original? Does it send a different message at a first glance?