AE 02: Visualizing penguins

Application exercise
Important

Go to the course GitHub organization and locate the repo titled ae-02-YOUR_GITHUB_USERNAME to get started.

This AE is due Sunday, Sep 11 at 11:59pm.

For all analyses, we’ll use the tidyverse and palmerpenguins packages.

library(tidyverse)
library(palmerpenguins)

The dataset we will visualize is called penguins. Let’s glimpse() at it.

# add code here
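
One possible way to fill in this chunk is sketched below. It assumes both packages are installed and simply calls glimpse() on the data frame, which prints one line per column with its type and first few values.

```r
library(tidyverse)
library(palmerpenguins)

# One line per column: name, type, and the first few values
glimpse(penguins)
```

The output also reports the number of rows and columns, which is a quick sanity check that the data loaded as expected.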

Visualizing penguin weights - Demo

Single variable

Note

Analyzing a single variable is called univariate analysis.

Create visualizations of the distribution of weights of penguins.

  1. Make a histogram. Set an appropriate binwidth.
# add code here
  2. Make a boxplot.
# add code here
  3. Based on these, determine if each of the following statements about the shape of the distribution is true or false.
    • The distribution of penguin weights in this sample is left skewed. FALSE
    • The distribution of penguin weights in this sample is unimodal. TRUE
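
The two plots above could be sketched as follows. This is one possible answer, not the only one: it assumes body_mass_g is the weight variable, and the 250 g binwidth is an illustrative choice. The object names (p_hist, p_box, mass_summary) are made up for this sketch.

```r
library(tidyverse)
library(palmerpenguins)

# Histogram of body mass; 250 g bins is an assumed, adjustable choice
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250)

# Boxplot of the same variable
p_box <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_boxplot()

# Numeric check on skew: compare the mean with the median
mass_summary <- summarize(
  penguins,
  mean_mass = mean(body_mass_g, na.rm = TRUE),
  median_mass = median(body_mass_g, na.rm = TRUE)
)

p_hist
p_box
mass_summary
```

Both plots warn about rows with missing body mass, which is expected at this stage. A mean larger than the median is consistent with a right-skewed rather than a left-skewed distribution.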

Two variables

Note

Analyzing the relationship between two variables is called bivariate analysis.

Create visualizations of the distribution of weights of penguins by species.

  1. Make a single histogram. Set an appropriate binwidth.
# add code here
  2. Use multiple histograms via faceting, one for each species. Set an appropriate binwidth, add color as you see fit, and turn off legends if not needed.
# add code here
  3. Use side-by-side box plots. Add color as you see fit and turn off legends if not needed.
# add code here
  4. Use density plots. Add color as you see fit.
# add code here
  5. Use violin plots. Add color as you see fit and turn off legends if not needed.
# add code here
  6. Make a jittered scatter plot. Add color as you see fit and turn off legends if not needed.
# add code here
  7. Use beeswarm plots. Add color as you see fit and turn off legends if not needed.
# add code here
  8. Use multiple geoms on a single plot. Be deliberate about the order of plotting. Change the theme and the color scale of the plot. Finally, add informative labels.
# add code here
# add code here
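
The exercises above could be approached along the following lines. This is a sketch under several assumptions: body_mass_g is the weight variable, rows with missing body mass are dropped first to avoid warnings, and the binwidth, transparency, theme, and color scale are illustrative choices. The beeswarm plot relies on the ggbeeswarm package, which the exercise does not name, so that step is guarded and skipped if the package is not installed. All object names are hypothetical.

```r
library(tidyverse)
library(palmerpenguins)

# Drop rows with missing body mass so the geoms do not warn
penguins_mass <- drop_na(penguins, body_mass_g)

# Single histogram, bars filled by species
p_hist <- ggplot(penguins_mass, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250, alpha = 0.7)

# Faceted histograms; the facet labels make the legend redundant
p_facet <- ggplot(penguins_mass, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250, show.legend = FALSE) +
  facet_wrap(~species, ncol = 1)

# Side-by-side box plots
p_box <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, color = species)) +
  geom_boxplot(show.legend = FALSE)

# Overlaid density plots; the legend is needed to tell species apart
p_density <- ggplot(penguins_mass, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5)

# Violin plots
p_violin <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, fill = species)) +
  geom_violin(show.legend = FALSE)

# Jittered scatter plot; jitter horizontally only so weights stay exact
p_jitter <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, color = species)) +
  geom_jitter(width = 0.2, height = 0, show.legend = FALSE)

# Beeswarm plot (assumes the ggbeeswarm package; skipped if unavailable)
if (requireNamespace("ggbeeswarm", quietly = TRUE)) {
  p_bee <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, color = species)) +
    ggbeeswarm::geom_beeswarm(show.legend = FALSE)
}

# Multiple geoms: violins first, points second so the points sit on top
p_multi <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, color = species)) +
  geom_violin(show.legend = FALSE) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5, show.legend = FALSE) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  labs(
    x = "Species",
    y = "Body mass (g)",
    title = "Body mass of penguins by species"
  )

p_multi
```

Layers are drawn in the order they are added, so placing geom_jitter() after geom_violin() keeps the individual points visible.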

Multiple variables

Note

Analyzing the relationship between three or more variables is called multivariate analysis.

  1. Facet the plot you created in the previous exercise by island. Adjust labels accordingly.
# add code here
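
As a sketch of the faceting step, the code below assumes the plot from the previous exercise was a violin plot with jittered points, which is a hypothetical choice; any multi-geom plot can be faceted the same way. The theme, color scale, and label text are likewise illustrative.

```r
library(tidyverse)
library(palmerpenguins)

# Drop rows with missing body mass so the geoms do not warn
penguins_mass <- drop_na(penguins, body_mass_g)

# Same plot as before, now split into one panel per island
p_island <- ggplot(penguins_mass, aes(x = species, y = body_mass_g, color = species)) +
  geom_violin(show.legend = FALSE) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5, show.legend = FALSE) +
  facet_wrap(~island) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  labs(
    x = "Species",
    y = "Body mass (g)",
    title = "Body mass of penguins by species and island"
  )

p_island
```

Not every species appears on every island, so some panels show fewer than three groups.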

Before you continue, let’s turn off all warnings the code chunks generate and resize all figures. We’ll do this by editing the YAML.
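
The exercise makes this change in the YAML header. For comparison only, the sketch below shows a different route to a similar result: setting knitr chunk options from a setup chunk at the top of the document. It assumes the document is rendered with the knitr engine, and the figure dimensions are placeholder values.

```r
# Alternative to editing the YAML: set defaults for all later chunks
knitr::opts_chunk$set(
  warning = FALSE,   # do not show warnings in the rendered output
  fig.width = 7,     # assumed width in inches
  fig.asp = 0.618    # assumed height-to-width ratio
)
```

Options set this way apply to every chunk that follows, and individual chunks can still override them.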

Visualizing other variables - Your turn!

  1. Pick a single categorical variable from the data set and make a bar plot of its distribution.
# add code here
  2. Pick two categorical variables and make a plot that visualizes the relationship between them. Along with your code and output, provide an interpretation of the visualization.
# add code here

Interpretation goes here…

  3. Make another plot that uses at least three variables. At least one should be numeric and at least one categorical. In 1-2 sentences, describe what the plot shows about the relationships between the variables you plotted. Don’t forget to label your code chunk.
# add code here

Interpretation goes here…
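
To illustrate the kinds of plots these exercises ask for, here is one possible set. The variable choices (species, island, flipper_length_mm, body_mass_g) are examples only, and the chunk label in the comment is hypothetical. Other variables in penguins work equally well.

```r
library(tidyverse)
library(palmerpenguins)

# One categorical variable: counts of penguins per species
p_bar <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# Two categorical variables: species composition within each island
p_two <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

# Three variables: two numeric and one categorical
# In a Quarto chunk, a label could be set with a line such as:
# #| label: mass-by-flipper
p_three <- ggplot(
  drop_na(penguins, flipper_length_mm, body_mass_g),
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species"
  )

p_bar
p_two
p_three
```

Using position = "fill" scales each bar to 1, so the plot compares proportions rather than raw counts across islands.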