Grammar of data wrangling

Lecture 5

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Fall 2022

9/13/22

Warm up

While you wait for class to begin…

  • Clone your ae-03 repo.
  • Any questions from the prepare materials? Go to slido.com and enter #sta199. You can also upvote others’ questions.

Announcements

  • Request videos for missed classes via the video request form
  • Ask course questions on Slack, and when you do:
    • Use code formatting and proper indentation
    • Include screenshots
    • Check for previous questions first

Questions from last time

  • Why do I have to load my packages in each session before functions like ggplot() become available?
  • Will we spend time learning more of the actual fundamental statistics needed to understand how the different graphs work and which ones are most useful for different data sets?
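The slides don't record the answer to the first question, but the distinction behind it is that installing a package and loading (attaching) it are separate steps. The sketch below shows this. It uses `tools`, a package that ships with R but isn't attached at startup, as a stand-in for ggplot2 so it runs without extra installs.

```r
# Installing a package (install.packages()) happens once per computer,
# but attaching it with library() has to happen in every new R session.
# `tools` ships with R but is not attached by default, so it stands in
# here for packages like ggplot2.
attached_before <- "package:tools" %in% search()
library(tools)
attached_after <- "package:tools" %in% search()

# Once attached, its functions can be called without the tools:: prefix
file_ext("ae-03.qmd")
```

In a fresh session `attached_before` should be `FALSE`. Restarting R detaches everything again, which is why the `library()` calls belong at the top of each Quarto document.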

Coding style + workflow

  • Avoid long lines of code.
    • We should be able to see all of your code in the PDF document you submit; long lines run off the page.
  • Label code chunks.
    • Do not put spaces in code-chunk labels.
  • Render, commit, and push regularly.
    • Think of it like saving regularly as you type a report.
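Here is a minimal sketch of the first two habits in one chunk. The label `mpg-by-cyl-gear` is hypothetical. Base R's `aggregate()` on the built-in `mtcars` data stands in for the course's tidyverse code so the chunk runs without extra packages.

```r
#| label: mpg-by-cyl-gear
# The option line above labels the chunk when this code sits inside a
# Quarto {r} chunk: hyphens separate the words, no spaces.
# Breaking the call after each argument keeps every line short, so the
# whole call stays visible in the rendered PDF.
mpg_by_cyl_gear <- aggregate(
  mpg ~ cyl + gear,
  data = mtcars,
  FUN = mean
)
mpg_by_cyl_gear
```

In the .qmd file, these lines go between the chunk's `{r}` opening fence and its closing fence. The `#|` option line must come first in the chunk.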

Application exercise

ae-03

  • Go to the course GitHub org and find your ae-03 repo (the repo name will be suffixed with your GitHub username).
  • Clone the repo in your container, open the Quarto document in it, then follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (3 days from today).

Recap of AE

  • The pipe operator, |>, can be read as “and then”.
  • The pipe operator passes the result of what comes before it into the function that comes after it, as that function's first argument.
sum(1, 2)
#> [1] 3

1 |>
  sum(2)
#> [1] 3
  • Always use a line break after the pipe, and indent the next line of code.
    • Just as with ggplot layers: always put a line break after +, and indent the next line.
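Two properties of the pipe from the recap can be checked directly. First, the left-hand side becomes the first argument of the call on the right. Second, a chain of pipes reads top to bottom instead of inside out. This is a minimal sketch in base R, assuming R 4.1.0 or later (when the native `|>` was introduced). The vector and functions are arbitrary examples.

```r
# x |> f(y) is the same as f(x, y): 10 becomes seq()'s first argument
a <- seq(10, 20, by = 5)
b <- 10 |>
  seq(20, by = 5)
identical(a, b)

# Nested calls read inside out ...
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# ... while the piped version reads "take the vector, and then sqrt(),
# and then mean(), and then round()"; each line ends with |> and the
# next line is indented
piped <- c(1, 4, 9, 16) |>
  sqrt() |>
  mean() |>
  round(1)
identical(nested, piped)
```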