Data types and classes

Lecture 8

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Fall 2022

9/22/22

Warm up

While you wait for class to begin…

  • Open your ae-05 project (from last time) in RStudio, render your document, and commit and push. Make sure you have your “first draft” plot ready to go.
  • Any questions from the prepare materials? Go to slido.com / #sta199. You can also upvote others’ questions.

Announcements

  • HW 2 due tonight (11:59 pm)

  • Lab 2 due tomorrow night (11:59 pm)

Review: Logical operators

operator      definition
x < y         test if x is less than y
x <= y        test if x is less than or equal to y
x > y         test if x is greater than y
x >= y        test if x is greater than or equal to y
x == y        test if x is equal to y
x != y        test if x is not equal to y
is.na(x)      test if x is NA
!is.na(x)     test if x is not NA
x %in% y      test if x is in y
!(x %in% y)   test if x is not in y
!x            test for not x
x & y         test for x and y
x | y         test for x or y
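A minimal sketch of a few of these operators applied to a made-up vector that contains an NA. Comparisons involving NA return NA, while %in% returns FALSE for NA, so combining a comparison with !is.na() is one way to get a clean TRUE/FALSE result.

```r
vals <- c(1, 5, NA, 8)  # hypothetical example vector

vals > 4                  # FALSE  TRUE    NA  TRUE
vals %in% c(1, 8)         #  TRUE FALSE FALSE  TRUE
is.na(vals)               # FALSE FALSE  TRUE FALSE
vals > 4 & !is.na(vals)   # FALSE  TRUE FALSE  TRUE (NA & FALSE is FALSE)
vals < 2 | vals > 6       #  TRUE FALSE    NA  TRUE
```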

Question from last time

  • What is the difference between is.na() and na.rm?

is.na() is a function: it checks each element for NA and returns TRUE or FALSE.

x <- c(1, 2, NA)
is.na(x)
[1] FALSE FALSE  TRUE

na.rm is an argument of functions such as mean(): setting na.rm = TRUE removes NAs before the function is applied.

mean(x)
[1] NA
mean(x, na.rm = TRUE)
[1] 1.5
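The two are often used together. A short sketch reusing the same x: na.rm works with other summary functions such as sum(), and is.na() can count or drop missing values explicitly.

```r
x <- c(1, 2, NA)

sum(x)                 # NA
sum(x, na.rm = TRUE)   # 3
sum(is.na(x))          # 1: number of missing values (each TRUE counts as 1)
x[!is.na(x)]           # 1 2: keep only the non-missing values
mean(x[!is.na(x)])     # 1.5, same as mean(x, na.rm = TRUE)
```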

From last time

Continue from last time: ae-05

  • Go to your container and open your ae-05 project.
  • Render, commit, and push before getting started again.

Recap of AE

  • A data set can’t be labeled as inherently wide or long, but it can be made wider or longer when a particular analysis requires a particular format.
  • When pivoting longer, column names that turn into values are stored as characters by default. If you need them in another format (e.g., numeric), you need to make that transformation explicitly, which you can do within pivot_longer() itself (e.g., with its names_transform argument).
  • You can tweak a plot forever, but at some point the tweaks are likely not very productive. However, you should always be critical of defaults (however pretty they might be) and see if you can improve the plot to better portray your data / results / what you want to communicate.
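A minimal sketch of the pivoting point above, using tidyr and a made-up toy data frame (the scores data and its year columns are hypothetical). By default the former column names become character values; names_transform converts them during the pivot.

```r
library(tidyr)

# Hypothetical toy data: one column per year (wide format)
scores <- tibble::tibble(
  student = c("A", "B"),
  `2021`  = c(80, 90),
  `2022`  = c(85, 95)
)

# Default: the former column names end up as character values
long_chr <- pivot_longer(scores, cols = -student,
                         names_to = "year", values_to = "score")
typeof(long_chr$year)  # "character"

# Convert the new names column inside pivot_longer() itself
long_int <- pivot_longer(scores, cols = -student,
                         names_to = "year", values_to = "score",
                         names_transform = list(year = as.integer))
typeof(long_int$year)  # "integer"
```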

Types and classes


  • Type is how an object is stored in memory, e.g.,

    • double: a real number stored in double-precision floating-point format
    • integer: an integer (positive or negative)

  • Class is metadata about the object that can determine how common functions operate on that object, e.g.,

    • factor
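A quick way to see the difference is to ask for both typeof() and class() of the same object. In this sketch, plain numbers are doubles, the L suffix creates an integer, and a factor is stored as integers but carries the class "factor".

```r
typeof(3.14)   # "double"
typeof(3)      # "double": numbers without a suffix are doubles
typeof(3L)     # "integer": the L suffix makes an integer literal
class(3)       # "numeric"

f <- factor(c("low", "high"))
typeof(f)      # "integer": how the values are stored
class(f)       # "factor": how functions like print() treat them
```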

Types of vectors

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw
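One small example of each of these vector types, checked with typeof(). The example values are arbitrary.

```r
examples <- list(
  TRUE,            # logical
  1L,              # integer
  1.5,             # double
  "a",             # character
  list(1, "a"),    # list
  NULL,            # NULL (kept as an element of the outer list)
  1i,              # complex
  as.raw(255)      # raw
)

types <- vapply(examples, typeof, character(1))
types
```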

Types of functions

Yes, functions have types too, but you don’t need to worry about the differences in the context of doing data science.

typeof(mean) # regular function
[1] "closure"
typeof(`$`) # primitive function (special)
[1] "special"
typeof(sum) # primitive function (builtin)
[1] "builtin"
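If you are curious, base R also has is.primitive() to tell the two groups apart: both "special" and "builtin" functions are primitives implemented in C, while a "closure" such as mean() is written in R.

```r
is.primitive(sum)    # TRUE  ("builtin")
is.primitive(`$`)    # TRUE  ("special")
is.primitive(mean)   # FALSE ("closure")
is.function(mean)    # TRUE: all three are still functions
```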

Factors

A factor is a vector that can contain only predefined values. It is used to store categorical data.

x <- factor(c("a", "b", "b", "a"))
x
[1] a b b a
Levels: a b
typeof(x)
[1] "integer"
attributes(x)
$levels
[1] "a" "b"

$class
[1] "factor"
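A sketch of what "only predefined values" means in practice, using made-up size data. The integer codes point into the levels, levels can be set explicitly (including ones with no observations), and values not among the levels become NA.

```r
x <- factor(c("a", "b", "b", "a"))
as.integer(x)   # 1 2 2 1: integer codes pointing into levels(x)
levels(x)       # "a" "b"

# Hypothetical example: set the levels (and their order) explicitly
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "medium", "large"))
table(sizes)    # counts: small 2, medium 0, large 1

# Values that are not among the levels become NA
y <- factor(c("small", "huge"), levels = c("small", "medium", "large"))
is.na(y)        # FALSE  TRUE
```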

Other classes

Just a couple of examples…

Date:

today <- Sys.Date()
today
[1] "2022-11-09"
typeof(today)
[1] "double"
attributes(today)
$class
[1] "Date"
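Because a Date is stored as a double, it helps to know what the number means: the count of days since 1970-01-01. This sketch uses a fixed date rather than Sys.Date() so the results don't depend on when it is run.

```r
d <- as.Date("2022-09-22")
class(d)                     # "Date"
typeof(d)                    # "double"
unclass(d)                   # 19257: days since 1970-01-01
d + 7                        # "2022-09-29": arithmetic works in days
as.Date("2022-12-25") - d    # Time difference of 94 days
```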

Date-time:

now <- as.POSIXct("2022-09-22 10:15", tz = "EST")
now
[1] "2022-09-22 10:15:00 EST"
typeof(now)
[1] "double"
attributes(now)
$class
[1] "POSIXct" "POSIXt" 

$tzone
[1] "EST"
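Similarly, a POSIXct date-time is stored as the number of seconds since 1970-01-01 00:00:00 UTC. This sketch uses tz = "UTC" (an assumption, different from the "EST" above) so the numbers are easy to check by hand.

```r
t1 <- as.POSIXct("2022-09-22 10:15", tz = "UTC")
as.numeric(t1)                     # 1663841700: seconds since the epoch
t2 <- t1 + 90 * 60                 # arithmetic works in seconds: add 90 minutes
format(t2, "%H:%M")                # "11:45"
difftime(t2, t1, units = "mins")   # Time difference of 90 mins
```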

Application exercise

  • Go to the course GitHub org and find your ae-06 repo (the repo name will be suffixed with your GitHub username).
  • Clone the repo in your container, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – 3 days from today.