Lecture 2
Duke University
STA 199 - Fall 2022
9/1/22
I want to make sure that you learn everything you were hoping to learn from this class. If this requires flexibility, please don’t hesitate to ask.
Q - What data science background does this course assume?
A - None.
Q - Is this an intro stat course?
A - While statistics \(\ne\) data science, they are very closely related and have a tremendous amount of overlap. Hence, this course is a great way to get started with statistics. However, this course is not your typical high school statistics course.
Q - Will we be doing computing?
A - Yes.
Q - Is this an intro CS course?
A - No, but many themes are shared.
Q - What computing language will we learn?
A - R.
Q - Why not language X?
A - We can discuss that over ☕.
Course operation
Doing data science
By the end of the course, you will be able to…
What does it mean for a data analysis to be “reproducible”?
Near-term goals:
Long-term goals:
Packages: Fundamental units of reproducible R code, including reusable R functions, the documentation that describes how to use them, and sample data¹
As of September 2022, there are over 18,000 R packages available on CRAN (the Comprehensive R Archive Network)²
We’re going to work with a small (but important) subset of these!
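To make the three parts of a package (functions, documentation, sample data) concrete, here is a minimal sketch that pokes at two packages that ship with every R installation, so nothing needs to be installed. The choice of `datasets` and `stats` is an assumption made for illustration; the packages used in this course may differ.

```r
# 'datasets' and 'stats' come with base R, so library() finds them right away.
library(datasets)
library(stats)

# Sample data: names of the data sets bundled with the 'datasets' package.
bundled <- data(package = "datasets")$results[, "Item"]
head(bundled)

# Functions: names of the objects that the 'stats' package makes available.
exported <- ls("package:stats")
"median" %in% exported

# Documentation: uncomment in an interactive session to browse the help index.
# help(package = "stats")
```

A package from CRAN works the same way once it has been installed with `install.packages()`: `library()` loads it, and its functions, help pages, and data become available in your session.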
Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!
ae-0-bechdel-quarto
Clone ae-0-bechdel-quarto to your container. Open bechdel.qmd, review the document, and fill in the blanks.
GitHub is the home for your Git-based projects on the internet – like Dropbox but much, much better
We will use GitHub as a platform for web hosting and collaboration (and as our course management system!)