Lecture 16
Duke University
STA 199 - Fall 2022
10/25/22
Clone your ae-13 project from GitHub, render your document, update your name, and commit and push.
critics
and audience
movie_scores
A regression model is a function that describes the relationship between the outcome, \(Y\), and the predictor, \(X\).
\[ \begin{aligned} Y &= \color{#325b74}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{#325b74}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{#325b74}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned} \]
Use simple linear regression to model the relationship between a quantitative outcome (\(Y\)) and a single quantitative predictor (\(X\)): \[\Large{Y = \beta_0 + \beta_1 X + \epsilon}\]
\[\Large{\hat{Y} = b_0 + b_1 X}\]
\[\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}\]
\[e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i\]
\[e^2_1 + e^2_2 + \dots + e^2_n\]
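The least squares line is the one that makes this sum of squared residuals as small as possible. A minimal sketch in Python, for illustration only: the five data points are made up (they are not from movie_scores), and the helper names `sum_squared_residuals` and `least_squares` are hypothetical. It fits the line using the standard closed-form estimates and compares its sum of squared residuals to that of an arbitrary competing line.

```python
# Toy data, made up for illustration (not from movie_scores)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(b0, b1, x, y):
    """Return e_1^2 + ... + e_n^2 for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)

# Compare the least squares line to another candidate line
ssr_least_squares = sum_squared_residuals(b0, b1, x, y)
ssr_other = sum_squared_residuals(2.0, 0.7, x, y)

print("least squares line:", b0, b1)
print("SSR, least squares line:", ssr_least_squares)
print("SSR, other line:", ssr_other)
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the least squares line's on the same data.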
The regression line goes through the center of mass point (the coordinates corresponding to average \(X\) and average \(Y\)): \(b_0 = \bar{Y} - b_1~\bar{X}\)
Slope has the same sign as the correlation coefficient: \(b_1 = r \frac{s_Y}{s_X}\)
Sum of the residuals is zero: \(\sum_{i = 1}^n e_i = 0\)
Residuals and \(X\) values are uncorrelated
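The properties of the least squares line can be checked numerically. A minimal sketch in Python, assuming made-up toy data (not movie_scores); all variable names are chosen for illustration. The slope and intercept are computed from the formulas \(b_1 = r \frac{s_Y}{s_X}\) and \(b_0 = \bar{Y} - b_1~\bar{X}\), so the first property holds by construction and the check simply confirms it.

```python
import math

# Toy data, made up for illustration (not from movie_scores)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Sample means, standard deviations, and correlation
x_bar = sum(x) / n
y_bar = sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Least squares estimates from the formulas on the slides
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Residuals: observed minus predicted
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# 1. The line passes through (x_bar, y_bar)
passes_through_center = math.isclose(b0 + b1 * x_bar, y_bar)

# 2. The slope has the same sign as the correlation coefficient
same_sign = (b1 > 0) == (r > 0)

# 3. The residuals sum to zero (up to floating point error)
residual_sum = sum(residuals)

# 4. Residuals and x are uncorrelated: their sample covariance is zero
#    (the residual mean is zero, so it is omitted from the formula)
resid_x_cov = sum((xi - x_bar) * ei for xi, ei in zip(x, residuals)) / (n - 1)

print(passes_through_center, same_sign, residual_sum, resid_x_cov)
```

Because of floating point arithmetic, the residual sum and the covariance print as very small numbers rather than exactly zero.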
slido.com / #sta199
Poll: The slope of the model for predicting audience score from critics score is 0.519. Which of the following is the best interpretation of this value?
\[\widehat{\text{audience}} = 32.3 + 0.519 \times \text{critics}\]
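To see what the two coefficients do, the fitted equation can be written as a small function. This is a sketch in Python for illustration: the intercept 32.3 and slope 0.519 come from the fitted equation above, while the function name `predict_audience` and the critics scores 50 and 51 are hypothetical choices.

```python
INTERCEPT = 32.3  # b_0 from the fitted equation
SLOPE = 0.519     # b_1 from the fitted equation


def predict_audience(critics):
    """Predicted audience score for a given critics score."""
    return INTERCEPT + SLOPE * critics


# Predictions for two critics scores that differ by one point
pred_50 = predict_audience(50)
pred_51 = predict_audience(51)

# The difference between the two predictions equals the slope
difference = pred_51 - pred_50

# The prediction at a critics score of 0 equals the intercept
pred_0 = predict_audience(0)

print(pred_50, pred_51, difference, pred_0)
```

The difference between the two predictions is the slope, up to floating point rounding: the model predicts audience scores that are, on average, 0.519 points higher for each additional point in critics score. The prediction at a critics score of 0 is the intercept.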
✅ The intercept is meaningful in context of the data if
🛑 Otherwise, it might not be meaningful!
ae-13 (repo name will be suffixed with your GitHub name).