
MATH 4720/MSSC 5720 Introduction to Statistics
Regression models the relationship between one or more numerical/categorical response variables \((Y)\) and one or more numerical/categorical explanatory variables \((X)\).
A regression function \(f(X)\) describes how a response variable \(Y\), on average, changes as an explanatory variable \(X\) changes.
Examples:
college GPA \((Y)\) vs. ACT/SAT score \((X)\)
sales \((Y)\) vs. advertising expenditure \((X)\)
crime rate \((Y)\) vs. median income level \((X)\)

The true relationship between \(X\) and the mean of \(Y\), the regression function \(f(X)\), is unknown.
The collected data \((x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\) are all we know and have.
Goal: estimate \(f(X)\) from our data and use it to predict the value of \(Y\) given any value of \(X\).
Math review: A linear function \(y = f(x) = \beta_0 + \beta_1 x\) represents a straight line.
\(\beta_1\): slope, the amount by which \(y\) changes when \(x\) increases by one unit.
\(\beta_0\): intercept, the value of \(y\) when \(x = 0\).
The linearity assumption: \(\beta_1\) does not change as \(x\) changes.
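A quick numerical illustration of linearity in R (a minimal sketch with arbitrary coefficients \(\beta_0 = 2\) and \(\beta_1 = 3\)):
f <- function(x) 2 + 3 * x   ## the line y = 2 + 3x
f(1) - f(0)                  ## 3: y changes by the slope when x increases by 1
f(101) - f(100)              ## 3 again: the slope is the same at every x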

Real data \((x_i, y_i), i = 1, 2, \dots, n\) do not form a perfect straight line!
\(y_i = \beta_0+\beta_1x_i + \color{red}{\epsilon_i}\)
When we collect our data, at any given level of \(X = x\), \(y\) is assumed to be drawn from a normal distribution (for inference purposes).
Its value varies around, and will not be exactly equal to, its mean \(\mu_y\).
For the \(i\)-th measurement in the target population,
\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i\]
\(Y_i\): the \(i\)-th value of the response (random) variable.
\(X_i\): the \(i\)-th known fixed value of the predictor.
\(\epsilon_i\): the \(i\)-th random error with assumption \(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\).
\(\beta_0\) and \(\beta_1\) are model coefficients.
\(\beta_0\), \(\beta_1\) and \(\sigma^2\) are fixed unknown parameters to be estimated from the sample data once we collect them.
\(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\)
\[\begin{align*} \mu_{Y_i \mid X_i} = \beta_0 + \beta_1X_i \end{align*}\]
The mean response \(\mu_{Y\mid X}\) has a straight-line relationship with \(X\) given by a population regression line \[\mu_{Y\mid X} = \beta_0 + \beta_1X\]

\(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\) \[\begin{align*} \mathrm{Var}(Y_i \mid X_i) &= \mathrm{Var}(\epsilon_i) = \sigma^2 \end{align*}\] The variance of \(Y\) does not depend on \(X\).

\(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\) \[\begin{align*} Y_i \mid X_i \stackrel{indep}{\sim} N(\beta_0 + \beta_1X_i, \sigma^2) \end{align*}\] For any fixed value of \(X_i = x_i\), the response \(Y_i\) varies with \(N(\mu_{Y_i\mid x_i}, \sigma^2)\).
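To make these three assumptions concrete, here is a small simulation sketch; the coefficients, error SD, and \(x\)-range are arbitrary choices (loosely mimicking the mpg fit shown later), not real data.
## simulate n draws from Y_i = beta0 + beta1 * x_i + eps_i
set.seed(4720)                       ## arbitrary seed for reproducibility
n <- 100
x <- runif(n, min = 1.6, max = 7)    ## fixed predictor values
eps <- rnorm(n, mean = 0, sd = 3.8)  ## iid N(0, sigma^2) errors
y <- 35.7 - 3.5 * x + eps            ## responses scatter around the line
plot(x, y, pch = 19, cex = 0.5)
abline(a = 35.7, b = -3.5, col = "red")  ## the population regression line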

Use sample statistics \(b_0\) and \(b_1\) computed from our sample data to estimate \(\beta_0\) and \(\beta_1\).
\(\hat{y}_{i} = b_0 + b_1 x_{i}\) is called the fitted value of \(y_i\): a point estimate of the mean \(\mu_{y|x_i}\) and of \(y_i\) itself.
Given the sample data \(\{ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\},\)
Which sample regression line is the best?
What are the best estimators \(b_0\) and \(b_1\) for \(\beta_0\) and \(\beta_1\)?
\[SS_{res} = e_1^2 + e_2^2 + \dots + e_n^2 = \sum_{i = 1}^n e_i^2,\] where the residual \(e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1x_i)\).
\[\small{\begin{align} SS_{res} &= (y_1 - b_0 - b_1x_1)^2 + (y_2 - b_0 - b_1x_2)^2 + \dots + (y_n - b_0 - b_1x_n)^2\\ &= \sum_{i=1}^n(y_i - b_0 - b_1x_i)^2 \end{align}}\] which is the smallest among all \(\sum_{i=1}^n(y_i - a_0 - a_1x_i)^2\) computed with any other pair of values \((a_0, a_1) \ne (b_0, b_1)\).
\[(b_0, b_1) = \arg \min_{a_0, a_1} \sum_{i=1}^n(y_i - a_0 - a_1x_i)^2\]
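Before solving this minimization analytically, we can check it numerically; a sketch with made-up toy data, comparing a general-purpose optimizer against lm():
ssres <- function(a, x, y) sum((y - a[1] - a[2] * x)^2)  ## SS_res as a function of (a0, a1)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)     ## toy data, roughly on the line y = 2x
optim(par = c(0, 0), fn = ssres, x = x, y = y)$par  ## numerical minimizer
coef(lm(y ~ x))                      ## least squares solution: nearly identical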
Solving the minimization with calculus (MATH 1450/1455 …) gives:
\[\color{red}{b_0 = \overline{y} - b_1\overline{x}}\]
\[\color{red}{b_1 = \frac{\sum_{i=1}^n(x_i - \overline{x})(y_i - \overline{y})}{\sum_{i=1}^n(x_i - \overline{x})^2} = r \frac{\sqrt{S_{yy}}}{\sqrt{S_{xx}}}},\] where \(S_{xx} = \sum_{i=1}^n(x_i - \overline{x})^2\) and \(S_{yy} = \sum_{i=1}^n(y_i - \overline{y})^2\)
What can we learn from the formula of \(b_0\) and \(b_1\)?
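One way to answer that is to evaluate the formulas directly; a sketch with the same toy data as above, checking the closed forms against lm():
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)         ## so the fitted line passes through (x_bar, y_bar)
c(b0, b1)
cor(x, y) * sd(y) / sd(x)            ## b1 = r * sqrt(Syy) / sqrt(Sxx): same value
coef(lm(y ~ x))                      ## agrees
In particular, the formula for \(b_0\) shows the fitted line always passes through \((\overline{x}, \overline{y})\), and \(b_1\) has the same sign as the correlation \(r\).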
mpg Data

# A tibble: 234 × 11
   manufacturer model        displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>        <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4             1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4             1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4             2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4             2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4             2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4             2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4             3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro     1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro     1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro     2    2008     4 manu… 4        20    28 p     comp…
# ℹ 224 more rows

hwy vs. Displacement displ
plot(x = mpg$displ, y = mpg$hwy,
     las = 1, pch = 19, col = "navy", cex = 0.5,
     xlab = "Displacement (litres)", ylab = "Highway MPG",
     main = "Highway MPG vs. Engine Displacement (litres)")

## fit the simple linear regression of hwy on displ
reg_fit <- lm(hwy ~ displ, data = mpg)

## all elements in reg_fit
names(reg_fit)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

## use $ to extract an element of a list
reg_fit$coefficients
(Intercept)       displ
      35.70       -3.53
\(\widehat{hwy}_{i} = b_0 + b_1 \times displ_{i} = 35.7 - 3.5 \times displ_{i}\)
\(b_1\): For a one-unit (litre) increase in displacement, we expect the highway MPG to decrease, on average, by 3.5.
Think of \(\sigma^2\) as variance around the line or the mean square (prediction) error.
The estimate of \(\sigma^2\) is the mean square residual \(MS_{res}\):
\[\hat{\sigma}^2 = MS_{res} = \frac{SS_{res}}{n-2} = \frac{\sum_{i=1}^n(y_i - \hat{y}_i)^2}{n-2}\]
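A sketch computing \(\hat{\sigma}\) from this definition for the mpg fit reg_fit:
e <- residuals(reg_fit)              ## e_i = y_i - y_hat_i
sqrt(sum(e^2) / (length(e) - 2))     ## hat sigma = sqrt(SS_res / (n - 2))
summary(reg_fit)$sigma               ## R's 'Residual standard error': same value, 3.84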

(summ_reg_fit <- summary(reg_fit))
Call:
lm(formula = hwy ~ displ, data = mpg)
Residuals:
   Min     1Q Median     3Q    Max
-7.104 -2.165 -0.224  2.059 15.010

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   35.698      0.720    49.5   <2e-16 ***
displ         -3.531      0.195   -18.1   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.84 on 232 degrees of freedom
Multiple R-squared: 0.587, Adjusted R-squared: 0.585
F-statistic: 329 on 1 and 232 DF, p-value: <2e-16

# lots of fitted information saved in summary(reg_fit)!
names(summ_reg_fit)
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

# residual standard error (sigma_hat)
summ_reg_fit$sigma
[1] 3.84
confint(reg_fit, level = 0.95)
            2.5 % 97.5 %
(Intercept) 34.28  37.12
displ       -3.91  -3.15
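The same interval can be assembled by hand from the coefficient table; a sketch for the displ slope, with \(df = n - 2 = 232\):
est <- summ_reg_fit$coefficients["displ", "Estimate"]
se  <- summ_reg_fit$coefficients["displ", "Std. Error"]
est + c(-1, 1) * qt(0.975, df = 232) * se  ## reproduces confint()'s displ row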
\(H_0: \beta_1 = \beta_1^0 \quad H_1: \beta_1 \ne \beta_1^0\)
Test statistic: Under \(H_0\),
\[t_{test} = \frac{b_1 - \color{red}{\beta_1^0}}{\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}} \sim t_{n-2}\]
summ_reg_fit$coefficients
            Estimate Std. Error t value  Pr(>|t|)
(Intercept)    35.70      0.720    49.6 2.12e-125
displ          -3.53      0.195   -18.2  2.04e-46
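A sketch reproducing the displ row's t value and two-sided p-value from the formula with \(\beta_1^0 = 0\):
b1 <- summ_reg_fit$coefficients["displ", "Estimate"]
se <- summ_reg_fit$coefficients["displ", "Std. Error"]  ## sqrt(hat sigma^2 / Sxx)
t_test <- (b1 - 0) / se            ## about -18.2
2 * pt(-abs(t_test), df = 232)     ## two-sided p-value, df = n - 2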
\(H_0: \beta_1 = 0 \quad H_1: \beta_1 \ne 0\)
Failing to reject \(H_0: \beta_1 = 0\) means the data provide no evidence of a linear relationship between \(Y\) and \(X\).

If we reject \(H_0: \beta_1 = 0\), does it mean \(X\) and \(Y\) are linearly related?
Suppose we only have data on \(Y\), with no information about \(X\) or about the relationship between \(X\) and \(Y\). How do we predict a value of \(Y\)?
Our best guess would be \(\overline{y}\) if the data have no pattern, i.e., \(\hat{y}_i = \overline{y}\).
Treat \(X\) and \(Y\) as uncorrelated.
The (total) deviation from the mean is \((y_i - \overline{y})\).
If \(X\) and \(Y\) are linearly related, fitting a linear regression model helps us predict the value of \(Y\) when the value of \(X\) is provided.
\(\hat{y}_i = b_0 + b_1x_i\) is closer to \(y_i\) than \(\overline{y}\).
The regression model explains some deviation of \(y\).
Total deviation = Deviation explained by regression + Unexplained deviation
\((y_i - \overline{y}) = (\hat{y}_i - \overline{y}) + (y_i - \hat{y}_i)\)
For example, if \(y_i = 19\), \(\hat{y}_i = 13\), and \(\overline{y} = 9\): \((19 - 9) = (13 - 9) + (19 - 13)\)
\(\sum_{i=1}^n(y_i - \overline{y})^2 = \sum_{i=1}^n(\hat{y}_i - \overline{y})^2 + \sum_{i=1}^n(y_i - \hat{y}_i)^2\)
Total SS \((SS_T)\) = Regression SS \((SS_R)\) + Residual SS \((SS_{res})\)
\(df_T = df_R + df_{res}\)
\(\color{blue}{(n-1) = 1 +(n-2)}\)
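A sketch verifying the sum-of-squares decomposition numerically on the mpg fit:
y <- mpg$hwy
y_hat <- fitted(reg_fit)
SS_T   <- sum((y - mean(y))^2)
SS_R   <- sum((y_hat - mean(y))^2)
SS_res <- sum((y - y_hat)^2)
c(SS_T, SS_R + SS_res)             ## the two totals match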
Test statistic: Under \(H_0: \beta_1 = 0\), \[F_{test} = \frac{MS_R}{MS_{res}} = \frac{SS_R/1}{SS_{res}/(n-2)} \sim F_{1, n-2}\]
A larger value of \(F_{test}\) indicates that the regression is significant.
Reject \(H_0\) if \(F_{test} > F_{\alpha, 1, n-2}\).
The ANOVA is designed to test \(H_0\) that all predictors have no value in predicting \(y\).
In SLR, the \(F\)-test of ANOVA gives the same result as a two-sided \(t\)-test of \(H_0: \beta_1=0\).
anova(reg_fit)

Analysis of Variance Table

Response: hwy
           Df Sum Sq Mean Sq F value Pr(>F)
displ       1   4848    4848     329 <2e-16 ***
Residuals 232   3414      15
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
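A sketch checking the \(F = t^2\) equivalence on this fit:
anova(reg_fit)$`F value`[1]                      ## about 329
summ_reg_fit$coefficients["displ", "t value"]^2  ## the same number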
\[R^2 = \frac{SS_R}{SS_T} =\frac{SS_T - SS_{res}}{SS_T} = 1 - \frac{SS_{res}}{SS_T}\]
\(R^2\) is the proportionate reduction of total variation in \(Y\) associated with the use of \(X\).
(a) If \(\hat{y}_i = y_i\) for all \(i\), then \(\small SS_{res} = \sum_{i=1}^n(y_i - \hat{y}_i)^2 = 0\) and \(R^2 = 1\). (b) If \(\hat{y}_i = \overline{y}\) for all \(i\), then \(\small SS_R = \sum_{i=1}^n(\hat{y}_i - \overline{y})^2 = 0\) and \(R^2 = 0\).
summ_reg_fit$r.squared
[1] 0.587
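A sketch computing \(R^2\) from the sums of squares directly:
SS_res <- sum(residuals(reg_fit)^2)
SS_T   <- sum((mpg$hwy - mean(mpg$hwy))^2)
1 - SS_res / SS_T                  ## 0.587, matching summ_reg_fit$r.squared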

\[\hat{\mu}_{y | x_0} = b_0 + b_1 x_0\]
\[\hat{\mu}_{y | x_0} \pm t_{\alpha/2, n-2} \hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0 - \overline{x})^2}{S_{xx}}}\]
Does the length of the CI for \(\mu_{y|x_0}\) stay the same at any location of \(x_0\)?
## CI for the mean response
predict(reg_fit, newdata = data.frame(displ = 5.5),
        interval = "confidence", level = 0.95)
   fit  lwr  upr
1 16.3 15.4 17.2
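A sketch computing the same interval from the formula at \(x_0 = 5.5\):
x <- mpg$displ; n <- length(x); x0 <- 5.5
Sxx <- sum((x - mean(x))^2)
fit <- sum(coef(reg_fit) * c(1, x0))   ## b0 + b1 * x0
me  <- qt(0.975, df = n - 2) * summ_reg_fit$sigma *
  sqrt(1 / n + (x0 - mean(x))^2 / Sxx)
fit + c(-1, 1) * me                    ## reproduces predict()'s confidence interval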
\[\hat{y}_0(x_0) = b_0 + b_1 x_0\]
\[\hat{y}_0 \pm t_{\alpha/2, n-2} \hat{\sigma}\sqrt{{\color{red}{1}}+ \frac{1}{n} + \frac{(x_0 - \overline{x})^2}{S_{xx}}}\]
What is the difference between CI for \(\mu_{y | x_0}\) and PI for \(y_0(x_0)\)?
## CI for the mean response
predict(reg_fit, newdata = data.frame(displ = 5.5),
        interval = "confidence", level = 0.95)
   fit  lwr  upr
1 16.3 15.4 17.2

## PI for the new observation
predict(reg_fit, newdata = data.frame(displ = 5.5),
        interval = "predict", level = 0.95)
   fit  lwr  upr
1 16.3 8.67 23.9
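A sketch computing the PI by hand; the only change from the CI is the extra 1 under the square root, which is why the interval is so much wider:
x <- mpg$displ; n <- length(x); x0 <- 5.5
Sxx <- sum((x - mean(x))^2)
fit <- sum(coef(reg_fit) * c(1, x0))   ## b0 + b1 * x0
me_pi <- qt(0.975, df = n - 2) * summ_reg_fit$sigma *
  sqrt(1 + 1 / n + (x0 - mean(x))^2 / Sxx)
fit + c(-1, 1) * me_pi                 ## reproduces predict()'s prediction interval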