Use of the lm function
Everything in this tutorial will be illustrated using pseudodata that we now create.
y <- c(-9, -11, 1, 19)
x <- c(-3, -1, 1, 3)
x2 <- x^2
Data <- data.frame(y = y, x = x, x2 = x2)
print(Data)
## y x x2
## 1 -9 -3 9
## 2 -11 -1 1
## 3 1 1 1
## 4 19 3 9
The lm function
We will be fitting the following linear model: \[ \mathsf{E}Y = \beta_0 + \beta_1\,x + \beta_2\,x^2. \]
m <- lm(y ~ x + x2)
print(m)
##
## Call:
## lm(formula = y ~ x + x2)
##
## Coefficients:
## (Intercept) x x2
## -6.25 4.80 1.25
class(m)
## [1] "lm"
The R function lm uses the QR decomposition of the model matrix \(\mathbf{X}\) to calculate the LSE and related quantities, which are now stored inside the object m.
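As a quick sanity check, the same estimates can be reproduced directly from a QR decomposition of the model matrix, using the base functions model.matrix, qr and qr.coef:
X <- model.matrix(m)   ## here identical (up to column names) to cbind(1, x, x2)
qr.coef(qr(X), y)      ## LSE via the QR decomposition; should match coef(m)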
Components of the object m
names(m)
## [1] "coefficients" "residuals" "effects" "rank" "fitted.values" "assign"
## [7] "qr" "df.residual" "xlevels" "call" "terms" "model"
The meaning of many elements of m should be quite clear from their names.
m$coefficients
## (Intercept) x x2
## -6.25 4.80 1.25
m$residuals
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
m$rank
## [1] 3
m$fitted.values
## 1 2 3 4
## -9.4 -9.8 -0.2 19.4
m$df.residual
## [1] 1
Components useful only in special circumstances
Components of the object m are useful only in some special circumstances. There is no need to explore them in much detail now.
m$assign
## [1] 0 1 2
m$xlevels
## named list()
m$call
## lm(formula = y ~ x + x2)
m$terms
## y ~ x + x2
## attr(,"variables")
## list(y, x, x2)
## attr(,"factors")
## x x2
## y 0 0
## x 1 0
## x2 0 1
## attr(,"term.labels")
## [1] "x" "x2"
## attr(,"order")
## [1] 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: 0xecf5b10>
## attr(,"predvars")
## list(y, x, x2)
## attr(,"dataClasses")
## y x x2
## "numeric" "numeric" "numeric"
Nevertheless, there are still some components of the m object that deserve a detailed exploration.
m$model
## y x x2
## 1 -9 -3 9
## 2 -11 -1 1
## 3 1 1 1
## 4 19 3 9
Note that m$model is the model frame; in more complex models (e.g., those involving factors), the model matrix does not appear here directly.
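If the model matrix itself is needed, it can be extracted from the fitted object with the base function model.matrix:
model.matrix(m)        ## the model matrix X, including the intercept column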
The QR decomposition of the model matrix is stored in a compact form:
m$qr
## $qr
## (Intercept) x x2
## 1 -2.0 0.0000000 -10.0000000
## 2 0.5 -4.4721360 0.0000000
## 3 0.5 0.4472136 8.0000000
## 4 0.5 0.8944272 -0.9296181
## attr(,"assign")
## [1] 0 1 2
##
## $qraux
## [1] 1.500000 1.000000 1.368524
##
## $pivot
## [1] 1 2 3
##
## $tol
## [1] 1e-07
##
## $rank
## [1] 3
##
## attr(,"class")
## [1] "qr"
qr.Q(m$qr)
## [,1] [,2] [,3]
## [1,] -0.5 0.6708204 0.5
## [2,] -0.5 0.2236068 -0.5
## [3,] -0.5 -0.2236068 -0.5
## [4,] -0.5 -0.6708204 0.5
qr.R(m$qr)
## (Intercept) x x2
## 1 -2 0.000000 -10
## 2 0 -4.472136 0
## 3 0 0.000000 8
qr.Q(m$qr, complete = TRUE)
## [,1] [,2] [,3] [,4]
## [1,] -0.5 0.6708204 0.5 0.2236068
## [2,] -0.5 0.2236068 -0.5 -0.6708204
## [3,] -0.5 -0.2236068 -0.5 0.6708204
## [4,] -0.5 -0.6708204 0.5 -0.2236068
We store all of the above matrices:
(Q <- qr.Q(m$qr))
## [,1] [,2] [,3]
## [1,] -0.5 0.6708204 0.5
## [2,] -0.5 0.2236068 -0.5
## [3,] -0.5 -0.2236068 -0.5
## [4,] -0.5 -0.6708204 0.5
(P <- qr.Q(m$qr, complete = TRUE))
## [,1] [,2] [,3] [,4]
## [1,] -0.5 0.6708204 0.5 0.2236068
## [2,] -0.5 0.2236068 -0.5 -0.6708204
## [3,] -0.5 -0.2236068 -0.5 0.6708204
## [4,] -0.5 -0.6708204 0.5 -0.2236068
(N <- matrix(P[, (m$rank + 1):(m$rank + m$df.residual)], ncol = m$df.residual))
## [,1]
## [1,] 0.2236068
## [2,] -0.6708204
## [3,] 0.6708204
## [4,] -0.2236068
(R <- qr.R(m$qr))
## (Intercept) x x2
## 1 -2 0.000000 -10
## 2 0 -4.472136 0
## 3 0 0.000000 8
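A small consistency check of the pieces stored above: the thin factors should reconstruct the model matrix, and the complete matrix \(\mathbf{P}\) should be orthonormal, so that \(\mathbf{P}^\top\mathbf{P} = \mathbf{I}\).
all.equal(Q %*% R, model.matrix(m), check.attributes = FALSE)  ## X = QR
zapsmall(crossprod(P))                                         ## P'P should be the identity matrix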
The effects component
m$effects
## (Intercept) x x2
## 0.000000 -21.466253 10.000000 1.788854
Compare it with:
crossprod(P, y)
## [,1]
## [1,] 0.000000
## [2,] -21.466253
## [3,] 10.000000
## [4,] 1.788854
It is seen that effects \(= \mathbf{P}^\top\mathbf{Y}\). This component is useful for several purposes, as will now be illustrated.
The first \(r\) elements of effects equal \(\mathbf{Q}^\top\mathbf{Y}\) and can be used to easily calculate the fitted values: \(\widehat{\mathbf{Y}} = \mathbf{H}\mathbf{Y} = \mathbf{Q}\mathbf{Q}^\top\mathbf{Y}\).
Q %*% m$effects[1:m$rank]
## [,1]
## [1,] -9.4
## [2,] -9.8
## [3,] -0.2
## [4,] 19.4
m$fitted ## the same
## 1 2 3 4
## -9.4 -9.8 -0.2 19.4
The remaining elements of effects equal \(\mathbf{N}^\top\mathbf{Y}\) and can be used to easily calculate the residuals: \(\mathbf{U} = \mathbf{M}\mathbf{Y} = \mathbf{N}\mathbf{N}^\top\mathbf{Y}\).
N %*% m$effects[(m$rank + 1):(m$rank + m$df.residual)]
## [,1]
## [1,] 0.4
## [2,] -1.2
## [3,] 1.2
## [4,] -0.4
m$residuals
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
The solution to the normal equations \[ \mathbf{X}^\top\mathbf{X}\,\mathbf{b} = \mathbf{X}^\top\mathbf{Y} \] is obtained by writing \(\mathbf{X} = \mathbf{Q}\mathbf{R}\) and using \(\mathbf{Q}^\top\mathbf{Q} = \mathbf{I}\), which reduces them to \[ \mathbf{R}\,\mathbf{b} = \mathbf{Q}^\top\mathbf{Y} = effects[1:r], \] a triangular system that is solved by backward substitution:
(b_from_QR_again <- backsolve(R, m$effects[1:m$rank]))
## [1] -6.25 4.80 1.25
print(m$coefficients) ## previously calculated b
## (Intercept) x x2
## -6.25 4.80 1.25
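For comparison, the normal equations can also be solved directly (numerically less stable than the QR route, but it should give the same estimates here):
X <- model.matrix(m)
solve(crossprod(X), crossprod(X, y))   ## solves X'X b = X'y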
Residual sum of squares is \[ \mbox{SS}_e = \mathbf{U}^\top\mathbf{U} = \mathbf{Y}^\top\mathbf{M}^\top\mathbf{M}\mathbf{Y} = \mathbf{Y}^\top\mathbf{N}\mathbf{N}^\top\mathbf{N}\mathbf{N}^\top\mathbf{Y} = \mathbf{Y}^\top\mathbf{N}\mathbf{N}^\top\mathbf{Y} = \bigl\|\mathbf{N}^\top\mathbf{Y}\bigr\|^2 = \bigl\|effects[r+1, \ldots]\bigr\|^2. \]
(SSe_from_QR <- crossprod(m$effects[(m$rank + 1):(m$rank + m$df.residual)]))
## [,1]
## [1,] 3.2
as.numeric(crossprod(m$residuals)) ## the same, computed directly from the residuals
## [1] 3.2
Methods for objects of class lm
Many methods that return the important quantities have been implemented for objects of class lm.
methods(class = "lm")
## [1] add1 alias anova Anova
## [5] avPlot Boot bootCase boxCox
## [9] case.names ceresPlot coerce confidenceEllipse
## [13] confint cooks.distance crPlot deltaMethod
## [17] deviance dfbeta dfbetaPlots dfbetas
## [21] dfbetasPlots drop1 dummy.coef durbinWatsonTest
## [25] effects ellipse3d extractAIC family
## [29] formula hatvalues hccm infIndexPlot
## [33] influence influencePlot initialize inverseResponsePlot
## [37] kappa labels leveneTest leveragePlot
## [41] linearHypothesis logLik mcPlot mmp
## [45] model.frame model.matrix ncvTest nextBoot
## [49] nobs outlierTest plot powerTransform
## [53] predict print proj qqnorm
## [57] qqPlot qr residualPlot residualPlots
## [61] residuals rstandard rstudent show
## [65] sigmaHat simulate slotsFromS3 spreadLevelPlot
## [69] summary variable.names vcov
## see '?methods' for accessing help and source code
coefficients(m)
## (Intercept) x x2
## -6.25 4.80 1.25
coef(m) ## the same
## (Intercept) x x2
## -6.25 4.80 1.25
m$coefficients ## the same
## (Intercept) x x2
## -6.25 4.80 1.25
residuals(m)
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
m$residuals ## the same
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
fitted(m)
## 1 2 3 4
## -9.4 -9.8 -0.2 19.4
m$fitted.values ## the same
## 1 2 3 4
## -9.4 -9.8 -0.2 19.4
deviance(m)
## [1] 3.2
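As one more illustration, the predict method evaluates the fitted model at new covariate values; both x and x2 have to be supplied because they enter the formula as separate variables:
predict(m, newdata = data.frame(x = 2, x2 = 4))   ## -6.25 + 4.80 * 2 + 1.25 * 4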
In full-rank models, the estimated covariance matrix of \(\widehat{\beta}\) is \[ \widehat{\mbox{var}}\bigl(\widehat{\beta}\bigr) = \frac{\mbox{SS}_e}{n-r}\,\bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}: \]
vcov(m)
## (Intercept) x x2
## (Intercept) 2.05 0.00 -0.25
## x 0.00 0.16 0.00
## x2 -0.25 0.00 0.05
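The same matrix can be computed by hand from the formula above:
X <- model.matrix(m)
deviance(m) / m$df.residual * solve(crossprod(X))   ## SS_e / (n - r) * (X'X)^{-1}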
In full-rank models, for a given \(j = 0,\ldots, k-1\), a confidence interval for \(\beta_j\) with coverage \(1 - \alpha\) is given as \[ \widehat{\beta}_j \;\pm\; \mbox{S.E.}\bigl(\widehat{\beta}_j\bigr)\,\mbox{t}_{n-r}\bigl(1 - \alpha/2\bigr), \] where \(\mbox{S.E.}\bigl(\widehat{\beta}_j\bigr)\) is the standard error of \(\widehat{\beta}_j\) (the square root of the appropriate diagonal element of the matrix \(\widehat{\mbox{var}}\bigl(\widehat{\beta}\bigr)\)).
confint(m)
## 2.5 % 97.5 %
## (Intercept) -24.4425166 11.942517
## x -0.2824819 9.882482
## x2 -1.5911938 4.091194
confint(m, level = 0.99)
## 0.5 % 99.5 %
## (Intercept) -97.39258 84.89258
## x -20.66270 30.26270
## x2 -12.98408 15.48408
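The interval for \(\beta_1\) (the coefficient of x), for instance, can be reproduced by hand from the formula above:
est <- coef(m)["x"]
se  <- sqrt(vcov(m)["x", "x"])
est + c(-1, 1) * se * qt(0.975, df = m$df.residual)   ## should match confint(m)["x", ]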
summary(m)
##
## Call:
## lm(formula = y ~ x + x2)
##
## Residuals:
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.2500 1.4318 -4.365 0.1434
## x 4.8000 0.4000 12.000 0.0529 .
## x2 1.2500 0.2236 5.590 0.1127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.789 on 1 degrees of freedom
## Multiple R-squared: 0.9943, Adjusted R-squared: 0.983
## F-statistic: 87.62 on 2 and 1 DF, p-value: 0.07532
Note that the result of summary is an object of its own class:
sm <- summary(m)
class(sm)
## [1] "summary.lm"
names(sm)
## [1] "call" "terms" "residuals" "coefficients" "aliased" "sigma"
## [7] "df" "r.squared" "adj.r.squared" "fstatistic" "cov.unscaled"
It is then possible to extract various important elements (model statistics).
sm$residuals
## 1 2 3 4
## 0.4 -1.2 1.2 -0.4
sm$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.25 1.4317821 -4.365189 0.14336634
## x 4.80 0.4000000 12.000000 0.05292935
## x2 1.25 0.2236068 5.590170 0.11269007
\[ \widehat{\sigma} = \sqrt{\frac{1}{n-r}\mbox{SS}_e}. \]
sm$sigma
## [1] 1.788854
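This can be verified directly:
sqrt(deviance(m) / m$df.residual)   ## should equal sm$sigma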
The overall F-statistic (testing that all non-intercept coefficients are zero) is \[ F = \frac{\;\;\frac{\mbox{SS}_T - \mbox{SS}_e}{r - 1}\;\;}{\;\;\frac{\mbox{SS}_e}{n - r}\;\;}. \]
sm$fstatistic
## value numdf dendf
## 87.625 2.000 1.000
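The value can be reproduced from the formula above:
SST <- sum((y - mean(y))^2)                                           ## total sum of squares
((SST - deviance(m)) / (m$rank - 1)) / (deviance(m) / m$df.residual)  ## should match sm$fstatistic["value"]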
\[ R^2 = 1 - \frac{\mbox{SS}_e}{\mbox{SS}_T}, \qquad R^2_{adj} = 1 - \frac{n-1}{n-r}\;\frac{\mbox{SS}_e}{\mbox{SS}_T}. \]
sm$r.squared
## [1] 0.9943262
sm$adj.r.squared
## [1] 0.9829787
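Both values can again be reproduced by hand:
SST <- sum((y - mean(y))^2)
1 - deviance(m) / SST                                         ## R^2
1 - (length(y) - 1) / m$df.residual * deviance(m) / SST       ## adjusted R^2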