Linear regression (NMSA407)

Arnošt Komárek


Winter semester 2018–19

SIS pages of the course:    ENG    CZE

TIMETABLE

Lectures: Thursday 15:40 in K1   
Thursday 17:20 in K1   
Exercise class (MM1): Monday 9:00 in K11    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (MM2): Tuesday 12:20 in K11    (RNDr. Matúš Maciak, Ph.D.)
Exercise class (SN): Thursday 10:40 in K4    (Mgr. Stanislav Nagy, Ph.D.)
  • Lectures and all exercise classes are held in English if at least one officially enrolled student is not in the Czech study programme.
  • Personal communication with the lecturer and the exercise class instructors can also be conducted in Czech or Slovak.

ANNOUNCEMENTS

27/09/2018:   Organization of the double lecture
Thursday lectures will run from 15:45 till 19:00 with a break of 15 minutes around the middle of this period.
 

EXAM

  • A course credit (zápočet) is required to take the exam.
  • The exam grade will be based on two parts:
    1. Written part composed of theoretical and semi-practical assignments (no computer analysis).
    2. Oral part.
    For details, see this document (pdf).

  • A sample exam assignment is available here (pdf).
  • More details concerning the exam will be provided by the first week of January 2019.
The following exam dates are open for enrollment in SIS:
  • Monday 14/01
  • Monday 21/01
  • Thursday 31/01
  • Monday 11/02
  • Wednesday 13/02
The capacity of each exam term is 20. The oral part of the exam may take place the following day (this will always be specified at the end of the written part). The above set of exam dates is final. No other exam dates will be available in the academic year 2018–19.

ENTRY REQUIREMENTS

This course is a direct continuation of the bachelor study branch General Mathematics, especially its subbranch Stochastics. It hence builds upon a solid command of classical mathematical thinking (theorem, proof, ...), knowledge acquired in the very basic courses (mathematical analysis, linear algebra, ...), and intermediate knowledge of probability theory and mathematical statistics. The most important areas of general mathematics and mathematical statistics that are indispensable for following this course include:

  • Vector spaces, matrix calculus;
  • Probability space, conditional probability, conditional distribution, conditional expectation;
  • Elementary asymptotic results (laws of large numbers, central limit theorem for i.i.d. random variables and vectors, Cramér-Wold theorem, Cramér-Slutsky theorem);
  • Foundations of statistical inference (statistical test, confidence interval, standard error, consistency);
  • Basic procedures of statistical inference (asymptotic tests on expected value, one- and two-sample t-test, one-way analysis of variance, chi-square test of independence);
  • Maximum-likelihood theory including asymptotic results and the delta method;
  • Working knowledge of R, a free software environment for statistical computing and graphics (R).

This course is not a cook-book course on linear regression, and it does not make much sense to follow it without the knowledge described above.

COURSE NOTES

PDF to download (last updated on 20/12/2018)

The course notes provide a record of the lecture including notes, comments etc. mentioned perhaps only orally during the lecture. Nevertheless, in some cases, the course notes do not include proofs or derivations which are fully shown on the blackboard during the lecture.

The lecture will follow the notes quite closely and more or less in a linear way. Students are advised to bring printed course notes to the lecture and supplement them by their own hand-written notes. Not everything that will be said will be written on the blackboard (especially various remarks etc.). Also statements of definitions and theorems will not be fully written on the blackboard. At the same time, it is also true that not everything that will be said during the lecture is mentioned in the notes...

The course notes include landmarks showing the approximate content of each lecture (based on past experience). Students are advised to print the course notes in parts, as the notes may undergo some (most likely minor) modifications during the semester.

ADVICE: Past experience suggests that reading the notes alone is in most cases insufficient preparation for the exam. The course notes are intended as a supplement to the lecture, not its replacement.

COURSE SLIDES

PDF to download (last updated on 27/09/2018)

Course slides will be projected during the lecture. They mainly contain

  • the structure of the lecture;
  • statements of definitions and theorems;
  • some illustrative plots/computer output.

Course slides alone are rather incomplete as study material. In principle, it is not necessary to print the slides. The information they contain is just a subset of the information included in the notes, only in a different format (suitable for projection). The course slides may also undergo some modifications during the semester.

EXERCISE CLASSES

All information related to the exercise classes is (will be) available at the central exercise classes webpage.

The requirements for obtaining the course credit (zápočet) are described in detail in this document (published on 27/09/2018).

Exercise classes are synchronized: the content of the classes held in the same week is approximately the same.

SUPPLEMENTARY R PACKAGE

The course is supplemented by the R package mffSM, which contains the example datasets used throughout the course and a few additional small functions for processing a linear model fit. After download (from the links below, not from CRAN), the package can be installed in R in the standard way ("from a local repository"). The Windows binary file is intended for MS Windows users (as the title suggests); the source code is intended for users of other (mostly more reliable) operating systems where it is standard to compile a package from its source (Linux, Mac etc.). The mffSM package depends on the packages colorspace, lattice and car, which are available in the standard way from CRAN. These dependencies should normally be installed automatically if mffSM is installed directly from the R console on an Internet-connected computer; if they are not, install them from CRAN first. The installation command (to be modified appropriately) is:

install.packages("PATH_WHERE_DOWNLOADED/mffSM_1.1.[tar.gz,zip]", repos = NULL)

Source code:   mffSM_1.1.tar.gz
Windows binary:   mffSM_1.1.zip
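
The installation steps described above can be sketched in R as follows. This is only an illustration under stated assumptions, not part of the mffSM package: the helper names mffsm_archive() and install_mffsm() and the argument download_dir are hypothetical, and the archive is assumed to have been saved under its original file name. As a precaution, the sketch installs any missing CRAN dependencies before installing mffSM from the local file, since dependency resolution may not take place when installing with repos = NULL.

```r
# Sketch only: `download_dir`, `mffsm_archive()` and `install_mffsm()` are
# hypothetical names introduced for this example; they are not part of mffSM.

# Build the path of the downloaded archive: the Windows binary (.zip) on
# MS Windows, the source tarball (.tar.gz) on other systems.
mffsm_archive <- function(download_dir, version = "1.1",
                          os_type = .Platform$OS.type) {
  extension <- if (identical(os_type, "windows")) "zip" else "tar.gz"
  file.path(download_dir, paste0("mffSM_", version, ".", extension))
}

# Install the CRAN dependencies that are still missing, then mffSM itself
# from the local file. Returns FALSE (invisibly) if the archive is not found.
install_mffsm <- function(download_dir) {
  archive <- mffsm_archive(download_dir)
  if (!file.exists(archive)) {
    message("Archive not found: ", archive)
    return(invisible(FALSE))
  }
  dependencies <- c("colorspace", "lattice", "car")
  missing_pkgs <- setdiff(dependencies, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)  # fetched from the configured CRAN mirror
  }
  install.packages(archive, repos = NULL)  # local file, no repository
  invisible(TRUE)
}

# Example call (adjust the directory to wherever the file was saved):
# install_mffsm("~/Downloads")
```

Afterwards, library("mffSM") should load the package. If no CRAN mirror has been configured, R may ask for one when the dependencies are installed.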
 

R TUTORIALS

The R tutorials show R analyses based on the theory given during the lectures. They also provide the code used to prepare most of the output and plots used as illustrations during the lectures. The R tutorials may serve as a reference for the assignments performed during the exercise classes or required as homework.

The R scripts provided below assume that the content of the .Rprofile is sourced at startup.

1. Linear Model
  1. Simple illustration of a linear model (data Hosi0)    html    R code
 
2. Least Squares Estimation
  1. Matrix algebra background of linear regression    html    R code
  2. R function lm    html    R code
 
3. Normal Linear Model
  1. Inference in a model with the regression line (data Cars2004nh)    html    R code
  2. Joint inference on a vector of estimable parameters (data Cars2004nh)    html    R code
  3. Confidence interval for the model based mean, prediction interval (data Hosi0)    html    R code
  4. Confidence interval for the model based mean, prediction interval (data Kojeni)    html    R code
 
4. Basic Regression Diagnostics
  1. Basic Regression Diagnostics (data Cars2004nh)    html    R code
 
7. General Linear Model
  1. Weighted least squares (data Kojeni and wKojeni)    html    R code
 
8. Parameterizations of Covariates
  1. Numeric covariate: simple transformation, polynomial regression, regression splines (data Houses1987)    html    R code
  2. Numeric covariate: regression splines (data Motorcycle)    html    R code
  3. Categorical nominal covariate (data Cars2004nh)    html    R code
  4. Categorical ordinal covariate (data Cars2004nh)    html    R code
 
9. Additivity and Interactions
  1. Two numeric covariates (data Cars2004nh)    html    R code
  2. Numeric and categorical covariate (data Cars2004nh)    html    R code
  3. ANOVA tables of type I, II and III (data Cars2004nh)    html    R code
 
10. Analysis of Variance
  1. Two-way Analysis of Variance (data Howells)    html    R code
 
11. Simultaneous Inference in a Linear Model
  1. Multiple comparison procedures (Tukey, Hothorn–Bretz–Westfall) (data Howells)    html    R code
  2. Multiple comparison procedures (Hothorn–Bretz–Westfall) (data Cars2004nh)    html    R code
  3. Confidence band around and for the regression function (data Kojeni)    html    R code
 
12. Checking Model Assumptions
  1. Partial residuals, Simpson's paradox (data Policie)    html    R code
  2. Partial residuals (data Cars2004nh)    html    R code
  3. Residual plots and tests on assumptions (data Cars2004nh)    html    R code
  4. Checking homoscedasticity (data Draha)    html    R code
  5. Checking uncorrelated errors (data Olympic)    html    R code
  6. Transformation of response: ANOVA with log-transformed response to get normality and homoscedasticity (data Houses1987)    html    R code
  7. Transformation of response: Regression with log-transformed response to stabilize the variance, Box–Cox transformation (data Cars2004nh)    html    R code
 
13. Problematic Regression Space
  1. Multicollinearity (data IQ)    html    R code
  2. Multicollinearity (data Cars2004nh)    html    R code
 
15. Unusual Observations
  1. Unusual observations (data Cars2004)    html    R code
 

 
