851401 Introduction to statistical learning with R


Type
Lecture and exercise
Semester hours
2
Lecturer (assistant)
Melcher, Michael
Organisation
Institute of Applied Statistics and Computing (IASC)
Offered in
Winter semester 2017/18
Languages of instruction
English

Content

We are currently living in an age of (big) data: every minute, 300 hours of video are uploaded to YouTube, 350,000 tweets are posted on Twitter and about 200,000 photos are uploaded to Facebook, while the sensor systems at CERN may produce 1 petabyte of data per second. Yet data without evaluation and interpretation is meaningless.
This lecture deals with two important fields in the study of data: supervised and unsupervised (statistical) learning. In the former, the goal is to predict a certain quantity (the response) on the basis of measurements of other variables (the predictors); in the latter, the aim is simply to find structure (e.g. groups) in the data.

The emphasis of the lecture is on practical applications, with a minimum of (mathematical/statistical) theory. It is organized as follows:

1. Introduction to statistical learning: classification versus regression, responses/predictors, supervised and unsupervised learning; literature on statistical learning and the software environment R
2. (A very short) introduction to R - installation, usage with the IDE RStudio, data types, functions, graphics, statistical distributions, data import and export
3. Linear Models: simple and multiple linear regression, regression diagnostics, methods of variable selection, factors or polynomials as regressors, robust regression
4. Shrinkage Methods: ridge regression, least absolute shrinkage and selection operator (LASSO), least angle regression (LARS), elastic net
5. Classification: logistic regression, linear and quadratic discriminant analysis (LDA/QDA)
6. Regression and Classification Trees
7. Unsupervised Learning: Principal Component Analysis (PCA), Hierarchical and K-Means Clustering
8. Validation Methods & Model Selection

Previous knowledge expected

Basic knowledge of statistics and, ideally, of the statistical programming language/environment R is expected. It is recommended (but not required) to attend one of the introductory lectures 851309 ("Statistics with R") or 851402 ("Statistik mit R") and the advanced course 851321 ("Programmieren mit R").

Objective (expected results of study and acquired competences)

Students will know the most important methods of supervised and unsupervised learning, along with their advantages and limitations, and will be able to apply these methods to real-world problems and data using the statistical programming language R.
Further details, such as the schedule and information about exams, can be found on the course page in BOKUonline.