
Statistical data analysis

General data

Course ID: 1000-714SAD
Erasmus code / ISCED: 11.303 / (0612) Database and network design and administration. The Erasmus classification code consists of three to five digits: the first three give the field of study according to the list of field codes used in the Socrates/Erasmus programme, the fourth (so far usually 0) optionally refines the discipline, and the fifth gives the level of the course, based on the year of study for which it is intended. The ISCED (International Standard Classification of Education) code was designed by UNESCO.
Course title: Statistical data analysis
Name in Polish: Statystyczna analiza danych
Organizational unit: Faculty of Mathematics, Informatics, and Mechanics
Course groups: Obligatory courses for 2nd year Bioinformatics
Obligatory courses for 3rd year Mathematics
ECTS credit allocation (and other scores): 6.00 Basic information on ECTS credits allocation principles:
  • the annual hourly workload of the student’s work required to achieve the expected learning outcomes for a given stage is 1500-1800h, corresponding to 60 ECTS;
  • the student’s weekly hourly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • one week of student work towards the assumed learning outcomes corresponds to 1.5 ECTS;
  • a course assigned 3 ECTS accounts for 10% of the student's semester workload.
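
As a rough illustration of the rules above, the sketch below converts this course's 6 ECTS into an hour range using the quoted 25-30 hours per credit. The function name `workload_hours` is a hypothetical helper introduced here, not part of any university system.

```python
# Workload implied by the ECTS rules listed above (25-30 hours per credit).
HOURS_PER_ECTS = (25, 30)  # lower and upper bound quoted in the list above


def workload_hours(ects: float) -> tuple[float, float]:
    """Return the (low, high) range of student work hours for `ects` credits."""
    low, high = HOURS_PER_ECTS
    return ects * low, ects * high


low, high = workload_hours(6.0)
print(f"6 ECTS corresponds to roughly {low:.0f}-{high:.0f} hours of work")
```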

Language: Polish
Type of course:

obligatory courses

Prerequisites (description):

A good command of the topics covered in the syllabi of the courses Analiza matematyczna II.1 (Mathematical Analysis II.1) and Rachunek prawdopodobieństwa I (Probability Theory I) is expected.

Short description:

Introduction to basic statistical notions and tools, such as parameter estimation and hypothesis testing. Introduction to data science, covering classification and clustering methods.

Mathematics students can alternatively take the course 1000-116bST, which has a different character.

Full description:

1. Basic notions of probability calculus and statistics: random variables, their distributions, expected value and variance, probability space.

2. Basic notions of statistics: statistical space, random experiment, statistic, statistical model, model evaluation methods.

3. Parameter estimation. Bias and efficiency, maximum likelihood estimators, confidence intervals.

4. Summary and visualisation of data. Quantile-quantile plots. Histograms, kernel density estimation, boxplot.

5. Hypothesis testing. The notion of a statistical hypothesis, the procedure of hypothesis testing, type I and type II errors, power of a test, Neyman-Pearson lemma, parametric statistical significance tests, significance tests for a mean, significance tests for a variance.

6. The notion of a p-value, its common misinterpretations and misuse, effect size, multiple hypothesis testing.

7. Useful statistical tests. Statistical significance test for two means, non-parametric tests for two medians, Pearson's chi-squared test, analysis of variance.

8. Linear regression, simple, multiple, with extensions: assumptions, parameter estimation, evaluation of goodness of fit.

9. Classification. Logistic regression, LDA, QDA, KNN.

10. Resampling methods. Cross-validation, bootstrap.

11. Model selection and regularisation. Feature selection, usage of a validation set, usage of cross-validation, analysis of high-dimensional data, lasso and ridge regression, partial least squares.

12. Tree-based models. Decision trees, bagging, random forests, boosting.

13. Support vector machines. Separating hyperplanes, maximum margin classifier, support vector machines.

14. Dimensionality reduction. PCA.

15. Unsupervised learning. The notion of clustering, methods of hierarchical clustering and k-means.

16. Nonlinear models. Polynomial regression, splines, generalized additive models.
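
To give a flavour of how topics 3 and 10 connect, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is not taken from the course materials; the function `bootstrap_mean_ci`, its parameters, and the data values are illustrative assumptions, and only the Python standard library is used.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement, recompute the mean each time, then sort.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 4.9]  # made-up values
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.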

Bibliography:

Lesław Gajek, Marek Kałuszka, Wnioskowanie statystyczne, modele i metody.

John A. Rice, Mathematical Statistics and Data Analysis.

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning with Applications in R.

Learning outcomes:

Knowledge:

1. general knowledge of the problems of statistical data analysis.

2. basic knowledge of the statistical tools used in the modeling and analysis of data.

3. basic notions and methods of probability calculus and statistics, including parameter estimation and hypothesis testing methods.

Skills:

1. performing simple statistical analysis and statistical testing.

2. using modern statistical analysis tools.

Social skills:

1. ability to explain statistical inference in plain words.

Assessment methods and assessment criteria:

Impact on the final grade: exam 40%, mid-term test 20%, assignment 10%, in-class activity 10%, in-lab activity 10%.

Classes in period "Summer semester 2023/24" (in progress)

Time span: 2024-02-19 - 2024-06-16
Type of class:
Classes, 15 hours
Lab, 30 hours
Lecture, 30 hours
Coordinators: Błażej Miasojedow
Group instructors: Barbara Domżał, Błażej Miasojedow, Szymon Nowakowski, Piotr Pokarowski, Łukasz Rajkowski
Examination: Examination

Classes in period "Summer semester 2024/25" (future)

Time span: 2025-02-17 - 2025-06-08
Type of class:
Classes, 15 hours
Lab, 30 hours
Lecture, 30 hours
Coordinators: Błażej Miasojedow
Group instructors: Błażej Miasojedow, Szymon Nowakowski, Piotr Pokarowski, Łukasz Rajkowski
Examination: Examination
Course descriptions are protected by copyright.
Copyright by University of Warsaw.
Krakowskie Przedmieście 26/28
00-927 Warszawa
tel: +48 22 55 20 000 https://uw.edu.pl/