Machine Learning 1: classification methods
General data
Course ID: | 2400-DS1ML1 |
Erasmus code / ISCED: | 14.3 |
Course title: | Machine Learning 1: classification methods |
Name in Polish: | Machine Learning 1: classification methods |
Organizational unit: | Faculty of Economic Sciences |
Course groups: |
4EU+ courses (from the offer of teaching units)
Elective major courses - second-cycle studies IE - group 2 (2*30h)
English-language course offering of the Faculty of Economics
Mandatory courses for 1st year students of Data Science and Business Analytics |
ECTS credit allocation (and other scores): | 4.00 |
Language: | English |
Type of course: | obligatory courses |
Short description: |
This course provides a broad perspective on the application of machine learning methods to supervised learning for regression and classification problems. It covers both the theoretical background and practical examples and illustrations. The course covers the basics of machine learning, including measuring performance, model testing, validation methods, feature engineering and selection, simple linear and logistic regression, and discriminant analysis, as well as k-nearest neighbors, Support Vector Machines, and ridge and Lasso regression. |
Full description: |
1. Introduction to Machine Learning
   a. What is and what is not machine learning
   b. Differences between classification, regression and clustering
   c. Introducing a cost function
   d. Sample parametric methods - linear regression and logistic regression
2. Measuring performance, machine learning diagnostics
   a. Performance measures of supervised learning algorithms (model performance, error, confusion matrix and ratios, ROC curve, AUC, RMSE)
   b. Learning curves
   c. Training set and test set
3. Testing the model
   a. Extending model complexity to increase fit
   b. The concept of bias and variance and their trade-off
   c. Cross-validation, selection of number of folds
4. Feature engineering
   a. Feature transformation
   b. Discretization of continuous features
   c. Feature standardization/normalization
5. k-NN
   a. Classification with k-nearest neighbours
   b. Regression with k-nearest neighbours
6. Support Vector Machines
   a. Optimization objective
   b. Separating the data with a maximum margin
   c. Kernel selection for more complex data
   d. Modification of SVM algorithm for regression problems
7. Feature selection methods
   a. Wrapper methods including automated selection (forward, backward and stepwise)
   b. Filter methods – applying scoring to features (e.g. chi-squared test, information gain and correlation coefficient scores)
8. Regularization methods
   a. Introducing penalty for complexity
   b. L1 regularization for additional sparsity in coefficients
   c. L2 regularization for penalization of large coefficients
   d. Regularized linear regression
   e. Regularized logistic regression
9. Lasso regression
10. Workshops on real data
11. Project presentations |
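As an illustration of the performance measures named in item 2a, the sketch below computes a confusion matrix and the ratios derived from it for a binary classifier. It is not taken from the course materials: the function names `confusion_counts` and `ratios` and the toy labels are assumptions made only for this example, and it uses plain Python rather than any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def ratios(tp, fp, fn, tn):
    """Derive common ratios from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = ratios(*counts)
print(counts)   # (tp, fp, fn, tn)
print(metrics)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four ratios equal 0.75.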
Bibliography: |
Harrington, Peter. Machine learning in action. Vol. 5. Greenwich, CT: Manning, 2012.
Zumel, Nina, John Mount, and Jim Porzak. Practical data science with R. Manning, 2014.
Lantz, Brett. Machine learning with R. Packt Publishing Ltd, 2013.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction." Springer Series in Statistics (2009). |
Learning outcomes: |
After completing the course, the average student will have reliable, structured knowledge of a wide range of supervised learning algorithms for regression and classification problems, such as linear and logistic regression, linear discriminant analysis, kNN, ridge regression, LASSO, and Support Vector Machines. They will know the theoretical foundations of these algorithms and have the programming skills to apply them in practice. They will be able to select the predictive modeling algorithms best suited to a specific research problem, perform reliable validation of models, select and transform variables, and carry out an independent research project using the methods learned. K_U02, K_U05 |
Assessment methods and assessment criteria: | |
Classes in period "Summer semester 2023/24" (in progress)
Time span: | 2024-02-19 - 2024-06-16 |
Timetable: | MO: KON, KON; TU: KON, KON; W: -; TH: -; FR: - |
Type of class: | Seminar, 30 hours |
Coordinators: | Piotr Wójcik | |
Group instructors: | Szymon Lis, Michał Woźniak, Piotr Wójcik | |
Examination: | Course - Grading; Seminar - Grading |
Copyright by University of Warsaw.