Unsupervised Learning

General data

Course ID: 2400-DS1UL
Erasmus code / ISCED: 14.3 / (0311) Economics
Course title: Unsupervised Learning
Name in Polish: Unsupervised Learning
Organizational unit: Faculty of Economic Sciences
Course groups: Elective major courses - second-cycle studies, IE - group 1 (6*30h)
English-language course offering of the Faculty of Economics
Mandatory courses for 1st year students of Data Science and Business Analytics
ECTS credit allocation (and other scores): 3.00
Basic information on ECTS credit allocation principles:
  • the annual student workload required to achieve the expected learning outcomes for a given stage is 1500-1800 h, corresponding to 60 ECTS;
  • the student’s weekly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • one week of student work toward the assumed learning outcomes corresponds to 1.5 ECTS;
  • the work required to pass the course, which has been assigned 3 ECTS, constitutes 10% of the semester student workload.
Language: English
Type of course:

obligatory courses

Short description:

The Unsupervised Learning course focuses on exploring data structure and extracting key information from unlabeled data, where no particular variable of interest is designated. The course covers three subfields of unsupervised learning: clustering, dimension reduction and association rule learning. Both theoretical and practical aspects are discussed. Classes are held in a computer lab. Passing requirements: group projects. The course is intended for graduate students (Econometrics and Informatics, Data Science).

Full description:

The main aim of the course is to familiarize students with the research opportunities offered by data mining algorithms (Knowledge Discovery in Databases, KDD) and with their use in business applications. The course covers three blocks of topics: clustering, dimension reduction and association rule learning.

Each block is divided into four stages: i) introduction and construction of basic algorithms, ii) familiarization with the commands available in R, their comparison and evaluation, iii) work with the most recent literature, iv) group project.

BLOCK 1: Clustering

Data will be explored by means of clustering. Several methods are introduced: distance-based methods, k-means, Partitioning Around Medoids (PAM), Clustering Large Applications (CLARA), Clustering Large Applications based on RANdomized Search (CLARANS), nonparametric clustering, hierarchical clustering, dictionary learning, linkage methods and probabilistic methods. Different ways of identifying the optimal number of clusters (Calinski-Harabasz (CH) index, Silhouette index), as well as agreement indices, will be presented.
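
As an illustration of the k-means and Silhouette ideas named above, here is a minimal sketch in plain Python. The course itself works in R, so this is not the course's code: the function names (`kmeans`, `silhouette`), the toy points and the choice of initial centers are all assumptions made for the example.

```python
import math

def dist(a, b):
    # Euclidean distance between two points given as tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, init_centers, iters=100):
    # Lloyd's algorithm: alternate between assigning points to the nearest
    # center and moving each center to the mean of its assigned points.
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers

def silhouette(points, labels):
    # Mean silhouette width: (b - a) / max(a, b) averaged over all points,
    # where a is the mean distance to the point's own cluster and b is the
    # smallest mean distance to any other cluster.
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (assumed for illustration): two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centers = kmeans(points, init_centers=[points[0], points[3]])
score = silhouette(points, labels)
print(labels, round(score, 3))
```

Running the same clustering for several candidate numbers of clusters and comparing the mean silhouette width is one simple way to choose among them; a value close to 1 indicates compact, well-separated clusters.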

BLOCK 2: Dimension reduction

Principal component analysis (PCA), multidimensional scaling (classical and metric), as well as current non-linear dimension reduction methods.
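
To make the idea behind PCA concrete, the following plain-Python sketch finds the first principal component of a small data set by power iteration on the sample covariance matrix. It is an illustration under stated assumptions, not the course's R workflow: the function name `first_principal_component`, the toy data and the all-ones starting vector are hypothetical, and the method assumes the starting vector is not orthogonal to the leading eigenvector.

```python
import math

def first_principal_component(rows, iters=200):
    n, d = len(rows), len(rows[0])
    # Center each column on its mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divisor n - 1)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, as a share of the total variance
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, variance / total

# Toy data (assumed for illustration): points lying close to the line y = 2x
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, explained = first_principal_component(rows)
print([round(x, 3) for x in direction], round(explained, 4))
```

Because the toy points lie almost on a line, nearly all of the variance is captured by a single direction, which is the situation in which replacing two variables by one component loses little information.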

BLOCK 3: Association rule learning

The main association rule algorithms are introduced (Apriori, Eclat, FP-growth, OPUS). They are applied, among other things, in market basket analysis and in finding common patterns among purchased goods. Key measures of rules and transactions (support, confidence, lift, difference of confidence) will be described.
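
The rule measures named above can be computed directly from their standard definitions. The sketch below does so in plain Python for a single rule; it is only an illustration (the course uses R packages such as arules), and the helper names `support` and `rule_measures` and the toy baskets are assumptions made for the example.

```python
def support(transactions, itemset):
    # Share of transactions that contain every item in the itemset
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def rule_measures(transactions, lhs, rhs):
    # Measures for the rule lhs => rhs:
    #   support    = P(lhs and rhs)
    #   confidence = P(lhs and rhs) / P(lhs)
    #   lift       = confidence / P(rhs)
    s_lhs = support(transactions, lhs)
    s_rhs = support(transactions, rhs)
    s_both = support(transactions, set(lhs) | set(rhs))
    confidence = s_both / s_lhs
    lift = confidence / s_rhs
    return {"support": s_both, "confidence": confidence, "lift": lift}

# Toy baskets (assumed for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
m = rule_measures(transactions, lhs={"bread"}, rhs={"milk"})
print(m)
```

In this toy example the lift of {bread} => {milk} is below 1, meaning that milk appears slightly less often in baskets with bread than in baskets overall, even though the rule's confidence looks high in isolation.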

Models will be built on real data, including cleaning and transforming the input data. The course also covers methods for visualizing transactions, rules and clusters (including interactive ones), as well as sampling as a tool for simplifying big data. Examples of packages used: arules, arulesViz, stats, cluster, pdfCluster, clues and others (see the R Task View "Cluster" - Cluster Analysis & Finite Mixture Models).

Bibliography:

Papers provided by lecturers as well as:

Bousquet, O.; von Luxburg, U.; Raetsch, G., eds. (2004). Advanced Lectures on Machine Learning. Springer-Verlag.

Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). "Unsupervised Learning and Clustering". Pattern Classification (2nd ed.). Wiley.

Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.

Learning outcomes:

- Student has knowledge about unsupervised learning principles

- Student is familiar with unsupervised learning methodology

- Student is able to analyze data by using unsupervised learning approach

- Student is able to use knowledge about unsupervised learning to conduct his/her own research

- Student gains, processes and analyzes data independently

- Student is capable of working in groups and co-operating with others

- Student is able to formulate his/her point of view and express it in the discussion

- Student expresses his/her research curiosity and openness towards economic phenomena

K_W01, K_U01, K_U02, K_U03, K_U04, K_U05, KS_01

Assessment methods and assessment criteria:

Evaluation of group projects

Classes in period "Winter semester 2023/24" (past)

Time span: 2023-10-01 - 2024-01-28
Type of class:
Lab, 30 hours
Coordinators: Katarzyna Kopczewska, Jacek Lewkowicz
Group instructors: Katarzyna Kopczewska, Jacek Lewkowicz
Credit: Course - Grading
Lab - Grading

Classes in period "Winter semester 2024/25" (past)

Time span: 2024-10-01 - 2025-01-26
Type of class:
Seminar, 30 hours
Coordinators: Katarzyna Kopczewska, Jacek Lewkowicz
Group instructors: Katarzyna Kopczewska, Jacek Lewkowicz
Credit: Course - Grading
Seminar - Grading
Course descriptions are protected by copyright.
Copyright by University of Warsaw.
Krakowskie Przedmieście 26/28
00-927 Warszawa
tel: +48 22 55 20 000 https://uw.edu.pl/