
Data mining

General data

Course ID: 1000-2M03DM
Erasmus code / ISCED: 11.303 The course classification code consists of three to five digits: the first three denote the field of study according to the list of subject area codes used in the Socrates/Erasmus programme, the fourth (so far usually 0) optionally refines the discipline, and the fifth indicates the level of advancement of the course, based on the year of study for which the course is intended. / (0612) Database and network design and administration The ISCED (International Standard Classification of Education) code was designed by UNESCO.
Course title: Data mining
Name in Polish: Data mining
Organizational unit: Faculty of Mathematics, Informatics, and Mechanics
Course groups: Elective courses for second-cycle studies in bioinformatics
Elective courses for Computer Science
Elective courses for Machine Learning
ECTS credit allocation (and other scores): 6.00
Basic information on ECTS credit allocation principles:
  • the annual workload required for a student to achieve the expected learning outcomes for a given stage is 1500-1800 h, corresponding to 60 ECTS;
  • the student’s weekly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • one week of student work toward the assumed learning outcomes earns 1.5 ECTS;
  • the work required to pass a course that has been assigned 3 ECTS constitutes 10% of the student’s semester workload.
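
The allocation rules above can be applied to this 6 ECTS course as a quick arithmetic check. The sketch below is illustrative only: the helper names are hypothetical, and the 30 ECTS semester load is an assumption derived from the 60 ECTS annual figure (it is consistent with the statement that 3 ECTS is 10% of the semester load).

```python
# Illustrative arithmetic based on the ECTS rules listed above.
HOURS_PER_ECTS = (25, 30)  # 1 ECTS corresponds to 25-30 hours of student work
SEMESTER_ECTS = 30         # assumption: half of the 60 ECTS annual load


def workload_hours(ects):
    """Return the (low, high) range of student work hours for a credit count."""
    low, high = HOURS_PER_ECTS
    return ects * low, ects * high


def semester_share(ects):
    """Return the fraction of the semester load that a course represents."""
    return ects / SEMESTER_ECTS


course_hours = workload_hours(6)  # (150, 180) hours for this course
course_share = semester_share(6)  # 0.2, i.e. 20% of the semester load
```

Under these assumptions, the course corresponds to roughly 150-180 hours of work, of which 60 hours are the scheduled lab and lecture classes.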

Language: English
Main fields of studies for MISMaP:

computer science
mathematics

Type of course:

elective monographs

Prerequisites:

Machine learning 1000-2N09SUS

Prerequisites (description):

It is recommended that a person registering for the course should have basic knowledge of machine learning methods and data processing.

Mode:

Classroom

Short description:

Presentation of the main problems in the field of data mining and the methods used to solve them. Discussion of efficient implementations, on large collections of data, of solutions to basic problems such as association rules, data preparation, discretization of real-valued attributes, and decision trees. Presentation of modern computation techniques such as parallel processing, evolutionary computation, using standard heuristic databases or specially constructed data structures.

Full description:

1. Introduction to KDD and data mining; templates and patterns

2. Transaction data analysis and association rules; main algorithms for association rule generation: Apriori, AprioriTid, FP-tree.

3. Classification problem and classifier evaluation methods; case-based methods, naive Bayes classifiers, Bayesian networks. Improving nearest neighbors classifiers

4. Entropy measure and decision tree methods.

5. Clustering problem and clustering algorithms

6. Computational learning theory

7. Rule-based classifiers

8. Data cleaning and data preprocessing techniques

9. Hidden Markov models and their applications

10. Searching for sequence patterns in time series data

11. OLAP and data mining

12. Web mining and text mining
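
Topic 2 above names Apriori among the algorithms for association rule generation. As an illustration only, a minimal sketch of the level-wise frequent-itemset search behind it might look like this in Python. The function name `apriori`, the absolute `min_support` threshold, and the toy baskets are assumptions made for this example, not course material.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in current
                   for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy market-basket data (hypothetical).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
```

The sketch recounts support with a full scan at each level, which keeps the code short; the efficient implementations on large data collections that the course discusses (for example FP-tree based approaches) avoid exactly this cost.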

Bibliography:

1. "Data Mining: Concepts and Techniques". J. Han and M. Kamber. Morgan Kaufmann Publishers. 2001

2. "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations". I. Witten and E. Frank. Morgan Kaufmann Publishers. 2000.

3. "Advances in Knowledge Discovery and Data Mining". Eds.: Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy. The MIT Press, 1995.

4. "Mining of Massive Datasets" (2nd ed.). J. Leskovec, A. Rajaraman, and J. D. Ullman. Cambridge University Press. 2014.

Learning outcomes:

Knowledge and skills:

1. Knows the basic classes of problems related to data mining and knowledge discovery.

2. Knows and is able to use in practice the methods of market basket analysis, understands and is able to apply in practice the algorithms for searching for frequent itemsets.

3. Knows and is able to apply basic ML algorithms.

4. Can evaluate the effectiveness of ML models in classification, regression, and clustering problems.

5. Knows the basic techniques of text processing for the construction of ML models and is able to apply them in practice.

6. Can construct simple recommendation systems and understands their operation.

7. Knows the basic methods of constructing predictive models for time series. Can apply them to real-world data sets and assess their actual effectiveness.

8. Knows current major trends in fields of science related to machine learning and knowledge discovery from databases.

Social competence:

1. Is able to prepare a report on exploratory data analysis presenting the most important information using data visualization techniques.

2. Can present the results of the conducted analyses.

Assessment methods and assessment criteria:

The final grades are based on the sum of points from the laboratory and the exam.

Additionally, doctoral students may pass this course through the preparation of a special project involving participation in an international data mining competition.

Classes in period "Summer semester 2023/24" (in progress)

Time span: 2024-02-19 - 2024-06-16
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Hung Son Nguyen
Group instructors: Hung Son Nguyen
Examination: Examination

Classes in period "Summer semester 2024/25" (future)

Time span: 2025-02-17 - 2025-06-08
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Hung Son Nguyen
Group instructors: Hung Son Nguyen, Marcin Szczuka
Examination: Examination
Course descriptions are protected by copyright.
Copyright by University of Warsaw.
Krakowskie Przedmieście 26/28
00-927 Warszawa
tel: +48 22 55 20 000 https://uw.edu.pl/