
Large-scale machine learning

General data

Course ID: 1000-319bBML
Erasmus code / ISCED: 11.3 / (0612) Database and network design and administration
The Erasmus subject classification code consists of three to five digits: the first three denote the field according to the list of field codes used in the Socrates/Erasmus programme, the fourth (so far usually 0) optionally refines the discipline, and the fifth gives the level of advancement of the course, based on the year of study for which it is intended. The ISCED (International Standard Classification of Education) code was designed by UNESCO.
Course title: Large-scale machine learning
Name in Polish: Uczenie maszynowe w dużej skali
Organizational unit: Faculty of Mathematics, Informatics, and Mechanics
Course groups: Elective courses for Computer Science
Obligatory courses for 2nd year Machine Learning
ECTS credit allocation (and other scores): 6.00
Basic information on ECTS credit allocation principles:
  • the annual workload required of a student to achieve the expected learning outcomes for a given stage is 1500-1800 h, corresponding to 60 ECTS;
  • the student's weekly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • one week of student work toward the assumed learning outcomes earns 1.5 ECTS;
  • the work required to pass a course assigned 3 ECTS constitutes 10% of a student's semester workload.

Language: English
Type of course:

elective monographs

Requirements:

Deep neural networks 1000-317bDNN
Natural language processing 1000-318bNLP
Statistical machine learning 1000-317bSML

Prerequisites (description):

object-oriented programming, computer networks, algorithms and data structures

Short description:

This course presents techniques and tools for processing big data, focusing on those useful to machine learning practitioners. We cover the most important computation models and basic algorithmic techniques, and show how to analyze algorithms that process large data sets on clusters. Finally, we introduce typical optimizations useful in machine learning applications such as linear regression, clustering, decision trees, and neural networks.
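As an illustration of what distributing linear regression can look like, here is a minimal sketch in plain Python; it is not taken from the course materials. It assumes a single feature and ordinary least squares. Each partition is summarized by five sums, the summaries are merged by addition, and the line is fitted from the merged sums. The function names and the sample data are hypothetical.

```python
from functools import reduce

def partition_stats(points):
    """Map step: summarize one partition of (x, y) pairs with five sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge_stats(a, b):
    """Reduce step: the statistics are plain sums, so merging is element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(partitions):
    """Fit y = slope * x + intercept by least squares from the merged statistics."""
    n, sx, sy, sxx, sxy = reduce(merge_stats, map(partition_stats, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1, split across three "machines".
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
print(slope, intercept)  # prints: 2.0 1.0
```

Because merging is associative and commutative, the partitions can be processed in any order and on any number of workers; only the five-number summaries need to travel over the network.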

Full description:

-Distributing computation across clusters of commodity machines; distributed file systems.

-MapReduce model and basic algorithmic techniques for this model. Comparison of MapReduce algorithms with classical algorithms for typical problems (matrix multiplication, multi-way join, counting triangles in large graphs).

-Total vs elapsed communication cost. Skew and methods to deal with it.

-Spark and Resilient Distributed Dataset model.

-Spark SQL and its optimizations.

-Serialization of big data and columnar formats.

-Managed cloud data warehouses.

-Algorithms for stream processing.

-Distributing typical machine learning algorithms, e.g., linear regression, clustering, decision trees or neural networks.

-Neural networks at large scale (data parallelism, model parallelism).

-Learned index structures.
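To make the MapReduce topics above more concrete, the following is a small single-process sketch of one-pass matrix multiplication in the MapReduce style; it is not taken from the course materials. It assumes matrices stored as dictionaries from (row, column) to value, simulates the shuffle in memory, and counts the key-value pairs sent from mappers to reducers as a simple proxy for communication cost. All function names are hypothetical.

```python
from collections import defaultdict

def map_phase(A, B, n_rows_a, n_cols_b):
    """Emit ((i, k), (tag, j, value)) pairs for matrices A and B.

    A and B are dicts mapping (row, col) -> value.
    """
    for (i, j), a in A.items():
        for k in range(n_cols_b):
            yield (i, k), ("A", j, a)
    for (j, k), b in B.items():
        for i in range(n_rows_a):
            yield (i, k), ("B", j, b)

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Compute one output cell: the sum over j of A[i][j] * B[j][k]."""
    a_vals = {j: v for tag, j, v in values if tag == "A"}
    b_vals = {j: v for tag, j, v in values if tag == "B"}
    return key, sum(a * b_vals[j] for j, a in a_vals.items() if j in b_vals)

def multiply(A, B, n_rows_a, n_cols_b):
    """Run map, shuffle, and reduce; also return the number of pairs shuffled."""
    pairs = list(map_phase(A, B, n_rows_a, n_cols_b))
    groups = shuffle(pairs)
    product = dict(reduce_phase(k, v) for k, v in groups.items())
    return product, len(pairs)

# Hypothetical 2x2 example.
A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
product, pairs_sent = multiply(A, B, n_rows_a=2, n_cols_b=2)
print(product[(0, 0)], product[(1, 1)], pairs_sent)  # prints: 19 50 16
```

Every input element is replicated once per output row or column it contributes to, which is why the number of shuffled pairs (16 here) exceeds the 8 input elements; this replication is the kind of cost that the comparison with classical algorithms has to account for.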

Bibliography:

-Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press

-Guglielmo Iozzia, Hands-On Deep Learning with Apache Spark, Packt Publishing

-Butch Quinto, Next-Generation Machine Learning with Spark: Covers XGBoost, LightGBM, Spark NLP, Distributed Deep Learning with Keras, and More, Apress

Learning outcomes:

Knowledge: the student knows and understands

techniques for large-scale data processing used in the context of machine learning [K_W04]

methods for distributing and parallelizing computations [K_W06]

Skills: the student is able to

use modern systems for distributing and parallelizing computations [K_U20]

process large data sets [K_U21]

Social competences: the student is ready to

critically assess their own knowledge and the content they receive [K_K01]

recognize the importance of knowledge in solving cognitive and practical problems, and seek expert opinion when they have difficulty solving a problem on their own [K_K02]

Assessment methods and assessment criteria:

The final mark is based on large programming assignments, points for participation in laboratories, and a written exam.

Classes in period "Winter semester 2023/24" (past)

Time span: 2023-10-01 - 2024-01-28
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Krzysztof Rządca, Jacek Sroka
Group instructors: Tomasz Kanas, Krzysztof Rządca, Jacek Sroka
Examination: Examination

Classes in period "Winter semester 2024/25" (future)

Time span: 2024-10-01 - 2025-01-26
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Marek Cygan, Krzysztof Rządca
Group instructors: Marek Cygan, Tomasz Kanas, Jakub Krajewski, Michał Krutul, Adrian Naruszko, Krzysztof Rządca
Examination: Examination
Course descriptions are protected by copyright.
Copyright by University of Warsaw.