Large-scale machine learning
General data
Course ID: | 1000-319bBML |
Erasmus code / ISCED: | 11.3 |
Course title: | Large-scale machine learning |
Name in Polish: | Uczenie maszynowe w dużej skali |
Organizational unit: | Faculty of Mathematics, Informatics, and Mechanics |
Course groups: | Elective courses for Computer Science; Obligatory courses for 2nd year Machine Learning |
ECTS credit allocation (and other scores): | 6.00 |
Language: | English |
Type of course: | elective monographs |
Requirements: | Deep neural networks 1000-317bDNN |
Prerequisites (description): | object-oriented programming, computer networks, algorithms and data structures |
Short description: |
During this course we will present techniques and tools for processing big data, focusing on those most useful to machine learning practitioners. We will present the most important models and basic algorithmic techniques, and show how to analyze algorithms that process large datasets on clusters. Finally, we will introduce typical optimizations that are useful in machine learning applications such as linear regression, clustering, decision trees, and neural networks. |
Full description: |
- Distributing computation across clusters of commodity machines; distributed file systems.
- The MapReduce model and basic algorithmic techniques for it. Comparison of MapReduce algorithms with conventional algorithms for typical problems (matrix multiplication, multi-way join, counting triangles in large graphs).
- Total vs. elapsed communication cost. Skew and methods for dealing with it.
- Spark and the Resilient Distributed Dataset (RDD) model.
- Spark SQL and its optimizations.
- Serialization of big data and columnar formats.
- Managed cloud data warehouses.
- Algorithms for stream processing.
- Distributing typical machine learning algorithms, e.g., linear regression, clustering, decision trees, or neural networks.
- Neural networks at large scale (data parallelism, model parallelism).
- Learned index structures. |
Bibliography: |
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press.
- Guglielmo Iozzia. Hands-On Deep Learning with Apache Spark. Packt Publishing.
- Butch Quinto. Next-Generation Machine Learning with Spark: Covers XGBoost, LightGBM, Spark NLP, Distributed Deep Learning with Keras, and More. Apress. |
Learning outcomes: |
Knowledge: the student knows and understands
- large-scale data processing techniques used in the context of machine learning [K_W04]
- methods for distributing and parallelizing computation [K_W06]
Skills: the student is able to
- use modern systems for distributing and parallelizing computation [K_U20]
- process large datasets [K_U21]
Social competences: the student is ready to
- critically evaluate their own knowledge and the content they receive [K_K01]
- recognize the importance of knowledge in solving cognitive and practical problems, and to seek expert opinion when unable to solve a problem independently [K_K02] |
Assessment methods and assessment criteria: |
The final mark is based on large programming assignments, points for participation in laboratories, and a written exam. |
Classes in period "Winter semester 2023/24" (past)
Time span: | 2023-10-01 - 2024-01-28 |
Timetable: | lecture on Wednesday; labs on Wednesday (two groups) and Friday (one group) |
Type of class: | Lab, 30 hours; Lecture, 30 hours |
Coordinators: | Krzysztof Rządca, Jacek Sroka |
Group instructors: | Tomasz Kanas, Krzysztof Rządca, Jacek Sroka |
Examination: | Examination |
Classes in period "Winter semester 2024/25" (future)
Time span: | 2024-10-01 - 2025-01-26 |
Type of class: | Lab, 30 hours; Lecture, 30 hours |
Coordinators: | Marek Cygan, Krzysztof Rządca |
Group instructors: | Marek Cygan, Tomasz Kanas, Jakub Krajewski, Michał Krutul, Adrian Naruszko, Krzysztof Rządca |
Examination: | Examination |
Copyright by University of Warsaw.