University of Warsaw

Data analysis and visualization

General data

Course ID: 1000-719DAV
Erasmus code / ISCED: (unknown) / (unknown)
Course title: Data analysis and visualization
Name in Polish: Analiza i wizualizacja danych
Organizational unit: Faculty of Mathematics, Informatics, and Mechanics
Course groups: Elective courses for 2nd stage studies in Mathematics
Specific programme courses of 2nd stage Bioinformatics
ECTS credit allocation (and other scores): 6.00
Basic information on ECTS credit allocation principles:
  • the annual workload required for the student to achieve the expected learning outcomes for a given stage is 1500-1800 h, corresponding to 60 ECTS;
  • the student’s weekly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • the weekly student workload necessary to achieve the assumed learning outcomes corresponds to 1.5 ECTS;
  • for example, the work required to pass a course assigned 3 ECTS constitutes 10% of the semester student load.
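The allocation rules above reduce to simple arithmetic. The sketch below applies them to this 6-ECTS course. It assumes that a semester carries half of the 60 ECTS of an academic year and takes the 60 contact hours from the Lab (30 h) and Lecture (30 h) entries listed for this course. The helper name `ects_workload_hours` is hypothetical.

```python
def ects_workload_hours(ects, hours_per_ects=(25, 30)):
    """Return the (minimum, maximum) student-hour range for an ECTS allocation."""
    low, high = hours_per_ects
    return ects * low, ects * high


# Assumption: a semester carries half of the 60 ECTS of an academic year.
SEMESTER_ECTS = 60 / 2

course_ects = 6.0
contact_hours = 30 + 30  # Lab (30 h) + Lecture (30 h) listed for this course

low, high = ects_workload_hours(course_ects)
print(f"total workload: {low:.0f}-{high:.0f} h")
print(f"outside classes: {low - contact_hours:.0f}-{high - contact_hours:.0f} h")
print(f"share of the semester load: {course_ects / SEMESTER_ECTS:.0%}")

# Consistency checks against the stated principles.
assert (1500 / 60, 1800 / 60) == (25.0, 30.0)  # 1500-1800 h per 60 ECTS
assert 45 / 30 == 1.5                          # 45 h/week at 30 h per ECTS
assert 3 / SEMESTER_ECTS == 0.1                # 3 ECTS -> 10% of a semester
```

Under these assumptions, the script reports a total of 150-180 h. Of that, 90-120 h fall outside scheduled classes, and the course makes up 20% of the semester load.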

Language: English
Type of course:

obligatory courses

Short description:

The aim of the course is to introduce students to the techniques of data analysis and visualization.

Full description:

Students will learn how to process and visualize data in the most common formats (e.g., csv, json, xml) using a scripting language (Python). This includes using built-in libraries and writing custom parsers.
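The distinction between built-in libraries and custom parsers can be sketched in a few lines. Python's standard `csv`, `json` and `xml.etree.ElementTree` modules read the same two records from three formats. The standard library has no FASTQ reader, so FASTQ (one of the formats listed among the lecture topics) makes a natural target for a hand-written parser. The sample records and function names are illustrative only. The FASTQ parser assumes unwrapped four-line records and Phred+33 quality encoding.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same two illustrative records in three common formats.
CSV_TEXT = "species,sepal_length\nsetosa,5.1\nversicolor,7.0\n"
JSON_TEXT = ('[{"species": "setosa", "sepal_length": 5.1}, '
             '{"species": "versicolor", "sepal_length": 7.0}]')
XML_TEXT = """<flowers>
  <flower species="setosa" sepal_length="5.1"/>
  <flower species="versicolor" sepal_length="7.0"/>
</flowers>"""


def from_csv(text):
    # DictReader yields one dict per row; every value is a string.
    return [(row["species"], float(row["sepal_length"]))
            for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    # json.loads already converts JSON numbers to Python numbers.
    return [(rec["species"], rec["sepal_length"]) for rec in json.loads(text)]


def from_xml(text):
    # Attributes come back as strings, so the length is converted explicitly.
    root = ET.fromstring(text)
    return [(el.get("species"), float(el.get("sepal_length")))
            for el in root.iter("flower")]


def parse_fastq(text):
    """Minimal custom FASTQ parser (assumes 4 unwrapped lines per record)."""
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("incomplete FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Phred+33 (assumed): quality score = ASCII code - 33.
        records.append((header[1:], seq, [ord(c) - 33 for c in qual]))
    return records


assert from_csv(CSV_TEXT) == from_json(JSON_TEXT) == from_xml(XML_TEXT)
print(parse_fastq("@read1\nACGT\n+\nII#I\n"))
```

All three built-in readers return the same list of tuples. The custom parser decodes `I` as quality 40 and `#` as quality 2.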

The course will have two parts:

Part 1 – Introduction to Python programming (Jupyter)

Part 2 – Data analysis and visualization (numpy, pandas, scipy, matplotlib, seaborn, plotly, ImageMagick)

• static plots

• interactive and animated plots

Students will gain hands-on experience with the most popular methods of data analysis and visualization (including working with multivariate data).

The general knowledge presented during lectures will be applied in hands-on computer exercises. All exercises and projects will be done in the Python programming language.

The lectures:

1) Introduction to Python.

2) Jupyter.

3) Data sets. The most common data sets (e.g., Anscombe's quartet, Iris, MNIST) and formats (csv, json, xml, fastq).

4) Data sets. Pre-processing using built-in libraries and writing custom parsers (numpy, pandas).

5) Statistical analysis. Mean, variance, correlation, linear regression (scipy).

6) Statistical classification. Decision trees. Random forests. Support vector machines. (Deep) neural networks.

7) Data visualization. Using Python plotting libraries (matplotlib, seaborn, plotly, ImageMagick).

8) Data visualization. Graphics (colors, lines, etc.) and their use in data presentation. Transformation of variables for better visibility. Time scales. Different types of plots (scatter, pie, bar, histogram, heatmap, boxplot).

9) Data visualization. The most common plotting errors. The importance of color in a plot. How the perception of data depends on the complexity and type of the plot.

10) Plot customization. Legend. Colors. Axes (scales of measurement: nominal, ordinal, interval, logarithmic, and ratio).

11) Static vs. interactive and animated plotting.

Bibliography:

1. Dive Into Python 3, Mark Pilgrim, 2009 (http://histo.ucsf.edu/BMS270/diveintopython3-r802.pdf)

2. Python Data Analysis, Ivan Idris, 2014

3. Python for Data Analysis, Wes McKinney, 2013

4. [In Polish] Zbiór esejów o sztuce pokazywania danych (A collection of essays on the art of presenting data), P. Biecek, 2014 (http://www.biecek.pl/Eseje/)

Homepage: https://www.mimuw.edu.pl/~lukaskoz/teaching/dav/

Learning outcomes:

Knowledge

1. Has general knowledge of programming.

2. Has knowledge of the programming constructs and syntax of the Python programming language (assignment, control statements, subroutine calls, and parameter passing).

3. Has knowledge of data structures and operations on them.

4. Has knowledge of information management, in particular database systems, data modelling, data storage, and information retrieval.

Skills

1. Is able to apply mathematical knowledge to the formulation, analysis, and solution of computing problems of medium difficulty.

2. Is able to obtain information using literature, knowledge bases, Internet and other credible sources, integrate and interpret it as well as draw conclusions and formulate opinions.

3. Is able to write, run and test programs in a chosen programming environment.

4. Is able to program algorithms, using basic algorithmic techniques and data structures to this end.

5. Is able to evaluate, at a basic level, the usefulness of routine programming techniques and tools, and to choose and apply appropriate ones.

6. Knows at least one foreign language at an intermediate level, as well as English at a level that makes it possible to read and understand software documentation, handbooks, and articles in the field of computer science.

Competences

1. Is aware of the need to work systematically on programming projects.

2. Understands and appreciates the significance of intellectual honesty in their own activities and in those of others; acts ethically.

3. Is able to work individually; in particular, manages their own time and meets deadlines.

Assessment methods and assessment criteria:

The final score depends on the project and syllabus.

"Project" - 50%, "Syllabus" - 50% of the grade.

To pass, a student needs at least 60% in both the syllabus and the project.

The syllabus consists of attendance at lectures (20%) and laboratories (80%). Thus, if there are 10 lectures and 10 laboratories, each lecture contributes 2% of the syllabus, i.e. 1% of the final grade. Moreover, each laboratory (together with its homework, if any) is assessed and counts for at most 8% of the syllabus, i.e. 4% of the final grade.
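The weighting can be checked with a small calculation. The sketch below is a hypothetical helper, not an official grading tool, and it rests on several assumptions. It takes 10 lectures and 10 laboratories, as in the example above. Each laboratory score, homework included, is expressed as a fraction of the points available for that laboratory. The 60% thresholds are read as "at least 60%".

```python
def course_score(lectures_attended, lab_scores, project_score,
                 n_lectures=10, n_labs=10):
    """Hypothetical helper mirroring the weighting described above.

    lectures_attended: number of lectures attended (0..n_lectures).
    lab_scores: fraction of the available points (0.0-1.0) earned in each
        laboratory, homework included; missing labs simply score nothing.
    project_score: project result in percent (0-100).
    Returns (final_percent, passed).
    """
    if not 0 <= lectures_attended <= n_lectures:
        raise ValueError("lectures_attended out of range")
    if len(lab_scores) > n_labs or any(not 0 <= s <= 1 for s in lab_scores):
        raise ValueError("invalid laboratory scores")

    lecture_part = 20 * lectures_attended / n_lectures  # 20% of the syllabus
    lab_part = 80 * sum(lab_scores) / n_labs           # 80% of the syllabus
    syllabus = lecture_part + lab_part                 # in percent

    final = 0.5 * syllabus + 0.5 * project_score       # 50% / 50% split
    passed = syllabus >= 60 and project_score >= 60    # 60% in both parts
    return final, passed


# Missing one of ten lectures costs 2% of the syllabus, i.e. 1% of the final grade.
print(course_score(9, [1.0] * 10, 100))   # (99.0, True)
# A strong syllabus cannot compensate for a project below 60%.
print(course_score(10, [1.0] * 10, 55))   # (77.5, False)
```

Because the two thresholds are checked separately, a high final percentage does not by itself guarantee a pass.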

The project: the student(s) will need to collect and interpret data and then present it using appropriate plots (static, interactive, and animated). Both interactive (HTML, no size limit) and static (PDF, A0 poster) formats are required.

Classes in period "Summer semester 2023/24" (in progress)

Time span: 2024-02-19 - 2024-06-16
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Łukasz Kozłowski
Group instructors: Łukasz Kozłowski
Examination: Examination

Classes in period "Summer semester 2024/25" (future)

Time span: 2025-02-17 - 2025-06-08
Type of class:
Lab, 30 hours
Lecture, 30 hours
Coordinators: Łukasz Kozłowski
Group instructors: Łukasz Kozłowski
Examination: Examination
Course descriptions are protected by copyright.
Copyright by University of Warsaw.
Krakowskie Przedmieście 26/28
00-927 Warszawa
tel: +48 22 55 20 000 https://uw.edu.pl/