Data Analysis in R
General information
Course code: | 2500-EN-F-233 |
Erasmus / ISCED code: | 14.4 |
Course name: | Data Analysis in R |
Unit: | Faculty of Psychology |
Groups: |
Academic basket
Elective courses
Electives for years 3, 4 and 5
Methodology, Statistics and Psychometrics basket |
ECTS credits and other: | (none) |
Language of instruction: | English |
Short description: |
(in English only) The course provides an introductory but solid grounding in the R environment for statistical programming. R is among the most widely used, flexible, powerful and complete statistical software environments. Moreover, it is free and hence easily accessible even to students at the beginning of their research work. It is customizable, and when new statistical procedures emerge, chances are good that they will be implemented in an R package. While some of the most widely used standard statistical analyses and data visualizations will be briefly presented, the focus of the course will be on data management and programming in R. |
Full description: |
(in English only) In psychology, as in other fields, technological advances provide researchers with a growing quantity of collected data. Researchers therefore need to add to their theoretical knowledge of their subject the knowledge of how to handle this increasingly available data and extract information from it. At the same time, we often witness spectacular scientific advances that are made possible by creatively connecting data with the scientific questions being asked. Simple, standard statistical knowledge and methods are often no longer enough, and the need for a new competence for the empirically minded researcher is emerging. This competence is taking the form of a new discipline in the quantitative sciences, called data science. Data science involves theoretically informed data management decisions and requires robust and customizable tools to carry them out. In this context, R is emerging as the standard statistical software for the next generations of analytically minded researchers. It is a flexible and powerful programming language and environment focused on data analysis. The core set of functions already offers more functionality than you will ever want to use, and because R is an open-source and freely available platform, anyone can contribute to it by writing specialized packages that are made available to everyone. For that reason, it is also among the most comprehensive statistical software, and many innovative statistical methods are already available for your specialized needs. In addition, R has very advanced graphical capabilities, which can produce publication-quality data visualizations with just a few lines of code. The course will have an applied, hands-on approach, and we will lead students from performing their first simple operations on data to creating their own set of scripts and functions in the R language.
The course is meant to be the first in a series of lectures specifically centered on R. Its focus will be on data management and programming, the basis of data science. For that reason, statistical modeling and data visualization will receive little attention, and only a few basic statistical analyses and plotting functions will be covered. Advanced, specialized courses on these aspects of data science will be offered as standalone classes. |
Literature: |
(in English only) Bibliography:
Introduction to R. https://cran.r-project.org/doc/manuals/r-release/R-intro.html
Lander, J. (2013). “R for Everyone”. Addison-Wesley.
Peng, R. D. (2015). “R Programming for Data Science”. Leanpub. |
Learning outcomes: |
(in English only) Students will be able to perform many basic but important operations on data within the R statistical environment: importing data, understanding the basic data types, and visualizing and summarizing data. Students will be trained to go through all the steps needed to organize, restructure and clean data for subsequent statistical analysis. Students will learn how to write simple programs (scripts) in R in order to automate recurring tasks in data cleaning. |
Assessment methods and criteria: |
(in English only) Most classes will start with a short quiz (3-4 questions) on the material presented in the previous class. Short polls gauging students’ confidence in and understanding of the current material will also be administered and used to adjust how the material is presented. These quizzes and polls will not contribute to the final grade.
Home assignments will contribute to the evaluation and reflect the progress made. There will be 5 home assignments during the course (approximately one every two or three classes; 30 points total). In the final exam, students will solve one or two practical problems in R using most of the concepts covered in class (70 points total). For these reasons attendance is deemed essential: students are expected to attend ALL classes, be on time, and be prepared for discussion and activities. Overall, home assignments contribute 30% of the final grade and the final exam the remaining 70%.
Grades will be assigned according to the following scale:
5 – 90-100% – outstanding performance
4+ – 79-89%
4 – 73-78% – good performance
3+ – 67-72%
3 – 60-66% – minimum passing performance
2 – 59% or less – performance not suitable for passing
Attendance rules
Attendance is a very important factor in passing the class. Up to two unexcused absences are allowed. Additional absences should be documented (e.g. sick leave). In exceptional and justified situations, please contact me personally to discuss whether additional assignments can make up for the missed classes. |
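As an illustration only, the point arithmetic and grading scale above can be sketched in R, the language taught in the course. This is not an official grading tool: the function name `final_grade` is hypothetical, and the treatment of fractional totals (for example 66.5, which the published scale does not cover) is an assumption; in this sketch such totals fall into the lower band.

```r
# Hypothetical helper (not part of the course materials): combine
# home-assignment points (max 30) and final-exam points (max 70)
# into a grade using the thresholds from the scale above.
final_grade <- function(assignment_points, exam_points) {
  stopifnot(assignment_points >= 0, assignment_points <= 30,
            exam_points >= 0, exam_points <= 70)

  # The two components sum to at most 100, so the total is also a percentage.
  total <- assignment_points + exam_points

  # Lower bound of each grade band. Assumption: a fractional total
  # between two bands (e.g. 66.5) is assigned to the lower band.
  lower_bounds <- c(0, 60, 67, 73, 79, 90)
  grades <- c("2", "3", "3+", "4", "4+", "5")

  # findInterval() returns the index of the band that contains `total`.
  grades[findInterval(total, lower_bounds)]
}

final_grade(25, 60)  # total 85 -> "4+"
final_grade(18, 40)  # total 58 -> "2"
```

Because home assignments are worth 30 points and the exam 70, the sum is already on a 0-100 scale and no further weighting is needed.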
Copyright is held by the University of Warsaw.