
Text Mining

General data

Course ID: 2400-ZEWW330
Erasmus code / ISCED: 14.3 / (0311) Economics
The Erasmus course classification code consists of three to five digits: the first three give the field classification according to the list of field codes used in the Socrates/Erasmus programme, the fourth (so far usually 0) optionally refines the discipline, and the fifth gives the level of advancement of the course, based on the year of study for which the course is intended. The ISCED (International Standard Classification of Education) code was designed by UNESCO.
Course title: Text Mining
Name in Polish: Analiza danych nieustrukturyzowanych (ścieżka SAS)
Organizational unit: Faculty of Economic Sciences
Course groups: Elective field-of-study courses - second-cycle studies IE - group 2 (2*30h)
Elective field-of-study courses for bachelor's studies IE
Elective field-of-study courses for bachelor's studies MSEM
ECTS credit allocation (and other scores): 3.00
Basic information on ECTS credits allocation principles:
  • the annual hourly workload of the student’s work required to achieve the expected learning outcomes for a given stage is 1500-1800h, corresponding to 60 ECTS;
  • the student’s weekly hourly workload is 45 h;
  • 1 ECTS point corresponds to 25-30 hours of student work needed to achieve the assumed learning outcomes;
  • the weekly student workload necessary to achieve the assumed learning outcomes allows the student to obtain 1.5 ECTS;
  • work required to pass the course, which has been assigned 3 ECTS, constitutes 10% of the semester student load.

Language: Polish
Type of course:

optional courses

Prerequisites (description):

Basic knowledge of computer science.


Short description:

The aim of the course is to familiarize students with statistical methods useful in the analysis of unstructured data and with artificial intelligence techniques that structure textual information. These methods support analysis and decision making by examining the content of various text documents and finding previously unknown dependencies, patterns and trends in the collected data sets. The course covers both theoretical methods and practical examples. In class, students conduct their own analyses using SAS Enterprise Miner and SAS Text Miner.

NOTE: Classes are conducted as a part of the DMCP path, and after completing the whole DMCP path students can obtain a SAS certificate.

Full description:

1. Introduction to methods of analyzing unstructured data. Techniques including Data Mining, Text Mining, Web Mining.

2. Functionality and tools of SAS Enterprise Miner.

3. Functionality and tools of SAS Text Miner.

4. Search methods for text information. Decomposition of text data. Quantitative representation of a set of documents.

5. Automatic processing of text data. Identification of keywords.

6. Stop list, start list. Canonical forms. Weighting functions. Frequency weights.

7. Transformation of text data. Reducing the size of the frequency matrix.

8. Data visualization. Creating a concept link tree.

9. Analysis of large document repositories. Using the %tmfilter macro in the text mining process.

10. Web content analysis. The use of the %tmfilter macro in the web mining process.

11. Clustering methods. Analysis of segment and cluster profiles.

12. Classification models. Scoring. Evaluation of the generated model.

13. Grouping of text data and prognostic modeling.

14. Predictions based on unordered text.

15. Cooperation with other SAS Enterprise Miner packages. Other Text Mining tools.
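
Topics 4 to 7 above (decomposing text, the stop list, the quantitative representation of a document set, and frequency weights) can be illustrated with a minimal sketch. It is written in Python rather than SAS and does not reproduce how SAS Text Miner works internally. The toy documents, the stop list, and the function names are hypothetical, and the log frequency weight combined with an inverse document frequency term weight is only one common weighting choice.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop list, chosen only for illustration.
DOCUMENTS = [
    "The bank approved the loan",
    "The loan was repaid to the bank",
    "Text mining finds patterns in text",
]
STOP_LIST = {"the", "was", "to", "in"}


def tokenize(text, stop_list):
    """Lowercase the text, split it into alphabetic tokens, drop stop-list terms."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_list]


def term_document_matrix(documents, stop_list):
    """Return (sorted vocabulary, matrix) where matrix[i][j] is the count of term i in document j."""
    counts = [Counter(tokenize(d, stop_list)) for d in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def weight(matrix):
    """Multiply a log frequency weight, log2(1 + count), by an IDF term weight, log2(N / df) + 1."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for count in row if count > 0)  # number of documents containing the term
        idf = math.log2(n_docs / df) + 1
        weighted.append([math.log2(1 + count) * idf for count in row])
    return weighted


vocabulary, matrix = term_document_matrix(DOCUMENTS, STOP_LIST)
weighted = weight(matrix)
for term, raw, w in zip(vocabulary, matrix, weighted):
    print(f"{term:10s} {raw} {[round(x, 2) for x in w]}")
```

Each printed row shows a term, its raw counts per document, and its weighted values. Terms on the stop list never enter the matrix, and a term that occurs in fewer documents receives a larger term weight.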

Bibliography:

Obligatory reading:

[1] Lasek M., Pęczkowski M., Enterprise Miner. Wykorzystywanie narzędzi Data Mining w systemie SAS.

[2] Lasek M., Data Mining. Zastosowania w analizach i ocenach klientów bankowych, Oficyna Wydawnicza „Zarządzanie i finanse”, Warszawa 2002.

[3] Witkowska D., Sztuczne sieci neuronowe i metody statystyczne. Wybrane zagadnienia finansowe, Wydawnictwo C.H. Beck, Warszawa 2002.

[4] Text Mining Using SAS Software, SAS Education.

Further reading:

[1] Frątczak E., Pęczkowski M., Sienkiewicz K., Skaskiewicz K., Statystyka od podstaw z systemem SAS, ISBN 83-7225-179-7, Oficyna Wydawnicza Szkoły Głównej Handlowej, Warszawa 2002.

[2] Giudici P., Applied Data Mining. Statistical Methods for Business and Industry, Wiley 2003.

[3] Hadasik D. (1998), Upadłość przedsiębiorstw w Polsce i metody jej prognozowania, Wydawnictwo Akademii Ekonomicznej w Poznaniu, Poznań.

[4] Jagielska J., Matthews Ch., Whitfort T. (1999), An investigation into the application of neural networks, fuzzy logic, genetic algorithms, and rough sets to automated knowledge acquisition for classification problems, Neurocomputing 24, 37-54.

[5] Jain L.B., Martin N.M. (eds.) (1999), Fusion of Neural Networks, Fuzzy Sets, and Genetic Algorithms. Industrial Applications, CRC Press.

[6] Kudyba S., Managing Data Mining. Advice from Experts, IT Solutions Series, ISBN 1-59140-243-3, CyberTech Publishing, Idea Group Inc. 2004.

[7] Nelles O. (2001), Nonlinear System Identification. From Classical Approaches to Neural Networks and Fuzzy Models, Springer Verlag, Berlin Heidelberg.

[8] Osowski S. (2001), Sieci neuronowe wykorzystujące systemy wnioskowania rozmytego, Software no. 2, 18-20 and 62.

[9] Raudys Š. (2001), Statistical and Neural Classifiers. An Integrated Approach to Design, Springer-Verlag, London.

[10] Ribeiro R., Zimmermann H.-J., Yager R., Kacprzyk J. (1999), Soft Computing in Financial Engineering, Studies in Fuzziness and Soft Computing, vol. 28, Physica Verlag, Heidelberg.

[11] Wang J. (ed.), Data Mining. Opportunities and Challenges, IRM Press 2003.

[12] Witten I.H., Frank E. (2000), Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations, Academic Press, Morgan Kaufmann Publishers.

[13] Zwierz U., Wstęp do systemu SAS, Oficyna Wydawnicza Szkoły Głównej Handlowej, Warszawa 2001.

[14] Data & Text Mining, publisher: Prentice Hall.

Learning outcomes:

Knowledge acquired through participation in the course: statistical methods useful in the analysis of unstructured data and example applications of these methods to finding unknown relationships, patterns and trends in the collected data sets, as well as practical skills in using SAS Enterprise Miner and SAS Text Miner.

KW01, KW02, KW03, KU01, KU02, KU03, KK01, KK02, KK03

Assessment methods and assessment criteria:

Students are graded on the basis of a final project based on a self-designed and implemented model of text data analysis.

Classes in period "Summer semester 2023/24" (in progress)

Time span: 2024-02-19 - 2024-06-16
Type of class:
Seminar, 30 hours
Coordinators: Karolina Kuligowska
Group instructors: (unknown)
Examination: Course - Grading
Seminar - Grading
Course descriptions are protected by copyright.
Copyright by University of Warsaw.