Introduction to natural language processing
General data
Course ID: | 3003-C3N-JK1 |
Erasmus code / ISCED: | (unknown) / (0232) Literature and linguistics |
Course title: | Introduction to natural language processing |
Name in Polish: | Wprowadzenie do przetwarzania języka naturalnego |
Organizational unit: | Institute of Polish Language |
Course groups: |
Elective seminars (konwersatoria) for Polish philology, full-time second-cycle studies, 2023/2024
Elective seminars (konwersatoria) for Polish philology, full-time second-cycle studies, 2023/2024, "Nowoczesność" (Modernity) module
"Nowoczesność" (Modernity) module, Polish philology from the 2019 cycle, full-time second-cycle studies
All Polish studies courses offered by ILP (3001...), IJP (3003...) and IPS (3007...) |
ECTS credit allocation (and other scores): | 7.00 |
Language: | Polish |
Type of course: | obligatory courses |
Prerequisites (description): | The aim of the class is to give participants a practical introduction to the concepts and methods of natural language processing. It is a practical, interdisciplinary field that draws on linguistics, programming and machine learning, and it has recently gained importance and recognition thanks to tools such as ChatGPT. Natural language processing enables the automated analysis of text collections and the creation of artificial intelligence systems based on text data (search engines, chatbots, corpus tools, etc.). The class includes a crash course in Python programming and covers the practical use of packages for automated text analysis, including spaCy, StyloMetrix and BERTopic, as well as packages for creating statistical summaries and visualising the results. |
Mode: | Classroom |
Short description: |
The aim of the course is to give participants a practical introduction to natural language processing, computational linguistics and programming, in particular to processing text corpora with the natural language processing techniques available in the Python programming language. No prior knowledge of programming languages or programming skills is required. |
Full description: |
The aim of the course is to give participants a practical introduction to natural language processing, computational linguistics and programming, in particular to processing text corpora with the natural language processing techniques available in the Python programming language. No prior knowledge of programming languages or programming skills is required, but participants are expected to bring the motivation and commitment needed to acquire programming skills in natural language processing.
Topics defining the scope of the course:
1. Basics of programming in Python: variable types, data structures, conditions and loops, functions and classes, working with files and using packages.
2. Using Python to collect and process text data (scraping, API querying, OCR and audio transcription).
3. spaCy and the levels of linguistic annotation: morpho-syntactic analysis and tagging, dependency parsing.
4. Vector semantics and language models.
5. Models for sequence classification and token classification in spaCy.
6. Searching text with spaCy: rule-based and layer-based annotation search, semantic search.
7. Stylometric analysis of texts using StyloMetrix, pandas and scikit-learn.
8. Topic modelling using BERTopic.
9. Visualisation of corpus processing results. |
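As an illustration of the kind of exercise topic 1 points to (data structures, loops, functions and packages), here is a minimal sketch of a word-frequency count over a short text. It is not taken from the course materials: the function names `tokenize` and `top_words` and the sample sentence are assumptions made for this example, and it uses only the Python standard library rather than spaCy.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens (runs of letters, Unicode-aware)."""
    # [^\W\d_]+ matches one or more letters, including Polish diacritics.
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, used only for illustration.
sample = "Ala ma kota, a kot ma Alę. Kot śpi."

for word, count in top_words(sample):
    print(f"{word}: {count}")
```

A regular expression is a deliberately crude tokenizer: it treats "kota" and "kot" as different words, which is the kind of limitation that the morpho-syntactic analysis in topic 3 addresses.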
Bibliography: |
Altinok, D. (2021). Mastering spaCy: An end-to-end practical guide to implementing NLP applications using the Python ecosystem. Birmingham: Packt Publishing.
Lane, H., Howard, C., Hapke, H. (2021). Przetwarzanie języka naturalnego w akcji. Rozumienie, analiza i generowanie tekstu w Pythonie na przykładzie języka angielskiego. Warszawa: PWN.
Mattingly, W. (2022). Introduction to Python for Digital Humanities. URL: www.python-textbook.pythonhumanities.com.
Mattingly, W. (2021). Introduction to spaCy 3. URL: www.spacy.pythonhumanities.com.
Sweigart, A. (2020). Automatyzacja nudnych zadań z Pythonem. Nauka programowania. Gliwice: Helion. |
Learning outcomes: |
The student:
- is familiar with the tools for text data processing and analysis available in the Python language
- knows the basics of programming in Python and the Python packages for text data processing and analysis
- knows the most important concepts and techniques of natural language processing
- is able to analyse a text data corpus using Python packages
- is able to formulate a hypothesis concerning a text corpus and verify it using natural language processing techniques
- is able to visualise the results of a text corpus analysis
- is able to critically evaluate information on artificial intelligence systems based on text data
- is able to understand the importance of natural language processing in solving both theoretical and practical problems and to apply the methods of this field to achieve their own research goals |
Assessment methods and assessment criteria: |
Attendance in class (two absences allowed). Regularly solving programming and natural language processing tasks. Completion of a small individual or group project using natural language processing methods. |
Classes in period "Summer semester 2023/24" (in progress)
Time span: | 2024-02-19 - 2024-06-16 |
Type of class: | Seminar, 30 hours, 15 places |
Coordinators: | Marcin Będkowski, Iwona Burkacka | |
Group instructors: | Marcin Będkowski | |
Examination: | Course - Grading; Seminar - Grading |
Copyright by University of Warsaw.