Experimental semantics - corpus analysis module
General data
Course ID: | 3501-KOG-SE-MAK |
Erasmus code / ISCED: | 08.1 |
Course title: | Experimental semantics - corpus analysis module |
Name in Polish: | Semantyka eksperymentalna - moduł analizy korpusowej |
Organizational unit: | Institute of Philosophy |
Course groups: | |
ECTS credit allocation (and other scores): | (not available) |
Language: | Polish |
Type of course: | elective courses |
Mode: | Remote learning |
Short description: |
The main aim of the module is to introduce students to the available tools for analyzing text corpora. In experimental semantics, corpus data can serve several functions: it can provide evidence for or against a semantic hypothesis, supply experimental materials, or inspire new hypotheses. Students will learn about the structure of a typical text corpus and the basic assumptions behind corpus linguistics. The course introduces basic association measures (t-score, χ², MI, logDice) and shows how they can be used to test hypotheses about the co-occurrence of semantic units in real linguistic material. |
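The association measures named above can be computed directly from four counts: the co-occurrence frequency of a word pair, the frequency of each word, and the corpus size. The following is a minimal sketch in Python, assuming the usual textbook definitions (t-score and χ² over a 2x2 contingency table, MI as the base-2 logarithm of observed over expected frequency, logDice as 14 plus the base-2 logarithm of the Dice coefficient). The function names and the frequency counts are invented for illustration and do not come from any real corpus.

```python
import math


def expected(f_x, f_y, n):
    """Expected co-occurrence frequency if the two words were independent."""
    return f_x * f_y / n


def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - expected(f_x, f_y, n)) / math.sqrt(f_xy)


def mutual_information(f_xy, f_x, f_y, n):
    """MI: log2 of the ratio of observed to expected frequency."""
    return math.log2(f_xy / expected(f_x, f_y, n))


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2(Dice coefficient); independent of corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def chi_squared(f_xy, f_x, f_y, n):
    """Pearson chi-squared statistic for the 2x2 contingency table."""
    o11 = f_xy                    # x and y together
    o12 = f_x - f_xy              # x without y
    o21 = f_y - f_xy              # y without x
    o22 = n - f_x - f_y + f_xy    # neither x nor y
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


# Invented toy counts (not taken from any real corpus).
F_XY, F_X, F_Y, N = 20, 100, 50, 1000

print("t-score:", t_score(F_XY, F_X, F_Y, N))
print("MI:", mutual_information(F_XY, F_X, F_Y, N))
print("logDice:", log_dice(F_XY, F_X, F_Y))

chi2 = chi_squared(F_XY, F_X, F_Y, N)
# 3.841 is the critical value of chi-squared with 1 degree of freedom
# at the 0.05 significance level.
print("chi-squared:", chi2, "significant at 0.05:", chi2 > 3.841)
```

In this sketch the null hypothesis is that the two words occur independently of each other; a χ² value above the critical value is grounds for rejecting it. Unlike the other three measures, logDice does not use the corpus size, which makes its values easier to compare across corpora of different sizes.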
Full description: |
1. Corpora and search engines
Lesson 1. Available English and Polish corpora
- discussion of the structure of the following corpora: NKJP, BNC and COCA
- text structure of the corpora (balance of text types and sources)
- additional information in corpora (metadata, tagsets, etc.)
- available search engines
- practical aim: the student can search for words and phrases using different search engines for Polish and English corpora
Lesson 2. Advanced features of search engines. Corpus Query Language
- syntax of CQL
- regular expressions
- searching using metadata
- practical aim: the student can construct complex search queries using regular expressions and metadata
2. Collocations
Lesson 1. Association measures
- t-score
- χ²
- Mutual Information
- logDice
- statistical hypothesis testing using association measures
- practical aim: given word frequency data, the student can calculate and interpret different association measures, understands the practical and theoretical differences among them, and can use them to statistically test hypotheses about the co-occurrence of particular semantic units
Lesson 2. Association measures in corpus search engines
- using corpus search engines to compute association statistics and extract frequency data
- practical aim: the student can use available tools to compute association statistics; when a particular measure is not available, the student can extract the frequency data and compute the given statistic herself
3. SketchEngine
Lesson 1. What is SketchEngine and what can it do for you?
- discussion of the corpora available in SketchEngine
- searching and saving query results
- association measures available in SketchEngine
- WordSketches
- parallel corpora
- practical aim: the student can apply her knowledge and skills from previous lessons when working with SketchEngine
4. WordNets
Lesson 1. WordNet and Słowosieć
- structure of WordNet
- different semantic relations between semantic units
- using WordNets in combination with corpora
- practical aim: the student can use information from WordNets in her work with text corpora
5. Scripting your work with corpora (for volunteers)
Lesson 1. Access to SketchEngine using Python
- discussion of the SketchEngine API
- short introduction to the JSON data format and the simplejson Python library
- practical aim: the student can access the full functionality of SketchEngine using Python |
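The scripting topics above (JSON data and regular expressions) can be illustrated with a short Python sketch. It parses a JSON document and filters its entries with a regular expression and a tag, roughly the way a corpus query combines a lemma pattern with an attribute constraint. The payload, its field names (`corpus`, `items`, `lemma`, `tag`, `freq`) and the helper `filter_items` are all hypothetical and do not reproduce the actual SketchEngine API response format. The standard-library `json` module is used here in place of simplejson, which offers an equivalent `loads` function.

```python
import json
import re

# Hypothetical payload: the structure and field names are invented for
# illustration and are NOT the real SketchEngine API schema.
RAW = """
{
  "corpus": "toy-corpus",
  "items": [
    {"lemma": "kot", "tag": "noun", "freq": 120},
    {"lemma": "kotek", "tag": "noun", "freq": 45},
    {"lemma": "pies", "tag": "noun", "freq": 200},
    {"lemma": "kochać", "tag": "verb", "freq": 310},
    {"lemma": "kotłować", "tag": "verb", "freq": 3}
  ]
}
"""


def filter_items(raw_json, lemma_pattern, tag=None):
    """Return items whose whole lemma matches the pattern, by falling frequency."""
    data = json.loads(raw_json)
    pattern = re.compile(lemma_pattern)
    hits = [
        item
        for item in data["items"]
        # fullmatch requires the pattern to cover the entire lemma
        if pattern.fullmatch(item["lemma"])
        and (tag is None or item["tag"] == tag)
    ]
    return sorted(hits, key=lambda item: item["freq"], reverse=True)


# Lemmas starting with "kot", restricted to nouns.
for item in filter_items(RAW, r"kot.*", tag="noun"):
    print(item["lemma"], item["freq"])
```

Frequency data extracted this way could then be passed to any association measure that the search engine itself does not provide.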
Bibliography: |
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
- Davies, M. (2007). Semantically-based queries with a joint BNC/WordNet database. Language and Computers, 62(1), 149-167.
- Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39-41.
- Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235-244.
- Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D. (2004). ITRI-04-08 The Sketch Engine. Information Technology, 105, 116.
- Lewandowska-Tomaszczyk, B., Bańko, M., Górski, R. L., Pęzik, P., & Przepiórkowski, A. (2012). Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN.
- Maziarz, M., Piasecki, M., Rudnicka, E., & Szpakowicz, S. (2014). plWordNet as the cornerstone of a toolkit of lexico-semantic resources. In Proceedings of the Seventh Global Wordnet Conference (pp. 304-312).
- https://www.sketchengine.co.uk/user-guide/user-manual/word-sketch/ |
Learning outcomes: |
Knowledge acquired:
- the student knows the corpus resources available online
- the student knows the different types of corpora and the purposes for which they can be used
- the student knows the basic concepts and terminology of corpus linguistics
- the student knows selected corpus tools and how to use them
Skills acquired:
- the student can use selected tools for corpus analysis
- the student can analyze results obtained from corpus data
- the student can use a selected working environment dedicated to corpora
Social competences acquired:
- the student can cooperate in a research team using digital communication tools |
Assessment methods and assessment criteria: |
Each week the students receive an assignment (7 assignments in total, with a maximum of 10 points each). The final grade depends solely on successfully completing the assignments.
0-35 points: 2
35-50 points: 3
50-60 points: 4
60-70 points: 5 |
Copyright by University of Warsaw.