Corpus linguistics

General data

Course ID: 3223-LK-OG
Erasmus code / ISCED: 09.0 / (0231) Language acquisition
Course title: Corpus linguistics
Name in Polish: Lingwistyka korpusowa
Organizational unit: Institute of Specialised and Intercultural Communication
Course groups: General university courses
General university courses in the humanities
ECTS credit allocation (and other scores): 2.00 OR 3.00 (differs over time)
Language: Polish
Type of course:

general courses

Prerequisites (description):

The student should have a basic knowledge of Polish. Basic computer skills, including software installation, are advisable.

Mode:

Classroom

Short description:

The course is designed for students interested in improving their linguistic competence and in recognizing, on the basis of simple indicators, the style and intention of the author of a given text. It can also be used to explore the linguistic image of a given person, phenomenon, problem or stereotype in a communicative society, or to study the quality of translated texts. It also develops competence in simple computer tools available online. The acquired skills can also be applied to make research in other subjects more interesting or to strengthen the content of a thesis.

The preferred form of assessment is a final project prepared during the course, which can be put to further use afterwards. It is also possible to pass the course by taking a (partially) open test on the material presented in class and in the selected literature.

Selected programs for text analysis, creation and evaluation of text corpora, stylometric programs, and electronic text corpora are discussed in class.

Full description:

The course program covers the theoretical background of building language corpora, techniques for collecting language data, and the possibilities of their practical use:

1. The concept of a language corpus. Language corpus vs. collection of texts. Theoretical vs. material research: the role of linguistic data in social sciences, marketing, media and linguistics.

2. Typology of corpora: monolingual and multilingual, parallel and comparative corpora. The concept of representativeness and adequacy of corpora.

3. Basic information on the indexing of language corpora; interpreting the resulting data.

4. Basic corpora of the Polish language. NKJP corpus with accompanying tools.

5. Available tools for analysis of text corpora (AntConc, Jasnopis, etc.).

6. Parametric text analysis tools, practical applications of text modality measurement.

7. Possible applications of corpora in practice:

a) study of specialized languages

b) corpus as a tool to assist the translator

c) foreign language didactics

d) dictionaries and dictionary models on various data carriers

8. Text corpora and parallel texts in working with translation support programs (CAT tools).
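The kind of analysis offered by the corpus tools named in topic 5 can be illustrated with a minimal sketch. It is not how AntConc or any other listed program is implemented. It only shows two basic operations, a word-frequency list and a keyword-in-context (KWIC) concordance, on a made-up sample sentence. The function names `tokenize`, `word_frequencies` and `kwic`, the `window` parameter, and the sample text are all assumptions introduced for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of letters only).

    The pattern is Unicode-aware, so Polish letters such as "ż" or "ó"
    are kept inside words. This is a simplification: real corpus tools
    use more careful tokenization.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(text):
    """Count how often each word form occurs in the text."""
    return Counter(tokenize(text))


def kwic(text, keyword, window=2):
    """Return (left context, keyword, right context) for every hit.

    `window` is the number of words shown on each side (hypothetical
    parameter chosen for this sketch).
    """
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Made-up sample text, not taken from any real corpus.
sample = "Korpus językowy to zbiór tekstów. Korpus służy do badań."

frequencies = word_frequencies(sample)
concordance = kwic(sample, "korpus", window=2)

for left, word, right in concordance:
    print(f"{left:>20} [{word}] {right}")
```

Counting raw word forms, as here, treats inflected forms of the same word as different items; for a highly inflected language such as Polish, a real study would usually add lemmatization, which this sketch leaves out.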

The purpose of the course is to familiarize the student with basic computer tools that make it possible to learn more about a given text, to analyze its content statistically and stylistically, and to use reliable data for one's own (research) purposes. After the course, the student should be able to use basic programs in digital linguistics and to search for the necessary data using the tools learned. The course is conducted using presentations and demonstrations of how individual programs work, as far as the available hardware allows. In addition, the course aims to develop the ability to draw one's own conclusions from the linguistic analyses carried out and from their visualization. The acquired knowledge is to be used by the student (or a group of students) to create a text corpus project, formally defined in class, which forms the basis for credit (the project or the acquired skills can be applied to make research in other subjects more attractive or to strengthen the content of a thesis). Alternatively, the course can be passed by demonstrating subject knowledge in a credit test on the course content and literature.

Student workload (3 ECTS):

30 hours of classroom attendance (1 ECTS)

30 hours of preparing a text corpus or an alternative study (1 ECTS)

15 hours of own work with software (0.5 ECTS)

15 hours of reading and preparing for class, and preparing for the exam/project discussion (0.5 ECTS)

Bibliography:

see in Polish

Learning outcomes:

The student improves their qualifications with respect to the following criteria:

is familiar with the terminology used in linguistics and related fields at an extended level, is oriented in the most important directions and methods of linguistic research; understands grammatical terminology; has knowledge of selected pragmatic conditions of the given language systems;

has in-depth knowledge of the methodology and conduct of linguistic or literary research; is familiar with scientific style and scientific lexis; has knowledge of databases for linguistics, has basic knowledge of the interpretation of data obtained from analysis;

is familiar with popular computer-aided translation (CAT) programs and selected programs for frequency and stylistic analysis; knows the possibilities of using machine translation;

Skills:

uses computer programs useful in a translator's work; is able to properly format text in Polish and in at least one foreign language; proficiently uses spreadsheets and charts; is able to use generally available scientific databases (including terminology and corpus databases); proficiently searches for information, uses expert knowledge, encyclopedic, linguistic, general-scientific, general-technical, interdisciplinary and industry-specific terminology dictionaries, language corpora, databases, parallel texts;

is able to identify gaps in scientific research and directions for its continuation; formulates research problems, selects adequate methods, constructs research tools, develops, presents and interprets research results, draws conclusions;

Social competencies:

is able to supplement and improve the acquired knowledge of at least one language of the specialty and of their own language; is aware of the need to constantly search for new dictionary and textual sources, as well as to follow newly emerging scientific theories; reacts quickly to a changing reality;

draws conclusions from feedback, knows how to manage time; maintains contact with the translation community, works in a multicultural environment; is familiar with the translator's work environment;

is able to work in a group and cooperate with others, assuming appropriate roles (functions); is able to lead a small team (3-4 people in practice groups);

Assessment methods and assessment criteria:

Methods of evaluation of student work

- Evaluation of activity and current preparation for classes;

- Project (thematic text corpus or alternative);

- Final written credit (test or stylometric analysis of the corpus).

Assessment criteria (components of the final assessment):

- continuous evaluation from classes: 10%

- project (thematic text corpus): 40%

- final credit: 50%
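As a worked illustration of the weights above, the sketch below combines three component results into one percentage. It assumes that each component is expressed as a percentage from 0 to 100 and that the components are combined as a plain weighted sum; the source gives only the weights, so both points are assumptions. The names `WEIGHTS` and `final_percentage` are hypothetical.

```python
# Weights in percent, taken from the assessment criteria above.
WEIGHTS = {"continuous": 10, "project": 40, "final": 50}


def final_percentage(scores):
    """Combine component scores (each 0-100) into a weighted percentage.

    Assumption: a plain weighted sum; the course description does not
    state how the components are combined.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Example: full marks for class work, 80% for the project, 60% for the final credit.
example = final_percentage({"continuous": 100, "project": 80, "final": 60})
print(example)  # 72.0
```

The sketch does not model the rule that the final credit can be taken only after the project has been passed and class work has been positively evaluated.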

Examination (final credit):

In the case of a mixed test (multiple and/or single-choice), the number of points obtained is binding.

In the case of preparation of a project of analysis of selected text corpora, the number of points obtained for each part of the study is binding.

Scoring rules for the current assessment and exam/assessment:

55%-69% = 3

70%-74% = 3+

75%-84% = 4

85%-89% = 4+

90%-100% = 5
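The scale above can be read as a lookup from a percentage to a grade, sketched below. Two points are assumptions not stated in the source: results below 55% are treated as a failing grade of 2, and the lower bound of each band is used as a threshold, so that a fractional result such as 69.5% falls into the band below 70%. The names `GRADE_THRESHOLDS` and `grade_for` are hypothetical.

```python
# (lower bound in percent, grade), highest band first; bounds from the scale above.
GRADE_THRESHOLDS = [(90, "5"), (85, "4+"), (75, "4"), (70, "3+"), (55, "3")]


def grade_for(percent):
    """Map a percentage (0-100) to a grade using the lower bound of each band."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lower_bound, grade in GRADE_THRESHOLDS:
        if percent >= lower_bound:
            return grade
    # Assumption: anything below 55% is a fail (2); the source lists no such band.
    return "2"


for value in (54, 55, 72, 84, 89, 100):
    print(value, grade_for(value))
```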

Rules of cooperation of the instructor with students:

1. Absences: 3 unexcused absences per semester are allowed (in accordance with the regulations).

2. The final credit can be taken after passing the project (assignment) and receiving a positive evaluation for continuous class work.

3. The student has the right to retake each written test twice. Failure to take the test on the first date without an excuse will result in the loss of that date.

Classes in period "Winter semester 2023/24" (past)

Time span: 2023-10-01 - 2024-01-28
Type of class:
Seminar, 30 hours, 16 places
Coordinators: Łukasz Karpiński
Group instructors: Łukasz Karpiński
Examination: Course - Grading
Seminar - Grading
Short description:

The course is intended for students interested in improving their linguistic competence and in recognizing, on the basis of simple indicators, the style and intention of the author of a given text. It can also serve to explore the linguistic image of a given person, phenomenon, problem or stereotype in a communicative society, or prove useful in studying translations of texts from and into foreign languages. It also develops competence in simple computer tools available online. The acquired skills can be applied to make research in other subjects more attractive or to strengthen the content of a thesis. The aim of the course is to introduce basic computer tools that support the creation of targeted text corpora investigating a given problem. Selected programs for text analysis, for creating and evaluating text corpora, stylometric programs, and electronic text corpora are discussed in class. The acquired knowledge is to be used by the student (or a group of students) to create a defined corpus project.

Full description:

As in the main course description.

Bibliography:

As in the main course description.

Notes:

If classes cannot be held in person, they will be conducted using remote communication tools, most likely Google Meet and others recommended by UW.

Classes in period "Summer semester 2024/25" (future)

Time span: 2025-02-17 - 2025-06-08
Type of class:
Seminar, 30 hours, 16 places
Coordinators: Łukasz Karpiński
Group instructors: Łukasz Karpiński
Examination: Course - Grading
Seminar - Grading
Short description:

The course is intended for students interested in improving their linguistic competence and in recognizing, on the basis of simple indicators, the style and intention of the author of a given text. It can also serve to explore the linguistic image of a given person, phenomenon, problem or stereotype in a communicative society, or prove useful in studying translations of texts from and into foreign languages. It also develops competence in simple computer tools available online. The acquired skills can be applied to make research in other subjects more attractive or to strengthen the content of a thesis. The aim of the course is to introduce basic computer tools that support the creation of targeted text corpora investigating a given problem. Selected programs for text analysis, for creating and evaluating text corpora, stylometric programs, and electronic text corpora are discussed in class. The acquired knowledge is to be used by the student (or a group of students) to create a defined corpus project.

Full description:

As in the main course description.

Bibliography:

As in the main course description.

Notes:

If classes cannot be held in person, they will be conducted using remote communication tools, most likely Google Meet and others recommended by UW.

Course descriptions are protected by copyright.
Copyright by University of Warsaw.