Endangered Languages Meet CAT Tools

Karolina Grzech has a PhD in Linguistics from SOAS University of London and specializes in endangered languages, in particular the endangered and lesser-spoken languages of South America. In this blog series, Karolina and her colleague Dr. Anne Schwarz will share their experiences of documenting Tena Kichwa, a variety of Amazonian Kichwa, and describe how, for the first time, a CAT tool has been used to help document an endangered language.

I translate for work on a daily basis, and I could hardly do my job without it. However, unlike most SDL bloggers, I am not a professional translator; I am a descriptive linguist and a language documenter. So what is it that I do, exactly?

I work on languages that have not yet been studied in much detail, trying to figure out their structure and describe it. Doing this usually requires fieldwork, which in my case has taken me to the Ecuadorian Amazon. The language I work on is called Amazonian Kichwa. It belongs to the Quechuan language family, whose languages are spoken along the Andes, from Chile and Argentina in the south to Colombia in the north. I have spent over a year in Ecuador across several field trips, and an integral part of my work was creating a documentation of Kichwa: audio and video recordings of native speakers using the language. Some of these capture the language in its natural setting; others are artificial dialogues or translation tasks, designed to help me figure out different aspects of the grammar, such as case marking and verbal and nominal morphology.

I’ve been working on Kichwa since 2012, and for this project I have joined forces with my colleague Anne, who has previously worked on other languages in Ecuador and Ghana. In this series of blog posts, we will take you through our experience of applying CAT tools, and specifically SDL Trados Studio, in endangered language research. But before I tell you about our adventure with CAT tools, I should explain more about the nature of our work, and that’s precisely what today’s post is about.

Endangered languages

I mentioned before that Anne and I both specialise in ‘endangered languages’. When I say that to people who don’t work with languages, their first reaction is always to assume I mean Latin or other ‘dead languages’, or just ‘ancient languages’ in general. Well, nothing of the sort.

There are about 7000 languages spoken around the world today, but, according to estimates, 95% of the world’s population speaks just 5% of those languages, including ‘global languages’ such as Mandarin, English, or Spanish. The remaining 5% of people, on the other hand, speak the other 95% of languages. Many of those languages are spoken by just a handful of people, and many of them are no longer taught to children, which makes today’s adults their last speakers. Such languages exist in all parts of the world, and they might cease to be spoken next month, next year, or within a generation or two: these are endangered languages. With each one that falls silent, we lose a unique insight into one of the ways in which we as a species can handle communication. Without knowing how these languages work, any theories we formulate about the general properties of language will be less well grounded than they could have been.

Fieldwork and documentation

In order to document such languages and, in many cases, to try to support and revive their use, linguists work with communities of speakers. Often this means extended periods of fieldwork in quite remote locations. The basis of such work is learning the language, unless the linguist is also a native speaker. We record people’s daily life, trying to capture as wide a range of events as possible: from ceremonies, through political speeches, to less formal speech events and everyday conversation. Everyday conversation is the holy grail of documentation, because it is the most natural context for language use. Together with native speakers, we then process these recordings, transcribing and translating them and building digital corpora, so that we can carry out linguistic analysis. This means really long hours in front of a screen, as transcribing one minute of speech can often take as long as an hour. And that’s before we even start to translate, so you can see why getting to use CAT tools in our work was such an exciting prospect!

In the next post, we’ll tell you more about how we prepared to start using SDL Trados Studio, what kinds of issues we ran into and how we solved them, and about our upcoming trip to Ecuador, where we will present our first results.