Cambridge Language Sciences Symposium for Early-Career Researchers

Join us online for talks and posters presenting work by early-career researchers. The Symposium is free, and open to all researchers at the University of Cambridge.

This year's event will take place online on Thursday 24 June. The Symposium posters and recordings of the keynote presentations will also be available to view on Cambridge Open Engage from 17 June, and will remain available during and after the event.

Registration is now closed.

Programme

All timings are British Summer Time (BST).

Session 1 (11:00-12:15)

11:00-12:15: Poster session

The following posters will be presented. The posters will also be available to view on Cambridge Open Engage from 17 June.

Designing and Building the Brazilian Spoken English Learner (BraSEL) Corpus, Mateus Souza (Faculty of Education)
An online course of Russian coronal obstruents for non-native speakers, Daria Dashkevich (Faculty of Modern & Medieval Languages & Linguistics) & Mathias Nowak (Institute of Astronomy, Cambridge)
Extending Parametric Comparison, James Baker (Theoretical & Applied Linguistics)
DeliData: A dataset for deliberation in multi-party problem solving, Georgi Karadzhov (Computer Science & Technology, Cambridge), Tom Stafford (Psychology, University of Sheffield), Andreas Vlachos (Computer Science & Technology, Cambridge)
Tone – Melody Matching in Chaozhou Songs: A Corpus Analysis, Xi Zhang & Ian Cross (Faculty of Music)
Can voice similarity be assessed using an automatic speaker recognition system? Linda Gerlach & Kirsty McDougall (Theoretical & Applied Linguistics), Finnian Kelly & Anil Alexander (Oxford Wave Research Ltd.)
Voice parade parameters: investigating the effect of parade size and voice sample duration on earwitness identification accuracy, Alice Paver (Theoretical & Applied Linguistics), Harriet J. Smith & Nikolas Pautz (Psychology, Nottingham Trent University), Kirsty McDougall (Theoretical & Applied Linguistics), Katrin Mueller-Johnson (Centre for Criminology, University of Oxford), Francis Nolan (Theoretical & Applied Linguistics)
Mental representation of Kunming Chinese tone sandhi: episodic or abstract? Xiyuan Li (Theoretical & Applied Linguistics)

Session 2 (14:00-16:30)

14:00-14:15: Welcome

Chair: Guy Emerson, Department of Computer Science and Technology, and Executive Director, Cambridge Language Sciences

14:15-14:45: Can artificial intelligence save endangered languages? A brief history of learning

Ahmed Zaidi, Department of Computer Science and Technology

There is no doubt that learning new languages is infuriatingly difficult, especially at the later stages of life. As the world becomes "smaller" through globalisation, certain languages begin to increase in utility and start taking precedence over others, resulting in the extinction of the less "useful" languages. According to McWhorter (2009), in the next 100 years, the 6,000 languages in use today will be reduced to about 600. Whether and how to save these endangered languages is an important question plaguing the language sciences community.

We are now in the age of information and artificial intelligence. All the data we need is available in the palm of our hands. Mobile applications like Babbel and Duolingo lower the barriers to entry when it comes to learning a new language. So why is language learning still so difficult? Haven't the plethora of philosophical thought experiments, cognitive theories and neuroscience research, combined with the scale and reach of modern technology, enabled us to make language learning as easy and intuitive as playing a video game? Can we not then use this technology to increase the number of speakers of endangered languages?

The answers and further questions lie in the history of personalised learning and the underlying principles and paradigm shifts that have shaped it over the centuries. The nature of knowing, as represented through contemporary theories of learning such as behaviourism, cognitivism, and constructivism, has provided some insight into the question of how we learn. In this talk, I will walk through a brief history of learning that will shed some light not only on the question of whether artificial intelligence can save endangered languages, but also on whether it can play a role in making language learning less difficult.

READ MORE: Ahmed Zaidi speaker profile

14:45-15:15: Modelling semantic change from Ancient Greek to emoji

Barbara McGillivray, University of Cambridge Section of Theoretical and Applied Linguistics and The Alan Turing Institute

Over time, new words enter the language, others become obsolete, and existing words acquire new meanings. In Ancient Greek, the meaning of the Persian loanword paradeisos expanded from ‘garden’ to the Jewish-Christian ‘paradise’ in the Greek translation of the Old Testament and in the New Testament. In Classical Latin, passio meant ‘emotion’ and later came to refer to the suffering and death of Christ and the martyrs. The English word chill originally meant ‘to cool’ and has metaphorically been extended to ‘to relax’. Follow only acquired the social media sense of staying informed about someone’s postings after the launch of Twitter.

The phenomenon of lexical semantic change, with its fascinating complexities grounded in cognitive, social and contextual factors, has important implications not just for linguistic theory and historical linguistics: it can also shed new light on how we understand long-term and short-term changes in our cultural history and in our society. It is also a fundamental aspect of dictionary-making, and tracking it is important for keeping automatic language processing systems up to date with the constant changes in language.

Recent digitisation efforts have made it possible to access and mine digital collections of historical texts using automatic methods, and to investigate the question of semantic change over centuries. Easy access to very large born-digital collections from the web also allows us to study changes in contemporary language data spanning short time periods.

In this talk, I will present my research on developing models for semantic change, drawing on state-of-the-art computational linguistics methods that rely on distributional semantics principles, Bayesian learning and embedding technologies. I will share my experience of working at different scales and in a range of interdisciplinary projects, from Ancient Greek and Latin to Charles Darwin’s letters, web archives, Twitter and emoji.

15:15-15:45: Assessing psychosis risk using quantitative markers of transcribed speech

Sarah Morgan, University of Cambridge Department of Computer Science and Technology and The Alan Turing Institute

There is a pressing clinical demand for tools to predict individual patients' disease trajectories for schizophrenia and other conditions involving psychosis; however, to date such tools have proved elusive.

Behaviourally and cognitively, psychosis expresses itself through subtle alterations in language. Recent work has suggested that Natural Language Processing (NLP) markers of transcribed speech might be powerful predictors of later psychosis (Mota et al. 2017, Corcoran et al. 2018). For example, Corcoran et al. (2018) used quantitative markers of semantic coherence, collected at baseline from individuals at clinical high risk for psychosis, to predict transition to psychosis with 79% accuracy.

However, it remains unclear which NLP measures are most likely to be predictive, how different NLP measures relate to each other, and how best to collect speech data from patients. In this talk, I will discuss our research tackling these questions, as well as the wider challenges of translating this type of approach to the clinic. Ultimately, computational markers of speech have the potential to transform healthcare for mental health conditions such as schizophrenia, since they are relatively easy to collect and could be measured longitudinally to quickly identify changes in patients' disease trajectories.

READ MORE: Sarah Morgan speaker profile

15:45-16:15: Reading in a second language: Influences of context, world knowledge, and structural complexity

Margreet Vogelzang, Section of Theoretical and Applied Linguistics

Many of us, including myself, are not native speakers of English. Nevertheless, we communicate, read, and write in English nearly every day. The level of proficiency achieved by some non-native speakers is impressive, but some challenges may still remain. My research focusses specifically on the skill of reading in a second language. Reading is a complex skill that requires processing of phonological, semantic, and syntactic information. In addition, information from different parts of a text needs to be stored and integrated.

Things become especially difficult when parts of a sentence are ambiguous, meaning that more than one interpretation is possible. For example, the prepositional phrase ‘with the binoculars’ in the syntactically ambiguous sentence in (1) can grammatically be attached to either the verb (high attachment) or the second noun (low attachment):

(1) The man saw the woman with the binoculars

This means that either the man or the woman could be holding the binoculars. Native English speakers generally prefer high attachment, i.e. they think that in sentence (1) the man is holding the binoculars. Nevertheless, it is known that both text-explicit information (the discourse context) and pragmatic information (world knowledge) can guide attachments in native English speakers. It is, however, largely unknown whether and to what extent second language speakers are influenced by such information.

In this talk, I will present the results of a series of reading tasks investigating how native English speakers and native Spanish speakers who speak English as a second language interpret sentences such as (1). We manipulated the structural complexity of the sentences preceding the critical sentence (the context), text-explicit information, and world knowledge to examine which of these factors affect PP attachment. Our results show that, like native English speakers, second language speakers prefer high attachment, but interpretations are flexible and influenced by text-explicit and pragmatic information. However, we also found differences between how native English speakers and second language speakers read and interpret sentences such as (1); these differences will be discussed in the talk.

READ MORE: Margreet Vogelzang speaker profile

16:15-16:30: Closing remarks

Guy Emerson, Department of Computer Science and Technology, and Executive Director, Cambridge Language Sciences

Date:

Thursday, 24 June, 2021 - 11:00 to 17:00

Event location:

Online