
Cambridge Language Sciences

Interdisciplinary Research Centre
Early Career Researchers Symposium June 2021

This year’s Language Sciences Symposium for Early-Career Researchers took place on Thursday 24 June.

The event, now in its fourth year, celebrates work by early-career researchers working in the field of language sciences.

We were delighted to welcome keynote presenters from across the University: Ahmed Zaidi, Computer Science and Technology; Barbara McGillivray, Theoretical and Applied Linguistics and The Alan Turing Institute; Sarah Morgan, Computer Science and Technology and The Alan Turing Institute; and Margreet Vogelzang, Theoretical and Applied Linguistics.

The programme was organised and chaired by Guy Emerson, Academic Fellow, Department of Computer Science and Technology, and Executive Director, Cambridge Language Sciences.

He said: "The Symposium was not only a showcase of research across Cambridge, but also an opportunity to start a conversation between researchers in different disciplines."

Talks explored topics as wide-ranging as artificial intelligence and endangered languages, machine-learning models for historical and contemporary semantic change, using speech data to predict mental health, and language processing in relation to reading in a second language.

The posters also showcased language-sciences research across a diverse range of subjects including Education, Music and Criminology, as well as some innovative interdisciplinary collaborations between Linguistics and Astronomy, and Psychology and Computer Science.

The Symposium posters and recordings of the keynote presentations remain available to view on Cambridge Open Engage, an early research site run by Cambridge University Press.

Cambridge Language Sciences currently runs two Symposia a year: a Symposium for Early-Career Researchers in the summer and an Annual Symposium in November for all members of our research community. Details of the next Symposium in November 2021 will be available soon.

Symposium Programme: June 2021

Can artificial intelligence save endangered languages? A brief history of learning

Ahmed Zaidi, Department of Computer Science and Technology 

There is no doubt that learning new languages is infuriatingly difficult, especially at the later stages of life. As the world becomes "smaller" through globalisation, certain languages begin to increase in utility and start taking precedence over others, resulting in the extinction of the less "useful" languages. According to McWhorter (2009), in the next 100 years, the 6,000 languages in use today will be reduced to about 600. Whether and how to save these endangered languages is an important question plaguing the language sciences community.

We are now in the age of information and artificial intelligence. All the data we need is available in the palm of our hands. Mobile applications like Babbel and Duolingo lower the barriers to entry when it comes to learning a new language. So why is language learning still so difficult? Hasn't the plethora of philosophical thought experiments, cognitive theories and neuroscience research, combined with the scale and reach of modern technology, enabled us to make language learning as easy and intuitive as playing a video game? Can we not then use this technology to increase the number of speakers of endangered languages?

The answers and further questions lie in the history of personalised learning and the underlying principles and paradigm shifts that have shaped it over the centuries. The nature of knowing, as represented through contemporary theories of learning such as behaviourism, cognitivism, and constructivism, has provided some insight into the question of how we learn. In this talk, I will walk through a brief history of learning that will shed some light not only on the question of whether artificial intelligence can save endangered languages, but also on whether it can play a role in making language learning less difficult.

READ MORE: Ahmed Zaidi speaker profile

WATCH TALK: Ahmed Zaidi presentation

Modelling semantic change from Ancient Greek to emoji

Barbara McGillivray, University of Cambridge Section of Theoretical and Applied Linguistics and The Alan Turing Institute 

Over time, new words enter the language, others become obsolete, and existing words acquire new meanings. In ancient Greek, the meaning of the Persian loanword paradeisos expanded from ‘garden’ to the Jewish-Christian ‘paradise’ in the Greek translation of the Old Testament and in the New Testament. In Classical Latin passio meant ‘emotion’ and later on referred to the suffering and death of Christ and the martyrs. The English word chill originally meant ‘to cool’ and has metaphorically been extended to ‘to relax’. Follow only acquired the social media sense of staying informed about someone’s postings after the launch of Twitter.

The phenomenon of lexical semantic change, with its fascinating complexities grounded in cognitive, social and contextual factors, has important implications not just for linguistic theory and historical linguistics. It can shed new light on how we understand long-term and short-term changes in our cultural history and in our society. It is also a fundamental aspect of dictionary-making, and it matters for keeping automatic language processing systems up to date with the constant changes in language.

Recent digitization efforts have made it possible to access and mine digital collections of historical texts using automatic methods and to investigate the question of semantic change over centuries. Easy access to very large born-digital collections from the web also allows us to study changes in contemporary language data spanning short time periods.

In this talk I will present my research on developing models for semantic change drawing on state-of-the-art computational linguistics methods relying on distributional semantics principles, Bayesian learning and embedding technologies. I will share my experience of working at different scales and in a range of interdisciplinary projects, from Ancient Greek and Latin to Charles Darwin’s letters, web archives, Twitter and emoji.
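As a rough illustration of the distributional idea mentioned in this abstract, the sketch below compares the words that occur near a target word in two small sets of sentences. It is a toy example under stated assumptions, not the speaker's models: the function names (`context_vector`, `semantic_change_score`), the window size and the example sentences are all invented here, and raw co-occurrence counts stand in for the Bayesian and embedding methods the abstract refers to.

```python
from collections import Counter
import math


def context_vector(sentences, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_change_score(old_sentences, new_sentences, target, window=2):
    """Return 1 - cosine similarity of the target's contexts in the two periods.

    0 means the contexts are identical; 1 means they share no context words.
    """
    old_vec = context_vector(old_sentences, target, window)
    new_vec = context_vector(new_sentences, target, window)
    return 1.0 - cosine_similarity(old_vec, new_vec)


# Invented toy "corpora" for two periods (hypothetical data).
old_corpus = ["the soldiers follow the general", "hounds follow the fox"]
new_corpus = ["follow me on twitter", "you should follow her account on twitter"]

score = semantic_change_score(old_corpus, new_corpus, "follow")
print(f"change score for 'follow': {score:.2f}")
```

With real data the two sentence sets would be large corpora from different periods, and a high score would only be a starting point for closer inspection, since a change in context words can reflect a change in topic or genre rather than in meaning.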

READ MORE: Barbara McGillivray speaker profile

WATCH TALK: Barbara McGillivray presentation

Assessing psychosis risk using quantitative markers of transcribed speech

Sarah Morgan, University of Cambridge Department of Computer Science and Technology and The Alan Turing Institute  

There is a pressing clinical demand for tools to predict individual patients' disease trajectories for schizophrenia and other conditions involving psychosis; however, to date such tools have proved elusive.

Behaviourally and cognitively, psychosis expresses itself by subtle alterations in language. Recent work has suggested that Natural Language Processing (NLP) markers of transcribed speech might be powerful predictors of later psychosis (Mota et al. 2017, Corcoran et al. 2018). For example, Corcoran et al. (2018) used quantitative markers of semantic coherence, collected at baseline from individuals at clinical high risk for psychosis, to predict transition to psychosis with 79% accuracy.

However, it remains unclear which NLP measures are most likely to be predictive, how different NLP measures relate to each other and how best to collect speech data from patients. In this talk, I will discuss our research tackling these questions, as well as the wider challenges of translating this type of approach to the clinic. Ultimately, computational markers of speech have the potential to transform healthcare of mental health conditions such as schizophrenia, since they are relatively easy to collect and could be measured longitudinally to quickly identify changes in patients' disease trajectories.
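To make the notion of a quantitative coherence marker more concrete, the sketch below scores a transcript by the average similarity between consecutive sentences. This is a minimal illustration under stated assumptions and not necessarily the measure used in the cited studies or in the speaker's research: the function names are invented here, and simple word-count vectors stand in for the semantic representations a real study would use.

```python
from collections import Counter
import math


def sentence_vector(sentence):
    """Represent a sentence as a bag of lower-cased word counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence(sentences):
    """Mean cosine similarity between each sentence and the one after it."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to measure coherence")
    vectors = [sentence_vector(s) for s in sentences]
    similarities = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(similarities) / len(similarities)


# Invented toy transcripts (hypothetical data, not from any study).
connected = ["i went to the shop", "the shop was closed", "so i went home"]
disconnected = ["i went to the shop", "purple ideas sleep", "my brother counts clouds"]

print(coherence(connected) > coherence(disconnected))
```

Because word overlap is a crude proxy for meaning, a sketch like this mainly shows the shape of the computation: a per-transcript number that could be collected repeatedly over time and compared across visits.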

READ MORE: Sarah Morgan speaker profile

WATCH TALK: Sarah Morgan presentation

Reading in a second language: Influences of context, world knowledge, and structural complexity

Margreet Vogelzang, Section of Theoretical and Applied Linguistics

Many of us, including myself, are not native speakers of English. Nevertheless, we communicate, read, and write in English nearly every day. The level of proficiency achieved by some non-native speakers is impressive, but some challenges may still remain. My research focusses specifically on the skill of reading in a second language. Reading is a complex skill that requires processing of phonological, semantic, and syntactic information. In addition, information from different parts of a text needs to be stored and integrated.

Things become especially difficult when parts of a sentence are ambiguous, meaning that more than one interpretation is possible. For example, the prepositional phrase ‘with the binoculars’ in the syntactically ambiguous sentence in (1) can grammatically be attached to either the verb (high attachment) or the second noun (low attachment):

(1) The man saw the woman with the binoculars

This means that either the man or the woman could be holding the binoculars. Native English speakers generally prefer high attachment, i.e. they think that in sentence (1) the man is holding the binoculars. Nevertheless, it is known that both text-explicit information (the discourse context) and pragmatic information (world knowledge) can guide attachments in native English speakers. It is, however, largely unknown whether and to what extent second language speakers are influenced by such information.

In this talk, I will present the results of a series of reading tasks investigating how native English speakers and native Spanish speakers who speak English as a second language interpret sentences such as (1). We manipulated the structural complexity of the sentences preceding the critical sentence (the context), text-explicit information, and world knowledge to examine which of these factors affect prepositional phrase (PP) attachment. Our results show that, like native English speakers, second language speakers prefer high attachment, but interpretations are flexible and influenced by text-explicit and pragmatic information. However, we also found differences between how native English speakers and second language speakers read and interpret sentences such as (1); these differences will be discussed in the talk.
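The two readings of sentence (1) can be written out as two different tree structures over the same string of words. The sketch below is only an illustration: the nested-tuple encoding, the simplified labels (S, NP, VP, V, PP) and the helper names are assumptions made here, not notation taken from the talk.

```python
# Each node is a tuple (label, child, child, ...); leaves are plain strings.
PP = ("PP", "with", ("NP", "the", "binoculars"))

# High attachment: the PP is a daughter of the verb phrase
# (the man uses the binoculars to see).
high_attachment = (
    "S",
    ("NP", "the", "man"),
    ("VP", ("V", "saw"), ("NP", "the", "woman"), PP),
)

# Low attachment: the PP sits inside the object noun phrase
# (the woman has the binoculars).
low_attachment = (
    "S",
    ("NP", "the", "man"),
    ("VP", ("V", "saw"), ("NP", ("NP", "the", "woman"), PP)),
)


def words(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in words(child)]


def pp_parent(tree):
    """Return the label of the node directly above the PP, or None."""
    if isinstance(tree, str):
        return None
    for child in tree[1:]:
        if not isinstance(child, str) and child[0] == "PP":
            return tree[0]
        found = pp_parent(child)
        if found is not None:
            return found
    return None


# Same surface string, different structure.
print(" ".join(words(high_attachment)))
print(pp_parent(high_attachment), pp_parent(low_attachment))
```

Both trees yield exactly the same word sequence, which is why a reader has to rely on preferences, context or world knowledge to choose between them.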

READ MORE: Margreet Vogelzang speaker profile

WATCH TALK: Margreet Vogelzang presentation


Symposium Posters

  • Designing and Building the Brazilian Spoken English Learner (BraSEL) Corpus, Mateus Souza (Faculty of Education)
  • An online course of Russian coronal obstruents for non-native speakers, Daria Dashkevich (Faculty of Modern & Medieval Languages & Linguistics) & Mathias Nowak (Institute of Astronomy, Cambridge)
  • Extending Parametric Comparison, James Baker (Theoretical & Applied Linguistics)
  • DeliData: A dataset for deliberation in multi-party problem solving, Georgi Karadzhov (Computer Science & Technology, Cambridge), Tom Stafford (Psychology, University of Sheffield), Andreas Vlachos (Computer Science & Technology, Cambridge)
  • Tone – Melody Matching in Chaozhou Songs: A Corpus Analysis, Xi Zhang & Ian Cross (Faculty of Music)
  • Can voice similarity be assessed using an automatic speaker recognition system?, Linda Gerlach & Kirsty McDougall (Theoretical & Applied Linguistics), Finnian Kelly & Anil Alexander (Oxford Wave Research Ltd.)
  • Voice parade parameters: investigating the effect of parade size and voice sample duration on earwitness identification accuracy, Alice Paver (Theoretical & Applied Linguistics), Harriet J. Smith & Nikolas Pautz (Psychology, Nottingham Trent University), Kirsty McDougall (Theoretical & Applied Linguistics), Katrin Mueller-Johnson (Centre for Criminology, University of Oxford), Francis Nolan (Theoretical & Applied Linguistics)
  • Mental representation of Kunming Chinese tone sandhi: episodic or abstract?, Xiyuan Li (Theoretical & Applied Linguistics) 

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the university as well as other world-leading research institutions. Our aim is to strengthen research collaborations and knowledge transfer across disciplines in order to address large-scale multi-disciplinary research challenges relating to language research.