
Submitted by Richard Arlett on Wed, 18/03/2026 - 14:47
We are delighted to announce that nine new projects have been awarded seed funding in the latest Incubator Fund round (running Nov 2025 - Feb 2026).
Posters on the funded research will be presented at future Cambridge Language Sciences Symposium events. We also hope to feature more information about these projects on the Language Sciences website in due course.
The following two projects are funded directly through the Language Sciences budget, supporting our core aim of strengthening research collaborations across disciplines.
A stimulus set for investigating the contribution of sensory systems to semantic representation
Dr Saskia Frisby, Tim Dick, Professor Matt Lambon Ralph (MRC Cognition & Brain Sciences Unit), Dr Alex Clarke (Warwick), Professor Tim Rogers (University College London)
This project will construct a new stimulus set to investigate, for the first time, the role of both the modality-specific “spokes” and the transmodal “hub” in semantic representation. Specifically, it aims to develop stimuli that are easily identifiable at the basic level (e.g. “tiger”, rather than “animal”), and that can be identified across four modalities: picture, written word, spoken word, and natural sound. It further aims to ensure that key psycholinguistic variables are matched across semantic categories (e.g. living vs. non-living), and that the unimodal similarity structures of the stimulus set vary orthogonally with each other and with the semantic similarity structure.
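For readers curious what the final design constraint means in practice, one common check is to correlate the off-diagonal entries of item-by-item similarity matrices and confirm that the correlations are close to zero. The sketch below is a generic illustration with random placeholder data, not the project’s actual stimuli or analysis pipeline.

```python
# Minimal sketch (not the project's pipeline): checking that unimodal
# similarity structures are roughly orthogonal to each other and to a
# semantic similarity structure, using hypothetical random matrices.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_items = 40  # hypothetical number of stimuli

def random_similarity(n):
    """Generate a symmetric item-by-item similarity matrix (placeholder data)."""
    m = rng.random((n, n))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 1.0)
    return m

structures = {
    "picture": random_similarity(n_items),
    "spoken_word": random_similarity(n_items),
    "natural_sound": random_similarity(n_items),
    "semantic": random_similarity(n_items),
}

# Compare every pair of structures on their off-diagonal entries;
# near-zero rank correlations indicate the structures vary orthogonally.
idx = np.triu_indices(n_items, k=1)
names = list(structures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        rho, _ = spearmanr(structures[a][idx], structures[b][idx])
        print(f"{a} vs {b}: rho = {rho:.2f}")
```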
Discriminating Cues: Informativeness and Salience in L2 Learning of Morphological Case
Laura Barbenel (Computer Science & Technology), Professor John Williams (Theoretical and Applied Linguistics), Jesse Nixon (Linguistics, Bielefeld)
This project aims to understand how second language (L2) learners acquire morphological case contrasts through the lens of the discriminative learning framework. It focuses on two key factors shaping learning: the informativeness of morphological cues and their salience, particularly as influenced by a learner’s first language (L1). By examining when L2 learning follows error-driven predictions and when it diverges, the project seeks to determine how differences in cue salience affect the uptake of competing forms. Ultimately, it will provide new insights into the mechanisms underlying variability in L2 morphological acquisition.
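As background for the framework mentioned above, error-driven (discriminative) learning is commonly formalised with a Rescorla-Wagner-style update, in which a cue-outcome association changes only in proportion to the prediction error. The sketch below is a generic illustration of that update rule with invented cues and outcomes; it is not the project’s model.

```python
# Generic illustration of an error-driven (Rescorla-Wagner-style) update,
# not the project's actual model. Cues and outcomes here are invented.
from collections import defaultdict

LEARNING_RATE = 0.1
MAX_ASSOCIATION = 1.0  # maximum associative strength for a present outcome

# weights[(cue, outcome)] = current associative strength
weights = defaultdict(float)

def update(cues, outcomes, all_outcomes):
    """One learning event: adjust cue-outcome weights in proportion to prediction error."""
    for outcome in all_outcomes:
        target = MAX_ASSOCIATION if outcome in outcomes else 0.0
        prediction = sum(weights[(cue, outcome)] for cue in cues)
        error = target - prediction
        for cue in cues:
            weights[(cue, outcome)] += LEARNING_RATE * error

# Hypothetical training data: a noun stem plus a case suffix cue a grammatical role.
events = [
    ({"stem_dog", "suffix_acc"}, {"role_object"}),
    ({"stem_dog", "suffix_nom"}, {"role_subject"}),
] * 50
all_outcomes = {"role_object", "role_subject"}

for cues, outcomes in events:
    update(cues, outcomes, all_outcomes)

# The informative cue (the suffix) ends up with a stronger association than
# the uninformative one (the stem), which co-occurs with both roles equally.
print(weights[("suffix_acc", "role_object")], weights[("stem_dog", "role_object")])
```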
Projects supported under our AI-deas project in collaboration with AI@Cam
These further seven projects are funded through our ongoing collaboration with AI@Cam, “Improving language equity and inclusion through AI.”
Using Machine Learning to assess cross-linguistic alignment to phonemes and words in monolingual and bilingual populations
Dr Suhail Matar (Theoretical and Applied Linguistics), Dr Mirjana Bozic (Psychology)
Knowing two languages may place different demands on the bilingual brain than on the monolingual brain, potentially leading to differences in the functioning of various neurocognitive systems. This project aims to investigate this in two major ways. Firstly, going beyond acoustic features, it will apply Machine Learning tools to examine and compare how monolingual and bilingual brains encode linguistic features at the level of phonemes and words. Secondly, it will explore whether L1–L2 similarity in phonetic or lexical features modulates bilingual brain responses to those features.
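The announcement does not specify which machine learning tools will be used, but a standard way to ask how well linguistic features predict a neural signal is an encoding model fitted with regularised regression. The sketch below illustrates that general approach on random placeholder data only.

```python
# Illustrative encoding-model sketch with placeholder data; the project's
# actual machine-learning pipeline is not specified in the announcement.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

n_samples = 2000  # e.g. time points of a neural recording (placeholder)
n_features = 40   # e.g. phoneme- and word-level feature annotations (placeholder)

X = rng.standard_normal((n_samples, n_features))        # stimulus features
true_weights = rng.standard_normal(n_features)
y = X @ true_weights + rng.standard_normal(n_samples)   # simulated neural response

# Ridge regression with a built-in regularisation search; cross-validated R^2
# indicates how well the linguistic features predict the neural signal.
model = RidgeCV(alphas=np.logspace(-2, 4, 13))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.2f}")

# In a real analysis, comparing such scores between monolingual and bilingual
# participants (and between feature sets) would address the project's questions.
```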
xBLiMPs – Advancing Equitable Linguistic Capability Evaluation of Language Models Across Low-Resource Languages
Suchir Salhan (Computer Science & Technology), Catherine Arnett (EleutherAI), Professor Paula Buttery, Dr Andrew Caines (Computer Science & Technology)
This project will develop BLiMP-style datasets comprising approximately 100 minimal sentence pairs for at least ten grammatical phenomena, yielding 1000 test items per language. It will focus on Welsh, Catalan and/or Basque, Persian, and Tagalog. The datasets will be constructed using descriptive grammars and theoretical analyses in collaboration with syntacticians and field linguists specialising in each language, and validated through community-centred evaluation involving native speakers.
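For context, BLiMP-style evaluation usually scores a language model as correct on a minimal pair if it assigns higher probability to the grammatical sentence. The sketch below shows that scoring logic with the Hugging Face transformers library; the checkpoint and example pair are placeholders rather than items from the planned datasets.

```python
# Sketch of BLiMP-style minimal-pair scoring; the example pair and model
# checkpoint are placeholders, not items from the planned xBLiMP datasets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; a multilingual model would be used in practice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean token-level negative log-likelihood.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# A minimal pair: grammatical vs ungrammatical sentence (placeholder English example).
good = "The cats on the mat are sleeping."
bad = "The cats on the mat is sleeping."

# The item counts as correct if the grammatical sentence is more probable.
print(sentence_log_prob(good) > sentence_log_prob(bad))
```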
The phonetics and phonology of Llanito, Gibraltar’s dying language
Dr Mengjie Qian (Engineering), Professor Brechtje Post (Theoretical and Applied Linguistics)
This project aims to analyse the phonetics and phonology of Llanito, the traditional mother tongue of Gibraltar, as it is spoken today. To support this analysis, it will develop an AI-assisted workflow that accelerates expert-led annotation of low-resource speech data. Rather than attempting to build full speech recognition systems, the project adopts a human-in-the-loop approach in which existing multilingual and self-supervised speech models are used to generate target-language segmentations and detect phone boundaries that support manual phonetic analysis.
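As a rough illustration of the human-in-the-loop idea (the announcement does not name specific models), the sketch below uses a pretrained multilingual phoneme-recognition checkpoint to produce frame-level phone labels and approximate boundaries that an expert annotator could then correct. The checkpoint named is an assumption chosen purely for illustration.

```python
# Illustrative sketch of AI-assisted phone labelling for manual correction;
# the checkpoint named here is an assumption, not the project's chosen model.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CHECKPOINT = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # multilingual phoneme recogniser (assumed example)
processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)
model.eval()

def rough_phone_segments(waveform, sample_rate=16_000):
    """Return (start_s, end_s, phone) spans from frame-level CTC predictions.

    waveform: 1-D NumPy array of mono audio sampled at 16 kHz.
    """
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0]
    frame_ids = logits.argmax(dim=-1).tolist()
    frame_dur = waveform.shape[-1] / sample_rate / len(frame_ids)

    segments, start = [], 0
    for i in range(1, len(frame_ids) + 1):
        if i == len(frame_ids) or frame_ids[i] != frame_ids[start]:
            label = processor.tokenizer.convert_ids_to_tokens(frame_ids[start])
            if label != processor.tokenizer.pad_token:  # skip CTC blank frames
                segments.append((start * frame_dur, i * frame_dur, label))
            start = i
    return segments  # boundaries are approximate and meant for expert correction
```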
FigurativeAccess: Building Resources and Tools to Make Figurative Language More Accessible
Songqiao Xie (Theoretical and Applied Linguistics), Dr Xinxin Yan (Psychology & Language Sciences, UCL), Dr Mario Giulianelli (UCL), Professor Napoleon Katsos (Theoretical and Applied Linguistics), Dr Miloš Stanojević (Google Deep Mind)
Recent advances make it possible to develop AI-assisted tools that help users handle figurative language by detecting figurative uses in context, providing literal paraphrases or explanations, and generating apt figurative expressions from an intended meaning. Existing NLP research has made progress, but limitations remain that prevent current models from being practically useful, largely due to a lack of high-quality resources. This project aims to close these gaps and lay the groundwork for building integrated systems that recognise, interpret, and generate figurative language.
The costs of faking vocal signals in the age of generative AI
Jonathan Goodman (Psychiatry), Dr Lucy MacGregor, Dr Lidea Shahidi (MRC Cognition and Brain Sciences Unit), Professor Robert Foley (Cambridge), Dr Kirsty McDougall (Theoretical and Applied Linguistics)
The rapid development of generative AI has introduced the possibility of synthetic voices that are increasingly indistinguishable from natural speech. Given the broad threat that AI fakes pose across the digital sphere, determining whether and how often people can detect technological mimicry of human social signals is critical. This project aims to evaluate the reliability of authenticity judgments in the context of accents, which have historically functioned as costly, hard-to-fake signals. Specifically, it will examine whether members of the public remain better than chance at detecting accent fakery when the recordings are generated by an AI agent, evaluate regional variability in how well listeners distinguish genuine human voices from AI-generated clones, and test whether overall detection aptitude interacts with accent familiarity.
Beyond Accuracy: Confidence-Aware Speech Models for Alzheimer’s Detection
Dr Mengjie Qian (Engineering), Dr Peter Raykov (MRC Cognition and Brain Sciences Unit), Professor Kate Knill (Computer Science & Technology)
Alzheimer’s Disease (AD) is the leading cause of dementia worldwide, with global costs exceeding one trillion US dollars and expected to double by 2030. Early detection is crucial for timely intervention, but few studies have explored uncertainty estimation in speech-based AD detection. This project aims to address this gap by developing confidence-aware speech models for AD. It will compare confidence estimation methods (e.g. temperature scaling, Dirichlet calibration, and confidence estimation networks) within state-of-the-art classifiers, introduce a risk-aware evaluation framework to quantify accuracy-confidence trade-offs and assess how performance changes when uncertain cases are withheld, and explore the potential of confidence scores to support interpretability.
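Temperature scaling, one of the calibration methods listed above, fits a single scalar that rescales a trained classifier’s logits so that its confidence better matches its accuracy. The sketch below shows the standard technique on placeholder data and is an illustration only, not the project’s implementation.

```python
# Standard temperature-scaling sketch on placeholder logits; an illustration
# of one of the calibration methods named above, not the project's code.
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit a single temperature on held-out validation logits by minimising NLL."""
    log_t = torch.zeros(1, requires_grad=True)  # optimise log-temperature so it stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Placeholder validation set: deliberately over-confident logits for a binary classifier.
torch.manual_seed(0)
labels = torch.randint(0, 2, (500,))
logits = 5.0 * torch.randn(500, 2)  # too "sharp" relative to the labels

temperature = fit_temperature(logits, labels)
calibrated_probs = torch.softmax(logits / temperature, dim=-1)
print(f"fitted temperature: {temperature:.2f}")
```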
Hearing Every Voice: Interpretable Voice Quality Labelling for Inclusive AI
Xiaojing Du (Theoretical and Applied Linguistics), Dr Mengjie Qian, Charles McGhee (Engineering), Dr Lidea Shahidi (MRC Cognition and Brain Sciences Unit), Chenzi Xu (Oxford, Nanyang)
Most listeners can tell when a voice sounds airy, squeezed or crackly, even if they cannot explain why. Voice quality is not cosmetic: it shapes judgements of identity, credibility and emotion, and it matters for diagnosing voice disorders, comparing speakers in forensic work and building speech technology for diverse talkers. Linguists, clinicians, and engineers all study the same signal but use different vocabularies and measurement traditions. This project tackles that shared problem by building a compact, interpretable phonation tagger that travels across speakers, rooms and disciplines.