
Cambridge Language Sciences

Interdisciplinary Research Centre

We are delighted to announce that two new projects have been awarded seed funding through the Language Sciences Incubator Fund.

The successful applicants from the latest call were announced on 29 March 2023.

We would like to congratulate all concerned and thank the reviewers for giving their time to evaluate the proposals. 

Posters on the funded research will be presented at future Cambridge Language Sciences Symposium events. We also hope to feature more information about these projects on the Language Sciences website in due course.

Developing a Psychologically Grounded Corpus for Measuring Sensationalism in the News 

Tiancheng Hu, Kiran Garimella and Nigel Collier

This collaboration between researchers at Cambridge and Rutgers Universities aims to help understand the dynamics of sensationalism in the media. 

The project bridges the fields of natural language processing (NLP), journalism studies, communication and psychology.

It aims to address important questions around declining trust in the news industry, such as: Has sensationalism in the news increased in recent years? How does it differ across outlets, for example, tabloid versus non-tabloid, or local versus national?

The team will create a corpus for measuring sensationalism in news headlines and train a machine learning model to quantify it.

This is also an opportunity to explore the use of language models for modelling subjective social science concepts and affective reasoning tasks. 

Methods for Evaluating Short-cut Learning in Transformer-based Automatic Speech Recognition (ASR) Systems and its implications

Calbert Graham, Konstantinos Voudouris, Luca Scimeca and Nathan Roll

This is a new collaboration between Cambridge researchers in computational linguistics and psychology and colleagues at Harvard University and UC Santa Barbara.

The project addresses the challenges of accent diversity in Automatic Speech Recognition (ASR) systems. 

The aim of the project is to better understand how speech cues that discriminate between speakers and their accent backgrounds are learned in deep neural networks. 

Drawing on research in phonetics, computer science, speech recognition, phonology and second language acquisition, the group are building a framework to explore algorithmic bias in ASR systems.

This will be based on the Accent Benchmark dataset – a corpus of over 5,000 audio clips of native and non-native English speech developed for the previous Incubator Fund project, ‘Generalising Native Language Articulation to Non-Native Contexts’.

There is growing evidence that ASR systems can exhibit biases that may amplify discrimination based on gender, race, ethnicity, health condition, and so on. 

Research has found vast disparities in word error rates calculated for Black speakers of American English compared to white speakers. Similarly, recognition rates for non-native speech are generally poorer.

This may be due to speech recognisers being trained on predominantly native, standard varieties of English and other languages. 

However, bias is not limited to spoken language. For example, similar problems have been uncovered in leading AI hate-speech detection algorithms applied to Twitter: the models were unaware of certain dialectal forms and were 1.5 times more likely to flag tweets as offensive or hateful when written by Black Americans.

Dr Calbert Graham, Senior Research Associate at Cambridge University, said: “The issue is not simply about whether a computer is good at recognising the words we speak. ASR is embedded in many important systems to automate processes (e.g., to provide remote access to health and education services), and to improve accessibility for people with disabilities.”

“Many people rely on personal voice assistants to complete daily tasks. There is clearly an urgent need in safety-critical contexts to assess how these ASR algorithms work and the nature of the errors they generate. Making the inner workings of the technology more transparent would help to build trust among users that the algorithms work the way they are intended.”

About the Incubator Fund

The Incubator Fund is a small grants fund designed to foster innovative interdisciplinary research in the language sciences. It was established by Cambridge Language Sciences with additional funding from the Isaac Newton Trust, Cambridge University Press & Assessment, and the School of Technology.

As well as the opportunity to develop new ideas, collaborations and approaches, Incubator Fund projects can provide proof of concept or evidence of collaboration for larger grant applications. Other positive outcomes include knowledge exchange studentships, publications, fellowships and further career opportunities for researchers involved.

Since the Incubator Fund was established in 2016, over £110,000 of seed funding has been awarded across 41 projects.

We hope to open the next call for proposals in Autumn 2023.

Please visit the Incubator Fund page for more information and to see a full list of Language Sciences Incubator Fund projects.

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the university as well as other world-leading research institutions. Our aim is to strengthen research collaborations and knowledge transfer across disciplines in order to address large-scale multidisciplinary challenges in language research.