Language Sciences Incubator Fund projects announced August 2021

Submitted by Jane Durkin on Tue, 03/08/2021 - 16:22

We are delighted to announce that six more projects have been awarded seed funding through the Language Sciences Incubator Fund.

Successful applicants from the latest call, which closed in June 2021, were announced on 3 August.

The fund is an opportunity for researchers to work together across disciplines on a pilot project, and can provide proof of concept or evidence of collaboration for larger grant applications.

We would like to congratulate all concerned, and to thank the reviewers and advisors for being generous with their time.

Posters on the funded research will be presented at future Cambridge Language Sciences Symposium events and further information about project outcomes will be published on the Incubator Fund page.

Using tone to predict code-switching in English-Vietnamese/Cantonese/Mandarin

Christopher Bryant (Dept. of Computer Science & Technology), Li Nguyen (Dept. of Computer Science & Technology and Theoretical & Applied Linguistics), Kayeon Yoo (Amazon Alexa Text-to-Speech Group and Theoretical & Applied Linguistics), Katrina Li (Theoretical & Applied Linguistics)

An innovative collaboration between linguistics and computer science researchers from industry and academia, this project aims to establish whether code-switching can be predicted based on tone, and whether this effect is consistent across three different tonal languages.

As global migration continues to grow and language-mixing is increasingly ubiquitous, studies on code-switching and multilingual approaches to NLP are more relevant than ever.

The findings from this study have the potential to inform NLP tasks such as speech technology for mixed discourse, syntactic parsing and automated translation, as well as future research on code-switching and multilingualism in general.

Accents as honest signals of in-group membership

Jonathan R Goodman (Leverhulme Centre for Human Evolutionary Studies), Robert A Foley (Leverhulme Centre for Human Evolutionary Studies), Francis Nolan (Phonetics Lab), Emma Cohen (Social Body Lab, University of Oxford)

This project brings together researchers in phonetics, evolutionary biology and anthropology from Cambridge and Oxford Universities to investigate the evolution of communication and linguistic diversity. Building on a successful previous Incubator Fund project, it aims to show how linguistic boundaries may have developed among humans belonging to different social and kinship groups.

If accents were initially cues of ancestral heritage or group membership, we would expect impostors to attempt to mimic these cues for fitness benefit. This would lead to selection for more complex accents on the part of honest speakers, and we would expect honest signallers to develop an increasingly sensitive ear for accent mimicry.

The study will test the hypothesis that local speakers are better than non-locals at detecting accent mimics, and aims further to determine whether native speakers of an accent are better at detecting mimicry than non-native listeners.

A linguistically-driven task on multi-modal spatial reasoning

Fangyu Liu (Language Technology Lab, Theoretical & Applied Linguistics), Guy Emerson (Dept. of Computer Science & Technology), Nigel Collier (Language Technology Lab, Theoretical & Applied Linguistics)

This project combines expertise in computer vision, theoretical linguistics and applied natural language processing (NLP) from across the University to produce a large-scale dataset for exploring the similarities and differences between how humans and computational semantic models perceive space in language.

This visual-linguistic dataset will enable researchers to investigate the semantics of spatial relations in English, and to evaluate how well such relations can be modelled by the current state of the art in machine learning.

This work is needed in order to build NLP systems that can effectively work with multiple modalities such as text, images and audio, and will also support future cross-linguistic and cross-cultural research opportunities.

Developing an Old English lemmatiser

Marieke Meelen (Theoretical & Applied Linguistics), Andrew Caines (Dept. of Computer Science & Technology and ALTA Institute)

In this project, researchers from linguistics and computer science will work together to produce the first fully lemmatised Old English corpus with reliable word embeddings.

Lemmatisation – the process of determining the dictionary form of a word (e.g. sing) given one of its inflected variants (e.g. sings, singing, sang, sung) – remains an unsolved problem for Old English because of the highly inflected nature of the language and the scarcity of data.
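As a rough illustration of the definition above, the sketch below maps the inflected forms of "sing" back to their dictionary form using a small lookup table. It is not the project's method: the table and the `lemmatise` function are hypothetical names made up for this example, and a real lemmatiser for Old English would need to handle far richer inflection with far less data.

```python
# Toy lookup table (an assumption for illustration only):
# inflected form -> dictionary form (lemma).
LEMMA_TABLE = {
    "sings": "sing",
    "singing": "sing",
    "sang": "sing",
    "sung": "sing",
}


def lemmatise(token: str) -> str:
    """Return the lemma for a token, or the lowercased token if it is not in the table."""
    form = token.lower()
    return LEMMA_TABLE.get(form, form)


if __name__ == "__main__":
    for word in ["sings", "Singing", "sang", "sung", "sing"]:
        print(word, "->", lemmatise(word))
```

A lookup table like this only covers forms that have already been listed, which is part of why lemmatisation is hard when inflection is extensive and data is scarce.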

Word embeddings – a form of word representation that bridges human understanding of language and that of a machine – are essential for investigating semantic change and for optimising state-of-the-art NLP tools.

Creating a properly lemmatised corpus will enable these word embeddings to be derived. It is an essential step for subsequent research into this period and for enhancing the teaching of English language and history.

Live voice vowel inference web app

Bert Vaux (Theoretical & Applied Linguistics); James Burridge & Michal Gnacik (School of Mathematics & Physics, University of Portsmouth)

This is a collaboration between researchers in linguistics, mathematics and statistical physics from Cambridge and Portsmouth Universities.

The team will develop a web application to automate the collection and analysis of a large corpus of spoken language, paired with social and geographical speaker information.

Because it pairs raw acoustic data with spatial and social data, the dataset can be used to model language learning and evolution and to understand how language changes are taking effect across the age and social spectra.

The systematic collection of speech data with meaningful demographics in this way is a recent and exciting opportunity that could also lead to useful insights and applications in a number of other fields such as machine learning.

Empirical evaluation of Graham's hierarchy of disagreement

Andreas Vlachos, Christine de Kock (Dept. of Computer Science & Technology); Tom Stafford (Dept. of Psychology, University of Sheffield)

This project speaks to emerging themes of online group identity and discussion and the role of argumentation in countering misinformation.

It combines expertise in psychology, linguistics and NLP to examine the role of language in conflict resolution in online contexts.

The team will use a corpus of language from Wikipedia Talk pages to investigate whether there are strategies for disagreement which are more likely to lead to the constructive resolution of a dispute.

The study will also support the exploration of constructive dispute resolution on other online collaborative platforms which increasingly play a key role in education, industry and cultural life.

About the Incubator Fund

The Incubator Fund, established by Cambridge Language Sciences with additional funding from the Isaac Newton Trust, Cambridge Assessment English, Cambridge University Press and the School of Technology, is a small grants fund designed to foster innovative interdisciplinary research in the language sciences.

Since the Incubator Fund was established in 2016, over £93,000 of seed funding has been awarded across 35 projects. As well as the opportunity to develop new ideas, collaborations and approaches, Incubator Fund projects can provide proof of concept or evidence of collaboration for larger grant applications. Other positive outcomes include knowledge exchange studentships, publications, fellowships and further career opportunities for researchers involved.

Please visit the Incubator Fund page for more information and to see a full list of Language Sciences Incubator Fund projects.