Cambridge Language Sciences

Interdisciplinary Research Centre
In a paper selected for the front cover of the Journal of the Acoustical Society of America (JASA), James Burridge and Bert Vaux present a new method for the low-dimensional measurement of vowels using machine perception.

This method could eventually improve our ability to measure how the set of vowel sounds varies between speakers, allowing us to better understand accents and speech disorders.

Low-dimensional sound measurements are commonly used in the scientific analysis of speech to reduce complex acoustic information to a manageable form, making it easier for machine learning models to distinguish between sounds.

Current methods of generating these sound measurements are limited. They often require significant human intervention and cannot capture the full complexity of sounds, which means they are less able to distinguish between subtle variations such as those found in different dialects and accents.

Bert Vaux, Professor of Phonology and Morphology at the University of Cambridge, and James Burridge, Reader in Probability and Statistical Physics at the University of Portsmouth, are investigating a method using neural networks and other machine learning models to capture more subtle variations in sound.

“Sounds are complex structures, thus quantitative representations which capture the full extent of this complexity are necessarily high dimensional… However, it is also useful to extract measurements which summarise its structure more simply,” say Burridge and Vaux.

“We want a single measurement method which can be systematically expanded to measure progressively more subtle variations in sound… [and] capture properties of sounds characterised by changes through time.” 

“Modern machine learning methods can take high dimensional representations and extract human-readable information, most notably orthographic representations (‘speech-to-text’ systems). We investigated if there is a way to use these methods to generate a low dimensional representation of sounds, understandable by humans, that is capable of accurately measuring any acoustic distinction in which we are interested.”

“Methods that automatically learn an optimal low dimensional representation have two advantages compared to traditional measures. First, there is no limit on the features that they can learn and the amount of complexity that they can capture. Second, once they have been trained, the map from high to low dimensional space is well defined, sidestepping the problems of making spectral measurements.” 
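The idea of a well-defined learned map from high- to low-dimensional space can be illustrated with a toy example. The sketch below uses a simple linear projection (PCA via SVD) on synthetic "spectral" frames purely for illustration; the paper's actual method uses neural networks, and all names and dimensions here are assumptions, not the authors' implementation.

```python
import numpy as np

# Toy illustration of a learned high-to-low-dimensional map.
# Synthetic stand-in for spectral frames: 500 frames, 128 "spectral bins".
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 128))

# "Training": learn a fixed 2-D projection from the data via PCA/SVD.
mean = frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
projection = vt[:2].T  # shape (128, 2): a well-defined 128 -> 2 map

def embed(frame):
    """Map one 128-dimensional frame to a 2-dimensional measurement."""
    return (frame - mean) @ projection

low_dim = embed(frames[0])  # a single 2-D measurement, shape (2,)
```

Once the projection is learned, `embed` is a fixed, deterministic map: every new frame gets a low-dimensional measurement without any per-sound manual spectral analysis, which is the advantage the authors describe (their nonlinear neural-network version can capture far more complexity than this linear sketch).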

Bert Vaux and James Burridge received a Language Sciences Incubator Fund award in 2021 for their project 'Live voice vowel inference web app', together with Michal Gnacik, at the time Senior Lecturer in Mathematics and Physics at the University of Portsmouth.

In this project, the team are developing a web application to automate the collection and analysis of a large corpus of spoken language, paired with social and geographical information about each speaker.

By pairing raw acoustic data with social and geographical data, the dataset can be used to model language learning and evolution, and to understand how language changes spread across age and social spectra.

View the app prototype: Folkspeech

For more information about either of these projects please contact Bert Vaux or James Burridge.

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the University as well as other world-leading research institutions. Our aim is to strengthen research collaborations and knowledge transfer across disciplines in order to address large-scale, multidisciplinary challenges in language research.