Cambridge Language Sciences Annual Symposium 2020: What Next? Future Directions in Language Research

The Cambridge Language Sciences Annual Symposium is a meeting of minds, bringing together language scientists of all disciplines from the University of Cambridge and beyond.

The 2020 meeting took place on 17 November. We are excited to be hosting the Annual Symposium posters, along with recordings of the plenary sessions, on Cambridge Open Engage, an early research site run by Cambridge University Press. The site offers a place for delegates to access and discuss the talks and posters, and will allow a wider research audience to reach and interact with the content.

Programme

Session 1

Chair: Prof. Brechtje Post

10:30-11:15 L1 Identification from L2 Speech Using Neural Spectrogram Analysis

Calbert Graham, Phonetics Laboratory, University of Cambridge

English has become the most widely spoken language globally with the vast majority of its speakers using it as a second language (L2). It is well known that the characteristic features of these different varieties of English are highly influenced by the speakers’ native languages (L1s). Understanding the speech features that contribute to the foreign-accentedness of a speaker’s L2 English may be useful in foreign language learning (e.g. in pronunciation remediation systems) and in forensic speaker profiling (e.g. by helping an investigator to narrow down the scope of an investigation).

The main objective of this project is to model L1-L2 interaction and uncover discriminative speech features that can identify the L1 background of a speaker from their non-native English speech. In modelling L1-L2 interaction, traditional phonetic analyses tend to measure the similarity of an L2 speaker’s production (of specific phonemes or prosodic units) as compared to that of a native speaker, based on a pre-selected set of acoustic features. However, apart from being time-consuming and demanding in expertise, the set of extracted features may not be sufficient to capture all the traces of the L1 in the L2 speech that are needed to make an accurate classification. Deep learning has the potential to address this issue by exploring the space of features automatically.

In this talk I will report a series of classification experiments involving a deep convolutional neural network (CNN) based on spectrogram pictures. The classification problem consists of determining whether English speech samples from a large spontaneous speech corpus are spoken by a native speaker of SSBE, Japanese, Dutch, French or Polish.

The inputs to the CNN are spectrogram images extracted from 30-second speech samples. In order to make the features more transparent and therefore interpretable by phoneticians, the experiment also compares accuracy rates in training the classifiers on (1) spectrogram pictures of phonetically segmented vocalic, consonantal and inter-segmental intervals vs. (2) spectrogram pictures without any explicit phonetic segmentation (i.e. extracted at fixed time intervals).

Overall, results showed that the system can identify the 5 English varieties with a high level of accuracy based on spectrogram pictures. Findings also suggest that although spectrogram images without phonetic segmentation have the highest level of accuracy in the experiments, training the classifiers on certain combinations of phonetically modelled spectrogram images can produce results with comparable accuracy rates.

Our preliminary conclusions are that:

Spectrogram images contain a wide range of information for successfully tracing the L1 background of speakers when they speak in their L2, which makes this approach superior to traditional feature-extraction methods.
Unlike traditional phonetic approaches, deep learning based on spectrogram images lacks transparency and is therefore difficult to interpret.
However, an integrative approach that combines deep learning with phonetic modelling (to make the source of the discriminating features more transparent) can potentially be very useful in phonetic research.
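
As a rough illustration of the "fixed time intervals" condition mentioned in the abstract above, the sketch below cuts a signal into equal-length frames and computes a magnitude spectrum for each one. It is not the speaker's pipeline: the function names, frame length, hop size and the synthetic tone are all assumptions chosen for illustration, and a practical system would use an FFT library with a window function rather than this naive transform.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Cut a signal into fixed-length frames taken every `hop` samples."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=64, hop=32):
    """Rows are time frames, columns are frequency bins (no windowing applied)."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# Hypothetical toy input: a pure tone completing 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(tone)

# Index of the strongest frequency bin in each frame.
peak_bins = [row.index(max(row)) for row in spec]
```

Each row of `spec` corresponds to one fixed interval of the signal, so the resulting grid could be rendered as an image without any phonetic segmentation.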

11:15-12:00 Tudor Networks of Power

Sebastian Ahnert, Dept. of Chemical Engineering & Biotechnology; Alan Turing Institute

The digitisation of historical archives provides the opportunity to interrogate historical sources from entirely new vantage points. We describe here the curation and analysis of a large historical correspondence network derived from the Tudor State Papers, spanning almost 100 years, from 1509 to 1603. The network connects 22,000 individuals, who sent 130,000 letters across this period. We often know the exact day on which a letter was written, as well as 5,000 geolocations from which the letters were sent, spanning Europe, the Americas, and Asia, giving a fascinating insight into Early Modern mobility. In addition we have detailed information on the letter contents - machine-readable synopses of the letters in digital form, and images of the original letter manuscripts. The historical nature of the data means that extensive disambiguation and de-duplication of all person identities and place names was undertaken. Using combinations of different network measures to create network signatures we are able to identify different roles that individuals played in this network. These findings are then further contextualised through close-reading of the letter manuscripts. We can also connect the network perspective to text analysis of the letter contents, showing which topics were discussed disproportionately often at a given point in time, and what sub-networks of individuals were discussing them. More generally our aim is to show how historical scholarship can benefit from large-scale network analysis, text-mining, and related quantitative methodologies.
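
To give a sense of what combining network measures into a per-person signature might look like, here is a minimal sketch over a toy list of letters. The four measures used (letters sent, letters received, distinct recipients, distinct senders) are assumptions picked for simplicity, not the measures used in the project, and the single-letter labels are placeholders rather than real correspondents.

```python
from collections import defaultdict


def network_signatures(letters):
    """Map each person to (sent, received, distinct_recipients, distinct_senders)."""
    sent = defaultdict(int)
    received = defaultdict(int)
    recipients = defaultdict(set)
    senders = defaultdict(set)
    for sender, recipient in letters:
        sent[sender] += 1
        received[recipient] += 1
        recipients[sender].add(recipient)
        senders[recipient].add(sender)
    people = set(sent) | set(received)
    return {
        p: (sent[p], received[p], len(recipients[p]), len(senders[p]))
        for p in sorted(people)
    }


# Hypothetical letters as (sender, recipient) pairs.
letters = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("D", "A")]
signatures = network_signatures(letters)
```

People whose tuples look alike (for example, many letters received from many distinct senders) could then be grouped as candidates for a similar role, to be checked against the manuscripts themselves.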

Posters

Chair: James Algie

13:00-13:30 Poster slam

Poster presenters will give a lightning talk during the plenary session to advertise their poster. Presenters will be given exactly one minute to let the audience know what their poster is about.

13:30-14:30 Poster exhibition

Teaching of Creativity in the English Language Classroom, Abie Chan

Working with Data from Real-World Corpora: A Case Study on Identifying Issues and Using Scalable Solutions, Itamar Shatz

‘Under the shadow of swords: The Path to Jihad’ - A Corpus-Based Critical Analysis of Religious Metaphors in Jihadist Magazines, Katie Patterson

Do you see the -ing in SMOYING? Reading proficiency might influence the way we process unfamiliar words, Julia Schwarz

Collecting the Teacher-Student Chatroom Corpus, Andrew Caines

Word Prosody in Khorchin Mongolian, Chenming Gao

Syntactic L1-Attrition and Re-Exposure, Alexander Cairncross

The causal role of language-specific brain regions in contextual updating of ambiguous word meanings, Lucy MacGregor

Challenges to Speech Perception Impair Phonological Short-Term Memory, Harriet Smith

Syntactic Ambiguity: Meter, Rhyme and Lineation Effects, Andromachi Tsoukala

The Development of a Syntactic Awareness Task using Word-Order Correction Paradigm, Claudia Pik-Ki Chu

Vowel perception & production integration in Spanish/English bilinguals: an experimental study, Madeleine Rees

Duration as a focus-making device in Cantonese, Kechun Li

Explaining the Mathematical Word Problem Performance of Multilingual Children in Hyderabad, India, Jodie Webber

Ontology for a Common Sense Knowledge Graph, Guy Aglionby

Interfacing sound, meaning and constraint: Neural infrastructure for incremental interpretation, Yuxing Fang

Tracing the motivational dynamics of L3 learners: a multiple case study of four high- and low-proficiency undergraduates in the UK, Lixinhao Gao

Talk about mind and space: paternal and maternal contributions to school readiness, Elian Fink

Posters will be available to view on Cambridge Open Engage from 10 November.

We would like to thank James Algie (ja600@cam.ac.uk) and Yuchen Zong (yz538@cam.ac.uk) for organising the poster session this year.

Session 2

15:30-16:15 Social Signalling and Social Change: Inclusive Writing in French

Heather Burnett, Laboratoire de Linguistique Formelle, CNRS and Université de Paris

Chair: Dr Laura Wright

Gender-inclusive writing ("écriture inclusive", EI) has long been the topic of public debates in France. Examples of EI for the word "students" are shown in (1).

(1) a. étudiant·e·s (point médian)
b. étudiant.e.s (period)
c. étudiants et étudiantes (repetition)
d. étudiant(e)s (parentheses)
e. étudiant-e-s (dash)
f. étudiantEs (capital)
g. étudiant/e/s (slash)
h. étudiant--e--s (double dash)

These debates have intensified since the Macron government prohibited the use of the point médian (1a) in official documents in 2017 (Abbou et al. 2018). In addition to being a point of disagreement between feminists and anti-feminists, EI is also controversial among feminists, who often disagree on which of its many variants (1) should be used (Abbou 2017).

In this talk, I argue that the source of many of these disagreements lies in the fact that French écriture inclusive has developed into a rich social signalling system: based on a quantitative study of EI in Parisian university brochures (joint work with Céline Pozniak (Burnett & Pozniak 2020)), I argue that writers use or avoid EI in part in order to communicate aspects of their political orientations. We show that these aspects involve writers' perspectives on gender, but also stances on issues unrelated to gender, such as (anti)institutionalism and support for the Macron government. I then outline a research programme for studying this signalling system from a formal perspective: following Burnett (2019), I show how we can use game-theoretic pragmatics to analyze EI's contribution to writers' political identity construction and the consequences that this has for its use as a tool for promoting gender equality and social change.

References

Abbou, J., Arnold, A., Candea, M., & Marignier, N. (2018). Qui a peur de l’écriture inclusive? Entre délire eschatologique et peur d’émasculation Entretien. Semen. Revue de sémio-linguistique des textes et discours, (44).
Abbou, J. (2017). (Typo)graphies anarchistes. Où le genre révèle l’espace politique de la langue. Mots. Les langages du politique, (1), 53-72.
Burnett, H., & Pozniak, C. (2020). Political Dimensions of Écriture Inclusive in Parisian Universities. Manuscript, Université de Paris.
Burnett, H. (2019). Signalling Games, Sociolinguistic Variation and the Construction of Style. Linguistics and Philosophy, 42(5), 419-450.

16:15-17:30 Cognitive and computational building blocks for more human-like language in machines

Josh Tenenbaum, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology (MIT)

Chair: Dr Andrew Caines

Respondent: Dr Guy Emerson

Humans learn language building on more basic conceptual and computational resources that we can already see precursors of in infancy. These include capacities for causal reasoning, symbolic rule formation, rapid abstraction, and commonsense representations of events in terms of objects, agents and their interactions. I will talk about steps towards capturing these abilities in engineering terms, using tools from hierarchical Bayesian models, probabilistic programs, program induction, and neuro-symbolic architectures. I will show examples of how these tools have been applied in both cognitive science and AI contexts, and point to ways they might be useful in building more human-like language, learning and reasoning in machines.

17:30-17:45 Closing remarks

Prof. Ann Copestake