
Language sciences poster session and lunch

When Nov 07, 2013
from 01:15 PM to 02:15 PM
Where Syndicate Room, Old Schools
Contact Phone 01223 767397


The impact of expanding foreign language A Level pupils' awareness and use of metacognitive strategies on confidence and proficiency in speaking

Karen Forbes (Faculty of Education)

The question of how to improve the foreign language speaking skills of pupils in British schools is of paramount importance to language teachers and policy makers today. The findings reported here come from a small-scale action research study carried out with a class of Year 12 pupils of French, aged 16-17, in a secondary school in Cambridgeshire. While the pupils generally achieved well in the reading, writing and listening components of the course, this was not the case for speaking. The primary aim of the study was therefore to introduce the students to a range of metacognitive learning strategies with a view to improving their confidence and proficiency in speaking. Data were collected from questionnaires, interviews, strategy checklists and assessment marks, both before and after a six-week period of strategy instruction. The findings indicate that strategy use seems to have had a positive impact on pupils’ confidence and proficiency in speaking; after the intervention, participants also reported that they both valued and used a range of metacognitive strategies more than before.



Distinguishing formulaic from productive language use in big learner data: a case study of relative clauses in EFCAMDAT

J. Geertzen, T. Alexopoulou, A. Korhonen, D. Meurers (Dept. of Theoretical and Applied Linguistics, Computer Laboratory)

Recent decades have seen the growth of international online schools of English, making English language teaching accessible to learners around the globe. These institutions collect large amounts of learner data as part of their routine teaching operations. For instance, Englishtown, the online school of EF Education First, is accessed by around 700,000 learners worldwide. The potential for compiling large-scale data resources for research in Second Language Acquisition (SLA) and education is immense. One such resource is EFCAMDAT (EF–Cambridge Open Language Database), a database of writings by Englishtown students recently constructed at the Dept. of Theoretical and Applied Linguistics (Geertzen et al. 2013).

EFCAMDAT stands out for its size, the diversity of its students' backgrounds and of the topics covered, and the longitudinal data it provides for individual learners. It currently consists of half a million scripts from 85,000 students, totalling 33 million words. Learner proficiency levels are aligned to the levels of the Common European Framework of Reference (CEFR), a crucial feature for educators and assessors. EFCAMDAT will continue to grow, making it a living resource and by far the largest publicly available learner database.

The magnitude and richness of resources like EFCAMDAT present SLA researchers with important methodological challenges (Granger et al. 2007; Meurers 2009). Unlike most current learner corpora, EFCAMDAT is not the result of a carefully designed SLA data collection. It is a by-product of the teaching activity of a large online teaching institution and contains noisy and often unpredictable information. For instance, certain writing tasks may elicit specific language, leading to the overrepresentation of certain structures. Students may also be ‘lifting’ language from their input or routinely producing formulaic language (Myles 1995). It is not possible to know in advance which pieces of language are productive, formulaic, or under- or over-represented. However, reliable generalisations about L2 development can only be based on ‘clean’ and consistent data. Data consistency is therefore an important methodological challenge for big learner data, and natural language processing technology that can address this problem is vital for exploiting the full potential of resources like EFCAMDAT.

In this poster we show how a measure combining frequency of use, sharedness across learners and mutual information can distinguish productive language from language that employs formulas or contains structures overrepresented because of input or task effects. We use the acquisition of relative clauses as a case study to demonstrate the nature of the problem and our solution.
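As a rough illustration of the three ingredients named above, the Python sketch below computes, for each word bigram in a toy collection of scripts, its raw frequency, its sharedness (assumed here to mean the proportion of distinct learners who use it) and its pointwise mutual information in bits. The bigram unit, the `(learner_id, tokens)` input format and the function name `bigram_stats` are hypothetical, and the sketch makes no attempt to reproduce how the authors actually combine these quantities into a single measure.

```python
import math
from collections import Counter, defaultdict


def bigram_stats(scripts):
    """Return {bigram: (frequency, sharedness, pmi_bits)}.

    scripts: list of (learner_id, list_of_tokens) pairs (hypothetical format).
    """
    unigrams = Counter()
    bigrams = Counter()
    users = defaultdict(set)  # bigram -> set of learners who produced it
    learners = set()
    for learner, tokens in scripts:
        learners.add(learner)
        unigrams.update(tokens)
        for pair in zip(tokens, tokens[1:]):
            bigrams[pair] += 1
            users[pair].add(learner)

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    stats = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))  # pointwise mutual information
        sharedness = len(users[(w1, w2)]) / len(learners)
        stats[(w1, w2)] = (count, sharedness, pmi)
    return stats
```

For example, with the scripts `("a", ["the", "man", "who", "left"])`, `("b", ["the", "man", "who", "came"])` and `("c", ["a", "dog", "that", "barks"])`, the bigram ("the", "man") has frequency 2, sharedness 2/3 and a PMI of 3 bits.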


Geertzen, J., T. Alexopoulou, and A. Korhonen (2013). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT). In Selected Proceedings of the 2012 Second Language Research Forum, Somerville, MA, USA. Cascadilla Proceedings Project.

Granger, S., O. Kraif, C. Ponton, G. Antoniadis, and V. Zampa (2007). Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCaLL 19(3), 252–268.

Meurers, D. (2009). On the automatic analysis of learner language. CALICO Journal 26(3), 469–473.

Myles, F. (1995). Interaction between linguistic theory and language processing in SLA. Second Language Research 11(3), 235–266. 



Automating L2 acquisition research: an interdisciplinary perspective

Helen Yannakoudakis (Cambridge English), Ted Briscoe (Computer Lab), Dora Alexopoulou (Dept. of Theoretical and Applied Linguistics)

L2 acquisition research attempts to explain how learners acquire a second language and to make inferences about the internal representations of L2 knowledge in the learner's mind at different stages of learning.

Common methodologies in L2 acquisition research involve theory-driven approaches to formulating hypotheses about learner grammars, typically based on linguistic intuition and the existing literature on learner English. The advantage of these approaches is that they allow us to identify learner-language properties that are well understood and can inform learning theory. However, they may also emphasise self-evident hypotheses and overlook properties of learner grammars that have not been discussed in the linguistic literature.

In this work, we propose a new methodological technique for L2 research, using data-driven methods in tandem with visualisation techniques to shed light on the linguistic abilities that characterise different levels of attainment and, more generally, on developmental aspects of learner grammars. Data-driven techniques allow us to partially automate hypothesis formation and to explore a much larger hypothesis space, while visualisation can support effective, fast and intuitive exploration of large spaces.

More specifically, we use discriminative machine learning methods to automatically identify linguistic features that are predictive of a learner's level of attainment, then apply coordinated graph visualisation techniques to these discriminative features to build a tool for hypothesis generation. Experimental results demonstrate that the tool can support Second Language Acquisition (SLA) research and aid the development of hypotheses about learner grammars.
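The abstract does not name the discriminative learner used. As one hedged illustration of the general idea, the sketch below trains a plain binary logistic regression by stochastic gradient descent to separate higher-level from lower-level scripts, then ranks features by their learned weights. The two-level setup, the feature names and the functions `train_logreg` and `top_features` are all hypothetical, not the authors' system.

```python
import math
from collections import defaultdict


def train_logreg(samples, epochs=50, lr=0.1):
    """Binary logistic regression trained by stochastic gradient descent.

    samples: list of (feature_set, label) pairs, where label is 1 for a
    higher-level script and 0 for a lower-level one (hypothetical setup).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, y in samples:
            z = bias + sum(weights[f] for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(higher level)
            grad = y - p                    # gradient of the log-likelihood w.r.t. z
            bias += lr * grad
            for f in feats:
                weights[f] += lr * grad
    return weights, bias


def top_features(weights, k=10):
    """Features whose weights most strongly favour the higher level."""
    return sorted(weights, key=weights.get, reverse=True)[:k]
```

With hypothetical training data in which a `relative_clause` feature occurs only in higher-level scripts, that feature can only receive positive updates and so ends up with a positive weight, while a feature seen only in lower-level scripts ends up negative; ranked lists of this kind are what a visualisation layer would then display.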



Investigating acquisition of prosodic focus marking and native perception of learners' English intonation

Maria Kunevic, Toby Hudson, Brechtje Post, Dora Alexopoulou (Dept. of Theoretical and Applied Linguistics)

In English, intonation is one of the main means of conveying the information structure of an utterance and of marking constituents that are important, i.e. focused. In our study we are investigating how prosodic prominence is realised by learners of English and how native English speakers perceive the learners' attempted prosodic focus marking.

Our approach was to construct a purpose-built data set in which the prosodic focus environment was carefully controlled. Our corpus currently consists of semi-spontaneous utterances produced by 36 Russian learners of English at three levels of proficiency (corresponding to levels A2, B1 and B2 of the Common European Framework of Reference) and by 12 native speakers of English. We are interested in which pitch, durational, intensity, spectral (and segmental) features are salient for a) unambiguous communication of prominence, and b) sounding ‘authentic’, i.e. sufficiently similar to native productions, or being considered ‘good enough’ by native listeners. To address this, and to provide a listener-oriented perspective, a subset of the utterances was used as stimuli in an online perception experiment with native listeners.

Preliminary results show that learners differ in their realisation of narrow and broad focus and that, although they often choose an appropriate placement for the pitch accent in different focus conditions, its phonetic realisation is inappropriate. We also found that the degree of deaccenting corresponds to proficiency level: the higher the proficiency, the greater the difference between the F0 peak values of the first and final words in the narrow focus condition. This result correlates with native speakers' perception of prominence, as they were most consistent in identifying the prominent word in the subject narrow focus context.
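The deaccenting measure described above reduces to simple arithmetic on F0 peaks. The sketch below assumes that peak values (in Hz) have already been extracted for the first and final words of each narrow-focus utterance; it computes the difference in Hz, optionally expresses the same pair of peaks in semitones (a common choice in intonation research, but not stated in the abstract), and averages the Hz difference per proficiency level. The function names are hypothetical, and the numbers in the note below are invented for illustration, not results from the study.

```python
import math
from statistics import mean


def peak_difference_hz(first_peak_hz, final_peak_hz):
    """F0 peak of the first word minus F0 peak of the final word, in Hz."""
    return first_peak_hz - final_peak_hz


def peak_difference_st(first_peak_hz, final_peak_hz):
    """The same two peaks compared in semitones: 12 * log2(f1 / f2)."""
    return 12 * math.log2(first_peak_hz / final_peak_hz)


def mean_by_level(utterances):
    """utterances: iterable of (proficiency_level, first_peak_hz, final_peak_hz)."""
    diffs = {}
    for level, first, final in utterances:
        diffs.setdefault(level, []).append(peak_difference_hz(first, final))
    return {level: mean(values) for level, values in diffs.items()}
```

For instance, `mean_by_level([("A2", 200, 180), ("A2", 210, 190), ("B2", 240, 160)])` returns `{"A2": 20, "B2": 80}` for these invented values, and `peak_difference_st(220, 110)` is 12 semitones (one octave).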

Our next step is to use these data as a baseline for developing annotation techniques for an L2 English speech corpus, by comparing hand-labelled and automatically generated prosodic annotation with native speakers’ judgements of prominent words. To this end, all sound files will be passed through automatic prosodic prominence detection algorithms (designed for use on native speech), so that we can see to what extent their output matches human judgements and how acoustic parameters contribute to the percept of prominence in learner speech.


Similarity and contrast in L1 pronunciation attrition in bilinguals

Brechtje Post and Samantha Jones (Dept. of Theoretical and Applied Linguistics)

Research on L1 pronunciation attrition has shown that second language learning has bidirectional effects, with bilingual productions in the L1 as well as the L2 usually falling somewhere between those of monolingual speakers of the languages in question (e.g. Flege 1987, Major 1992, Jiang 2008, Chang 2012, Mennen 2004). Flege’s Speech Learning Model (1995) provides a convincing account of bidirectional effects, since the phonetic systems of the two languages are predicted to interact in such a way that a single merged category is used to produce sounds which are “similar” in both languages, in the L1 as well as the L2. However, such a merger of similar sounds could potentially compromise the maintenance of phonemic contrasts in the L1. For instance, vowels can be nasalised in the context of nasal consonants in both French and English (e.g. English pin), but nasalisation can only be used to contrast lexical items in French (pin ‘pine’ vs. paix ‘peace’). If late French-English bilinguals showed merger by producing similar amounts of nasalisation in nasal contexts in their L1 and L2, the French nasal-oral contrast could be compromised.

Using nasalisation, we investigated to what extent contrastivity might interact with similarity in the L1 pronunciation of late bilinguals. As predicted, we found that the amount of nasalisation produced by bilinguals falls between that produced by monolingual controls in each of their two languages (here, L1 French and L2 English). However, we also found that this accommodation towards the L2 was significantly smaller for contrastive distinctions (pin ‘pine’ vs. paix ‘peace’) than for non-contrastive phones (here, /u/).

Our findings confirm, in line with earlier studies, that the phonetic and phonological changes in the L1 of late bilinguals are sensitive to properties of both the L1 and the L2, in a way which suggests that L1 and L2 phones are (at least in part) linked at a system-wide level that determines production in both languages (cf. Chang 2012). However, the findings also show that, although similar phones can indeed be merged in the bilingual speaker’s system, as predicted by the Speech Learning Model, such mergers are blocked when they threaten to undermine contrastivity in the native system. This suggests that the interactions between the L1 and L2 in the bilingual developmental process are truly systemic in nature, affecting comparable elements throughout the shared bilingual system in a similar way, while both similarity and contrastiveness between elements of the L1 and L2 constrain convergence and divergence between the two.


Exploring academic literacy: preliminary insights from the Cambridge Corpus of Academic English

Fiona Barker (Cambridge English Language Assessment) and Laura Grimes (Cambridge University Press)

The English language is increasingly present in academic settings. Cambridge English Language Assessment and Cambridge University Press are undertaking a research project to develop a clearer understanding of the English language skills needed by students at English-medium universities; this will facilitate the development of better teaching materials and assessment tools.

At the heart of the project is the creation of a corpus of academic writing (CAMCAE) collected from a range of academic levels, from sixth form to established university researchers. The corpus will also cover a range of disciplines, first language backgrounds and text types, from essays and reports to research theses and journal articles. Although several academic corpora already exist, e.g. BAWE (see Gardner and Nesi 2003) and MICUSP (see Ädel and Römer 2009), none contains the breadth of subjects and levels being collected for CAMCAE. Analysis of these data will enable us to identify specific features of academic English across these parameters.

This poster presents findings from our preliminary analysis of the texts collected so far, using both quantitative and qualitative methods to gain maximum insight into how English is used in academic settings.

We have recently begun a new phase of data collection, facilitated by the creation of an online data portal where participants can contribute their written work to the corpus quickly and easily. This will enable us to collect data from a wider range of institutions globally, allowing us to build a larger, more balanced corpus in order to adequately represent the range of language found in academic contexts.



The effect of topic on documents in the Cambridge Learner Corpus

Andrew Caines (Dept. of Theoretical and Applied Linguistics/ALTA), Paula Buttery (Dept. of Theoretical and Applied Linguistics/Computer Lab/ALTA)

In standard experimental procedure, observing the effects of the independent variable(s) on the dependent variable is made possible by careful control of extraneous factors. However, most large corpora have not been constructed with a specific experimental purpose in mind, but have instead been collated for general-purpose resource creation and/or academic research. This is especially true of learner corpora.

One such uncontrolled factor is document topic. We identified four topic labels in the Cambridge Learner Corpus (CLC), namely commerce, narrative, personal and society, and set out to investigate how certain lexico-syntactic features are affected by topic in a subset of the CLC. We focus on lexis and verb subcategorization frames, training naive Bayes classifiers on these features. We report the classifier accuracies in labelling unseen documents and argue that topic should be controlled for in learner corpus research.
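To make the classification step concrete, here is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing over bag-of-token features. It is written from scratch rather than with any particular toolkit; the class name `NaiveBayesTopic`, the toy training documents and the idea of encoding subcategorization frames as extra string tokens (for example a hypothetical `SUBCAT:NP_PP`) are assumptions for illustration, not the authors' setup.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopic:
    """Multinomial naive Bayes over token counts, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha=1.0 gives add-one (Laplace) smoothing

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one topic label per document
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), unnormalised
        score = math.log(self.doc_counts[label] / self.n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.doc_counts, key=lambda label: self.log_posterior(tokens, label))


def accuracy(model, docs, labels):
    """Proportion of documents whose predicted topic matches the gold label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)
```

Trained on a handful of toy documents labelled with the four CLC topics, this model assigns `["price", "shop"]` to commerce when those words occur mostly in commerce documents; `accuracy` on held-out documents then gives the kind of figure reported for the poster's classifiers.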

