
Cambridge Language Sciences

Interdisciplinary Research Centre
 
Corpus concordance_590x288px

Are you a researcher interested in accessing English language corpora at Cambridge University Press?

The Cambridge English Corpus is the Press's multi-billion-word collection of texts and is available to University of Cambridge researchers for academic research use.

It comprises several smaller corpora, including:

  • Cambridge Learner Corpus (developed in partnership with Cambridge Assessment English) – a 50-million-word collection of learner English from Cambridge English exam scripts.
  • Cambridge Reference Corpus – a multi-billion-word collection of written and spoken ‘expert speaker’ English.
  • Cambridge Academic Corpus – 400 million words of written and spoken academic language at undergraduate and postgraduate level from a range of US and UK institutions, including lectures, seminars, student presentations, journals, essays and textbooks.
  • Cambridge Spoken Corpus – 75 million words of transcribed spoken data, including everyday conversations, telephone calls, radio broadcasts, presentations, speeches, meetings, TV programmes and lectures. This includes the BNC Spoken 2014 (developed in partnership with Lancaster University), which is also available for research use.

APPLY FOR ACCESS

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the University, as well as from other world-leading research institutions. Our aim is to strengthen research collaboration and knowledge transfer across disciplines in order to address large-scale, multidisciplinary challenges in language research.