
Cambridge Language Sciences

Interdisciplinary Research Centre
 

Artificial Intelligence (AI) is an increasingly central aspect of language science research, encompassing areas from digital humanities and corpus linguistics, through NLP applications such as speech recognition and chatbots, to the use of machine learning to model human cognition.

Cambridge University is a world-leading centre for language and AI research. In this series of interviews, we talk to researchers from across Cambridge about their work in this field.


Guy Emerson is an academic fellow in the Department of Computer Science and Technology and Executive Director of Cambridge Language Sciences.

His research is primarily focused on computational semantics, and he has developed novel machine learning techniques to combine logical and probabilistic approaches to meaning in a linguistically motivated way.

In his role as Executive Director he leads in the development of new large-scale interdisciplinary proposals on behalf of Cambridge Language Sciences, as well as seeking philanthropic funding to extend the use of machine learning by humanities researchers working with language.

Guy also acts as a first point of contact for Cambridge researchers across the language sciences, who are interested in applying machine learning in their own work, but don’t know where to start.

If you would like to contact Guy to discuss how you might apply machine learning in your research, the best way to get in touch is via email at gete2@cam.ac.uk.


What is AI?

That's a good question, because Artificial Intelligence (AI) is a term people define in different ways. I think we can use it to describe any computational system that can do something we might call intelligent.

I think sometimes people hear the term ‘AI’ and think of sci-fi AI, like robots taking over the world. I would say we’re very far away from that.

The majority of modern AI systems use some kind of machine learning. That means having an algorithm that can improve itself in some way using data. But that doesn't mean to the point of becoming sentient and taking over the world!
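To make "an algorithm that can improve itself using data" concrete, here is a minimal sketch (not from the interview; the data and numbers are invented for illustration). A single-parameter model repeatedly adjusts itself to fit observed examples:

```python
# A minimal sketch of "learning from data": a model that improves itself.
# We fit y ≈ w * x by repeatedly nudging w to reduce the error on
# observed (x, y) pairs — gradient descent on the squared error.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # noisy examples of y ≈ 2x

w = 0.0    # the model's single parameter, initially uninformed
lr = 0.01  # learning rate: how big each self-improvement step is

for _ in range(1000):
    for x, y in data:
        error = w * x - y        # how wrong the current model is
        w -= lr * 2 * error * x  # adjust w to shrink the squared error

print(round(w, 1))  # close to 2.0: learnt from the data, not programmed in
```

The point of the sketch is that nobody writes the rule "y is about twice x" into the program; the algorithm extracts it from examples, which is the sense in which it "improves itself".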

Why language and AI?

Ultimately AI gives you a set of tools which you can apply to different problems in different ways.

There are a lot of areas of language sciences where AI is helpful.

Digital humanities

One type of application is data science, where you have a lot of data, a large corpus for example, and you're using computational tools to better understand that data. This is particularly relevant for the digital humanities and corpus linguistics.

Having computational systems that can automate processes means corpus linguists can do things more easily and on a larger scale. With AI techniques and digitized corpora it's also possible to look at language change in real time.

On Twitter, for example, you can see how people are using language right now. Analysing this by hand wouldn't be feasible, and without sources like Twitter we wouldn't have the data at all.

SEE ALSO: Language & AI: an interview with Tamsin Blaxter & June Symposium speaker profile: Barbara McGillivray

Natural Language Processing (NLP)

A second type of application is where you want to build a system that can perform a task for you. This is usually called Natural Language Processing (NLP).

We’re seeing this more and more in day-to-day life, including machine translation and speech recognition systems like Google Translate and Siri, and dialogue systems, such as chatbots you can talk to and ask questions.

You could also use NLP for document summarisation, or to summarise in words the contents of a database of numerical or other data.

There are many ways to apply AI techniques to useful tasks involving understanding or producing natural language.

SEE ALSO: Language & AI: an interview with Christine de Kock

AI for language and cognition

The third type of application is trying to understand how humans use language – how cognition, language processing and language production work – and to produce computational models of that.

When you have a complex system like human language, it's very difficult to work out all the implications of a theory with pen and paper. So having computational tools to manage that is really helpful for fleshing out and testing theories.

SEE ALSO: June Symposium speaker profile: Margreet Vogelzang & Language & AI: an interview with Linda Gerlach

Tell me about your research

The majority of my research is in this third area: how people learn language and how to model human language computationally.

I work mostly on semantics, how we represent the meaning of language and how meaning is learnt.

Humans can learn language in many ways. You can learn language from hearing it in context, the way a child learns language. As you get older, you can also learn language from other language.

Cambridge Language Sciences Co-Director Ann Copestake liked the word ‘scrumpy’ as an example, because it’s obscure but not too much so. If you were served a glass of scrumpy and were told “this is scrumpy”, you could see it and smell it and taste it to learn the meaning. On the other hand, you might read in a book “they ordered a scrumpy at the pub”, and you can figure out the meaning from that context even if no one tells you what the word means. A third way to learn a word is if someone tells you explicitly “scrumpy is a type of strong cider”.
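The second route, learning a word's meaning purely from the linguistic contexts it appears in, is the intuition behind distributional semantics. Here is a deliberately crude sketch (with an invented five-sentence corpus) in which ‘scrumpy’ ends up closer to ‘cider’ than to an unrelated word, with no definition ever given:

```python
# A crude sketch of learning meaning from linguistic context alone
# (distributional semantics). Each word is represented by the counts of
# words that appear near it; similar contexts then mean similar vectors.
from collections import Counter
import math

# Tiny invented corpus for illustration only.
sentences = [
    "they ordered a scrumpy at the pub",
    "they ordered a cider at the pub",
    "she drank strong scrumpy with dinner",
    "she drank strong cider with dinner",
    "he parked the car at the garage",
]

def context_vector(word, window=2):
    """Count the words occurring within `window` tokens of `word`."""
    vec = Counter()
    for s in sentences:
        toks = s.split()
        for i, t in enumerate(toks):
            if t == word:
                lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[toks[j]] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

v = context_vector("scrumpy")
print(cosine(v, context_vector("cider")) > cosine(v, context_vector("car")))
# prints: True — scrumpy patterns with cider, not with car
```

Real systems use far larger corpora and learnt dense vectors rather than raw counts, but the principle is the same: the contexts a word keeps are evidence about its meaning.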

Humans do all these things subconsciously. The challenge is to develop computational models that can describe how people do this. The aim is to show how language might be stored in the brain and how it can be learnt during our lives.

READ MORE: Autoencoding Pixies, a poster by Guy Emerson 

What is the potential impact of this kind of research?

If we can better understand human cognition, this could have impact both on machine learning and also on human-oriented applications such as in healthcare and education.

On the machine learning side, it helps us understand the strengths and weaknesses of different computational techniques. If a computational model doesn't match what humans do or there’s some capability that humans have which the machine learning models are struggling to replicate, then we know there's a limitation in what the computational models are doing.

SEE ALSO: ‘The Meaning of “Most” for Visual Question Answering Models’ by Alexander Kuhnle and Ann Copestake, for a recent example of research which illustrates how we can compare and contrast artificial ‘cognition’ with human cognition.

On the more human side, if we understand human cognition better, this could have applications in education: helping people learn better, and building an education system that is sensitive to how people learn, and potentially to the diversity in how people learn.

SEE ALSO:​ June Symposium speaker profile: Ahmed Zaidi

There are also applications in terms of mental health. If we understand cognitive processes, we can understand when those cognitive processes go wrong and how we might potentially better manage them.

SEE ALSO: June Symposium speaker profile: Sarah Morgan

What does the future hold?

For most of my work I've been working with English corpora, in particular Wikipedia as a corpus.

One direction my research is taking now is trying to learn not just from corpora but also from other sources of data, in the way that humans can learn from hearing a word in a real-world context, in a linguistic context, or by being given a definition. You have all these different ways of learning language. Can we develop machine-learning systems that can also do that, incorporating those different types of data?

Looking at the field more generally, there’s also a trend of trying to grapple with some of the ethical questions around NLP technology. It’s important because AI is becoming more and more embedded in our lives. When you train a machine learning system, if you have data which is sexist or racist or exhibits any other kind of prejudice, then the resulting system could have those same issues. We need to make sure we don't apply the technology in a way that's going to exacerbate existing prejudices or existing inequality.
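The mechanism behind "bias in, bias out" can be shown in a few lines. In this invented toy corpus, ‘doctor’ mostly co-occurs with ‘he’; a system that simply predicts the most frequent pronoun for a profession then reproduces that skew:

```python
# A minimal illustration of how prejudiced training data yields a
# prejudiced system. The corpus below is invented and deliberately skewed.
from collections import Counter

corpus = [
    "he is a doctor", "he is a doctor", "he is a doctor",
    "she is a doctor",
    "she is a nurse", "she is a nurse", "she is a nurse",
    "he is a nurse",
]

def most_likely_pronoun(profession):
    """Predict the pronoun most often seen with a profession."""
    counts = Counter()
    for sentence in corpus:
        toks = sentence.split()
        if profession in toks:
            counts[toks[0]] += 1  # first token is the pronoun here
    return counts.most_common(1)[0][0]

print(most_likely_pronoun("doctor"), most_likely_pronoun("nurse"))
# prints: he she — the skew in the data, not reality, drives the output
```

Real NLP systems are vastly more complex, but the failure mode is the same: a model trained to reproduce its data will also reproduce the data's prejudices unless that is explicitly measured and corrected.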

SEE ALSO: Online translators are sexist – here's how we gave them a little gender sensitivity training

How can we create more opportunities for interdisciplinary collaboration?

There are huge opportunities.

I think there needs to be a dialogue between humanities researchers who have these research questions and datasets, and machine-learning specialists who can then work with them to find good solutions that work for that particular application. The possibilities are really wide.

One problem I've noticed during my time with Cambridge Language Sciences is that people from different disciplines often talk about things in very different ways. It’s difficult to have that interdisciplinary exchange if you're talking about things in completely different ways.

So finding the right shared terminology and finding the common ground is really important.

If you haven’t worked with AI before, it can be hard to know what is feasible with current technology. But even AI researchers can be surprised when things turn out to be easier or harder than expected. So it can be a great idea to first run a pilot study, and it's wonderful that the Language Sciences Incubator Fund is there to support that.

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the university as well as other world-leading research institutions. Our aim is to strengthen research collaborations and knowledge transfer across disciplines in order to address large-scale multi-disciplinary research challenges relating to language research.
