Cambridge Language Sciences

Interdisciplinary Research Centre
 
David Strohmaier

I am David Strohmaier, a computer scientist and philosopher. Currently, I work as a research associate in the Natural Language and Information Processing (NLIP) group at the University of Cambridge.

My research applies machine/deep learning to lexical semantic acquisition. How do neural models learn the meaning of words and to what extent does that reflect our own learning processes?


Paul Siewert

[pʰaʊ̯l ˈziː.vɛɐ̯tʰ]

I am a PhD student at Jesus College. My training is mathematical, but the main work I do with my supervisor Fermín Moscoso del Prado Martín is in mathematical linguistics. Please see my page on the Computer lab website for my non-linguistic interests.

"What counts is not what you cover, but what you uncover!"


Suchir Salhan

I am a PhD student at the University of Cambridge (Gonville & Caius College). I previously completed my BA and MEng in Computer Science & Linguistics at Gonville & Caius College, obtaining a “starred First” and a Distinction respectively. I specialise in Machine Learning and Natural Language Processing, exploring alternatives to Transformer-based Large Language Models (LLMs). My academic work spans Machine Learning and Cognitive Science, with a focus on Explainable and Interpretable Machine Learning, and fundamental questions about the human capacity for natural language. 


Fermín Moscoso del Prado Martín


Mila Marcheva

Bilingualism; (First) language acquisition; Psycholinguistics; Computational modeling of linguistic phenomena


Dr Luca Benedetto

Automated evaluation of content for language learners; Modelling of language learners; Natural Language Understanding


Georgi Karadzhov

Natural language processing, dialogue systems 

Theses / dissertations

2024 (No publication date)

  • Karadzhov, G., 2024 (No publication date). DEliBots: Deliberation Enhancing Bots
    Doi: http://doi.org/10.17863/CAM.109182

Journal articles

2023

  • Karadzhov, G., Stafford, T. and Vlachos, A., 2023. DeliData: A Dataset for Deliberation in Multi-party Problem Solving. Proceedings of the ACM on Human-Computer Interaction, v. 7
    Doi: http://doi.org/10.1145/3610056

Christopher Davis

Computational modelling of first/second language acquisition, machine learning, multimodal semantics

Theses / dissertations

2024 (No publication date)

  • Davis, C., 2024 (No publication date). On the evaluation and application of neural language models for grammatical error detection
    Doi: http://doi.org/10.17863/CAM.108291

Conference proceedings

2023

  • Caines, A., Benedetto, L., Taslimipoor, S., Davis, C., Gao, Y., Andersen, Ø., Yuan, Z., Elliott, M., Moore, R., Bryant, C., Rei, M., Yannakoudakis, H., Mullooly, A., Nicholls, D. and Buttery, P., 2023. On the application of Large Language Models for language teaching and assessment technology. CEUR Workshop Proceedings, v. 3487
  • Diehl Martinez, R., Goriely, Z., McGovern, H., Davis, C., Caines, A., Buttery, P. and Beinborn, L., 2023. CLIMB – Curriculum Learning for Infant-inspired Model Building. CoNLL 2023 - BabyLM Challenge at the 27th Conference on Computational Natural Language Learning, Proceedings

2022

  • Davis, C., Bryant, C., Caines, A., Rei, M. and Buttery, P., 2022. Probing for targeted syntactic knowledge through grammatical error detection. CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference

2021

  • Yuan, Z., Taslimipoor, S., Davis, C. and Bryant, C., 2021. Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
    Doi: http://doi.org/10.18653/v1/2021.emnlp-main.687

2019

  • Zaidi, AH., Caines, A., Davis, C., Moore, R., Buttery, P. and Rice, A., 2019. Accurate modelling of language learning tasks and students using representations of grammatical proficiency. EDM 2019 - Proceedings of the 12th International Conference on Educational Data Mining

Chris Bryant

Grammatical error detection and correction, CALL, NLP

Theses / dissertations

2019

  • Bryant, CJ., 2019. Automatic annotation of error types for grammatical error correction
    Doi: http://doi.org/10.17863/CAM.40832

Conference proceedings

2017

  • Bryant, CJ., Felice, M. and Briscoe, E., 2017. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, v. 1

Journal articles

2016

  • Felice, M., Bryant, C. and Briscoe, T., 2016. Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

Dr Simon Baker

Natural language processing; biomedical text; lexical acquisition

Journal articles

2023 (Accepted for publication)

  • Collins, C., Baker, S., Brown, J., Zeng, H., Chan, A., Stenius, U., Narita, M. and Korhonen, A., 2023 (Accepted for publication). Text Mining for Contexts and Relationships in Cancer Genomics Literature. Bioinformatics
    Doi: http://doi.org/10.1093/bioinformatics/btae021

2021

  • Ali, I., Dreij, K., Baker, S., Högberg, J., Korhonen, A. and Stenius, U., 2021. Application of Text Mining in Risk Assessment of Chemical Mixtures: A Case Study of Polycyclic Aromatic Hydrocarbons (PAHs). Environ Health Perspect, v. 129
    Doi: http://doi.org/10.1289/EHP6702
  • Su, Y., Wang, Y., Cai, D., Baker, S., Korhonen, A. and Collier, N., 2021. PROTOTYPE-TO-STYLE: Dialogue Generation with Style-Aware Editing on Retrieval Memory. IEEE/ACM Transactions on Audio Speech and Language Processing, v. 29
    Doi: http://doi.org/10.1109/TASLP.2021.3087948
  • Majewska, O., Collins, C., Baker, S., Björne, J., Brown, SW., Korhonen, A. and Palmer, M., 2021. BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine. J Biomed Semantics, v. 12
    Doi: http://doi.org/10.1186/s13326-021-00247-z

2020 (Accepted for publication)

  • Vulic, I., Baker, S., Ponti, E., Petti, U., Leviant, I., Wing, K., Majewska, O., Bar, E., Malone, M., Poibeau, T., Reichart, R. and Korhonen, A., 2020 (Accepted for publication). Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity. Computational Linguistics
    Doi: http://doi.org/10.1162/coli_a_00391

2020

  • Wichmann, P., Brintrup, A., Baker, S., Woodall, P. and McFarlane, D., 2020. Extracting supply chain maps from news articles using deep neural networks. International Journal of Production Research, v. 58
    Doi: http://doi.org/10.1080/00207543.2020.1720925
  • Crichton, G., Baker, S., Guo, Y. and Korhonen, A., 2020. Neural networks for open and closed Literature-based Discovery. PLoS One, v. 15
    Doi: http://doi.org/10.1371/journal.pone.0232891
  • Petti, U., Baker, S. and Korhonen, A., 2020. A systematic literature review of automatic Alzheimer's disease detection from speech and language. J Am Med Inform Assoc, v. 27
    Doi: http://doi.org/10.1093/jamia/ocaa174
  • Chiu, B. and Baker, S., 2020. Word embeddings for biomedical natural language processing: A survey. Language and Linguistics Compass, v. 14
    Doi: http://doi.org/10.1111/lnc3.12402

2019

  • Pyysalo, S., Baker, S., Ali, I., Haselwimmer, S., Shah, T., Young, A., Guo, Y., Högberg, J., Stenius, U., Narita, M. and Korhonen, A., 2019. LION LBD: a literature-based discovery system for cancer biology. Bioinformatics, v. 35
    Doi: http://doi.org/10.1093/bioinformatics/bty845

2018

  • Wichmann, P., Brintrup, A., Baker, S., Woodall, P. and McFarlane, D., 2018. Towards automatically generating supply chain maps from natural language text
    Doi: http://doi.org/10.1016/j.ifacol.2018.08.207

2017 (Accepted for publication)

  • Baker, S., Ali, I., Silins, I., Pyysalo, S., Guo, Y., Högberg, J., Stenius, U. and Korhonen, A., 2017 (Accepted for publication). Cancer Hallmarks Analytics Tool (CHAT): A text mining approach to organise and evaluate scientific literature on cancer. Bioinformatics, v. 33
    Doi: http://doi.org/10.1093/bioinformatics/btx454

2017

  • Larsson, K., Baker, S., Silins, I., Guo, Y., Stenius, U., Korhonen, A. and Berglund, M., 2017. Text mining for improved exposure assessment. PLOS One, v. 12
    Doi: http://doi.org/10.1371/journal.pone.0173132

2016

  • Baker, S., Silins, I., Guo, Y., Ali, I., Högberg, J., Stenius, U. and Korhonen, A., 2016. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics, v. 32
    Doi: http://doi.org/10.1093/bioinformatics/btv585

2015

  • Korhonen, A., Baker, S., Silins, I., Guo, Y., Ali, I., Högberg, J. and Stenius, U., 2015. Automatic Semantic Classification of Scientific Literature According to the Hallmarks of Cancer. Bioinformatics

Conference proceedings

2021

  • Su, Y., Cai, D., Wang, Y., Vandyke, D., Baker, S., Li, P. and Collier, N., 2021. Non-Autoregressive Text Generation with Pre-trained Language Models. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics

2019

  • Chiu, B., Baker, S., Palmer, M. and Korhonen, A., 2019. Enhancing biomedical word embeddings by retrofitting to verb clusters. BioNLP 2019 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 18th BioNLP Workshop and Shared Task

2018

  • Stathopoulos, YA., Baker, S., Rei, M. and Teufel, S., 2018. Variable typing: Assigning meaning to variables in mathematical text. NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, v. 1
  • Mendes, E., Rodriguez, P., Freitas, V., Baker, S. and Atoui, MA., 2018. Towards improving decision making and estimating the value of decisions in value-based software engineering: the VALUE framework. Software Quality Journal, v. 26
    Doi: http://doi.org/10.1007/s11219-017-9360-z

2017 (No publication date)

  • Baker, S., Korhonen, A. and Pyysalo, S., 2017 (No publication date). Cancer Hallmark Text Classification Using Convolutional Neural Networks
    Doi: http://doi.org/10.17863/CAM.12420

2017

  • Baker, S. and Korhonen, A., 2017. Initializing neural networks for hierarchical multi-label text classification. BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop
    Doi: http://doi.org/10.18653/v1/w17-2339

2016

  • Baker, S., Kiela, D. and Korhonen, A., 2016. Robust text classification for sparsely labelled data using multi-level embeddings. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

2015

  • Korhonen, A., Guo, Y., Baker, S., Yetisgen-Yildiz, M., Stenius, U., Narita, M. and Liò, P., 2015. Improving literature-based discovery with advanced text mining. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 8623
    Doi: http://doi.org/10.1007/978-3-319-24462-4_8

2014

  • Baker, S., Reichart, R. and Korhonen, A., 2014. An unsupervised model for instance level subcategorization acquisition. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

2010

  • Baker, S. and Mendes, E., 2010. Aggregating Expert-Driven Causal Maps for Web Effort Estimation. Advances in Software Engineering, v. 117
  • Baker, S. and Mendes, E., 2010. Evaluating the Weighted Sum Algorithm for Estimating Conditional Probabilities in Bayesian Networks. 22nd International Conference on Software Engineering & Knowledge Engineering (SEKE 2010)

2008

  • Baker, S., Au, F., Dobbie, G. and Warren, I., 2008. Automated usability testing using HUI Analyzer. ASWEC 2008: 19th Australian Software Engineering Conference, Proceedings
    Doi: http://doi.org/10.1109/ASWEC.2008.40
  • Baker, S., Au, F., Dobbie, G. and Warren, I., 2008. Automated usability testing using HUI analyzer. Proceedings of the Australian Software Engineering Conference, ASWEC
    Doi: http://doi.org/10.1109/ASWEC.2008.4483248

What we do

Cambridge Language Sciences is an Interdisciplinary Research Centre at the University of Cambridge. Our virtual network connects researchers from five schools across the university as well as other world-leading research institutions. Our aim is to strengthen research collaborations and knowledge transfer across disciplines in order to address large-scale multi-disciplinary research challenges relating to language research.
