Sawchen Lecture: David Bamman, “Building Russian BookNLP”


Date: Tuesday, December 8, 2020

Time: 9:30 AM - 10:30 AM

Join us on December 8 at 9:30 AM for a virtual lecture in the Sawchen Lecture Series, featuring David Bamman of the University of California, Berkeley. The talk is co-sponsored by the SSHRC-funded Digital Dostoevsky project at the University of Toronto.

Register here via Zoom: https://ubc.zoom.us/webinar/register/WN_-s3KH3XHTbWQZsQu6wy1Og

Title: “Building Russian BookNLP”

Abstract: BookNLP (Bamman et al., 2014) is a natural language processing pipeline for reasoning about the linguistic structure of text in books, specifically designed for works of fiction. In addition to its pipeline of part-of-speech tagging, named entity recognition, and coreference resolution, BookNLP identifies the characters in a literary text, and represents them through the actions they participate in, the objects they possess, their attributes, and dialogue. The availability of this tool has driven much work in the computational humanities for English texts, especially surrounding character (Underwood et al., 2018; Kraicer and Piper, 2018; Dubnicek et al., 2018). In this talk, I’ll describe the development of BookNLP for Russian literature, focusing on two major components: creating an annotated dataset of named entities and their coreference for Russian fiction (which allows us to assess the performance of existing Russian NER and coreference systems for the domain of literature), and developing methods for identifying and representing characters in Russian texts (allowing for their analysis at scale).

Bio: David Bamman is an assistant professor in the School of Information at UC Berkeley, where he works on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. His research often involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text, and focuses on improving NLP for a variety of languages and domains (such as literary text and social media). Before Berkeley, he received his PhD from the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University.