Thesis Defence: Character Sentences as Quantitative Metrics: Leveraging Large Language Models to Measure Literary Characterization

August 8 at 10:00 am - 2:00 pm

Qilin Liu, supervised by Dr. Kyong Yoon, will defend their thesis titled “Character Sentences as Quantitative Metrics: Leveraging Large Language Models to Measure Literary Characterization” in partial fulfillment of the requirements for the degree of Master of Arts in Interdisciplinary Graduate Studies – Digital Arts and Humanities theme.

An abstract for Qilin Liu’s thesis is included below.

Defences are open to all members of the campus community as well as the general public. Registration is not required for in-person defences.


Abstract

This thesis investigates literary characterization through computational literary studies by introducing a novel analytical unit termed “Character Sentences”: units that explicitly provide descriptive or action-related information about literary characters. The thesis makes three primary contributions: (1) a narratologically informed definition of character sentences; (2) two gold-standard Harry Potter Character Sentence (HPCS) datasets, HPCS_clause-level and HPCS_full-sentence, meticulously annotated from the Harry Potter series and designed to benchmark automated character sentence extraction tasks; and (3) a natural language processing (NLP) pipeline that integrates large language models (LLMs) to automatically identify character sentences and attribute them to the corresponding characters, suitable for texts of any length. The gold-standard datasets demonstrate high inter-annotator agreement, with Krippendorff’s α values exceeding 0.80 (α = 0.81 for HPCS_clause-level; α = 0.86 for HPCS_full-sentence). The proposed NLP pipeline comprises four modules: (1) a text cleaning module; (2) a sentence segmentation module aligned with the character sentence definition; (3) a zero-shot LLM processing module employing the LangGPT prompting framework and two-stage coreference resolution reasoning; and (4) a dependency parsing-based filter module that improves the accuracy of character attribution. Empirical evaluations indicate that the pipeline achieves robust performance, yielding an F1 score of 94.51% in character sentence identification and an accuracy of 84.88% in character attribution on the HPCS_full-sentence dataset. This research is the first to explicitly define character sentences and to develop an automated, theory-informed, sentence-level approach integrated with LLMs for character sentence extraction. It addresses critical gaps in computational literary studies and underscores the efficacy of LLMs and prompt engineering within literary analysis. The datasets and source code developed in this thesis are publicly accessible to facilitate further research and methodological advancements in the field.
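
For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how an F1 score for character sentence identification and an accuracy for character attribution are conventionally computed. It is not taken from the thesis's source code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: standard F1 and accuracy definitions.
# Function names and example data are hypothetical, not from the thesis.

def f1_score(gold, pred):
    """F1 for binary labels (True = sentence is a character sentence)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)        # true positives
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)    # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def attribution_accuracy(gold_chars, pred_chars):
    """Share of sentences whose predicted character matches the gold label."""
    if not gold_chars:
        return 0.0
    correct = sum(1 for g, p in zip(gold_chars, pred_chars) if g == p)
    return correct / len(gold_chars)


# Toy example: four sentences for identification, three for attribution.
gold_is_cs = [True, True, False, True]
pred_is_cs = [True, False, False, True]
gold_who = ["Harry", "Ron", "Hermione"]
pred_who = ["Harry", "Ron", "Harry"]

identification_f1 = f1_score(gold_is_cs, pred_is_cs)
attribution_acc = attribution_accuracy(gold_who, pred_who)
```

In this toy case identification has perfect precision but misses one gold character sentence, which lowers recall and therefore F1, while attribution gets two of three characters right.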

Additional Info

Room Number: UNC 334
Registration/RSVP Required: No
Event Type: Thesis Defence
Topic: Arts and Humanities, Research and Innovation, Science, Technology and Engineering
Audiences: Alumni, Community, Faculty, Staff, Families, Partners and Industry, Students, Postdoctoral Fellows and Research Associates