Theses

Jonathan Balloch. Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change. Ph.D. Dissertation, Georgia Institute of Technology, 2024.

Real-world autonomous decision-making systems, from robots to recommendation engines, must operate in environments that change over time. While deep reinforcement learning (RL) has shown an impressive ability to learn optimal policies in stationary environments, most methods are data intensive and assume a world that does not change between training and test time. As a result, conventional RL methods struggle to adapt when conditions change. This poses a fundamental challenge: how can RL agents efficiently adapt their behavior when encountering novel environmental changes during deployment without catastrophically forgetting useful prior knowledge? This dissertation demonstrates that efficient online adaptation requires two key capabilities: (1) prioritized exploration and sampling strategies that help identify and learn from relevant experiences, and (2) selective preservation of prior knowledge through structured representations that can be updated without disruption to reusable components.

We first establish a formal framework for studying online test-time adaptation (OTTA) in RL by introducing the Novelty Minigrid (NovGrid) test environment and metrics to systematically assess adaptation performance and analyze how different adaptation solutions handle various types of environmental change. We then begin our discussion of solutions to OTTA problems by investigating the impacts of different exploration and sampling strategies on adaptation. Through a comprehensive evaluation of model-free exploration strategies, we show that methods emphasizing stochasticity and explicit diversity are most effective for adaptation across different novelty types. Building on these insights, we develop the Dual Objective Priority Sampling (DOPS) strategy. DOPS improves model-based RL adaptation by training policy and world models on different subsets of data, each prioritized according to its respective learning objective. By balancing the trade-off between distribution overlap and mismatched objectives, DOPS achieves more sample-efficient adaptation while maintaining stable performance.
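
The abstract leaves the sampling mechanics implicit; the sketch below is a minimal illustration of the dual-priority idea, assuming a shared replay buffer that keeps separate priority scores for world-model error and policy (TD) error and samples proportionally to each. The class name `DualPriorityBuffer` and its interface are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

class DualPriorityBuffer:
    """Illustrative replay buffer with two priority streams per transition:
    one driven by world-model prediction error, one by policy (TD) error."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.model_prio, self.policy_prio = [], [], []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.model_prio.pop(0); self.policy_prio.pop(0)
        self.data.append(transition)
        # New transitions start with max priority so they are seen at least once.
        self.model_prio.append(max(self.model_prio, default=1.0))
        self.policy_prio.append(max(self.policy_prio, default=1.0))

    def _sample(self, priorities, batch_size):
        p = np.asarray(priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def sample_for_world_model(self, batch_size):
        return self._sample(self.model_prio, batch_size)

    def sample_for_policy(self, batch_size):
        return self._sample(self.policy_prio, batch_size)

    def update_priorities(self, idx, model_errors=None, td_errors=None):
        for j, i in enumerate(idx):
            if model_errors is not None:
                self.model_prio[i] = abs(model_errors[j]) + 1e-6
            if td_errors is not None:
                self.policy_prio[i] = abs(td_errors[j]) + 1e-6

buf = DualPriorityBuffer(capacity=10_000)
buf.add(("s", "a", 0.0, "s_next"))
idx, batch = buf.sample_for_world_model(batch_size=1)
buf.update_priorities(idx, model_errors=[0.42])
```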

To improve adaptation efficiency with knowledge preservation, we develop WorldCloner, a neurosymbolic approach that enables rapid world model updates while preserving useful prior knowledge through a symbolic rule-based representation. WorldCloner demonstrates how structured knowledge representation can dramatically improve adaptation efficiency compared to traditional neural approaches. Finally, we present Concept Bottleneck World Models (CBWMs), which extend these insights into an end-to-end differentiable architecture. By grounding learned representations in human-interpretable concepts, CBWMs enable selective preservation of unchanged knowledge during adaptation while maintaining competitive task performance. CBWMs provide a practical path toward interpretable and efficient adaptation in neural RL systems.
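
As a rough picture of the rule-based world model idea described above: transitions are predicted by condition-and-effect rules, and when an observation contradicts a rule, only that rule is revised while the rest of the prior knowledge is left intact. The `Rule` and `RuleWorldModel` structures and the revision policy below are assumptions made for illustration, not WorldCloner's actual representation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    """A symbolic transition rule: if `conditions` hold in the state and
    `action` is taken, the facts in `effects` are predicted to become true."""
    action: str
    conditions: frozenset
    effects: frozenset

@dataclass
class RuleWorldModel:
    rules: list = field(default_factory=list)

    def predict(self, state: frozenset, action: str) -> frozenset:
        for r in self.rules:
            if r.action == action and r.conditions <= state:
                return state | r.effects
        return state  # no matching rule: predict no change

    def observe(self, state, action, next_state):
        """Revise only the rule the observation contradicts; everything else
        (the preserved prior knowledge) is left untouched."""
        if self.predict(state, action) == next_state:
            return
        for i, r in enumerate(self.rules):
            if r.action == action and r.conditions <= state:
                self.rules[i] = Rule(action, r.conditions, frozenset(next_state - state))
                return
        # No rule covered this situation: learn a new one instead of editing old ones.
        self.rules.append(Rule(action, frozenset(state), frozenset(next_state - state)))

wm = RuleWorldModel([Rule("open_door", frozenset({"has_key"}), frozenset({"door_open"}))])
# Novelty: the door no longer opens with the key; only this one rule gets revised.
wm.observe(frozenset({"has_key"}), "open_door", frozenset({"has_key", "door_jammed"}))
print(wm.rules[0].effects)  # frozenset({'door_jammed'})
```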

Together, these contributions advance both the theoretical understanding and practical capabilities of adaptive RL systems. By showing how careful exploration and structured knowledge preservation can enable efficient online adaptation, this work helps bridge the gap between current RL systems and the demands of real-world applications where change is constant and adaptation essential.

Upol Ehsan. Human-centered Explainable AI. Ph.D. Dissertation, Georgia Institute of Technology, 2024.

If AI systems are going to inform consequential decisions such as deciding whether you should get a loan or receive an organ transplant, they must be explainable to everyone, not just software engineers. Despite commendable technical progress in “opening” the black-box of AI, the prevailing algorithm-centered Explainable AI (XAI) view overlooks a vital insight: who opens the black-box matters just as much as opening it. As a result of this blind spot, many popular XAI interventions have been ineffective and even harmful in real-world settings.

To address the blind spot, this dissertation introduces and operationalizes Human-centered XAI (HCXAI), a human-centered and sociotechnically-informed XAI paradigm. Focusing on non-AI experts, this dissertation demonstrates how Human-centered XAI: (A) expands the design space of XAI by broadening the domain of non-algorithmic factors that augment AI explainability and illustrating how to incorporate them. (B) enriches our knowledge of the importance of “who” the humans are in XAI design. (C) enables resourceful ways to do Responsible AI by providing proactive mitigation strategies through participatory methods.

It contributes 1) conceptually: new concepts such as Social Transparency that showcase how to encode socio-organizational context to augment explainability without changing the internal model; 2) methodologically: human-centered evaluation of XAI, actionable frameworks, and participatory methods to co-design XAI systems; 3) technically: computational techniques and design artifacts; 4) empirically: findings such as how one’s AI background impacts one’s interpretation of AI explanations, user perceptions of real AI users, and how AI explanations can negatively impact users despite our best intentions.

The impact of this dissertation spans research, practice, and policy. Beyond pioneering the HCXAI research domain, it has influenced society–informing AI policies at international organizations like the UN and being incorporated into NIST’s AI Risk Management Framework, a global standard for Responsible AI. The work has been adopted by industry–seven Fortune 500 companies adopted its techniques, positively impacting over 3 million users by addressing AI trust calibration and resulting in savings of US $4.2 million. It has also nurtured a vibrant research community–over 400 researchers from 19+ countries have participated in four HCXAI workshops at ACM CHI (the leading venue for Human-Computer Interaction research) since 2021, culminating in the first ACM HCXAI journal issue, where I led the editorial efforts.

The dissertation transforms the XAI discourse from an algorithm-centered perspective to a human-centered one. It takes a foundational step towards creating a future where anyone, regardless of their background, can interact with AI systems in an explainable, accountable, and dignified manner so that people who are not at the table do not end up on the menu.

Xiangyu (Becky) Peng. Controlling Behavior with Shared Knowledge. Ph.D. Dissertation, Georgia Institute of Technology, 2024.

Controlling agent behavior is a fundamental challenge across diverse domains within artificial intelligence and robotics. The central idea of this dissertation is that shared knowledge can be used as a powerful tool to control AI agents’ behavior. This dissertation explores the use of shared knowledge both in constructing coherent narratives and in enhancing the expression of shared knowledge in reinforcement learning agents. I first investigate the use of shared knowledge for constructing narratives by developing a story-generation agent that emulates the cognitive processes by which human readers create detailed mental models, referred to as the “reader model”, which they use to understand and interpret stories with shared knowledge. Employing the reader model results in the generation of significantly more coherent and goal-directed stories. I also explore how to supply unique constraints to the story generator, allowing the shared knowledge to be modified. Subsequently, I turn to the application of shared knowledge in controlling reinforcement learning agents through a technique called “Story Shaping.” This technique has the agent infer tacit knowledge from an exemplar story and reward itself for actions that align with the inferred reader model. Building on this agent, I propose the Thespian agent, which leverages the knowledge learned with this technique to adapt to new environments in a few-shot setting. Additionally, I investigate the potential of using shared knowledge to explain behavior by examining the impact of a symbolic knowledge graph-based state representation and a Hierarchical Graph Attention mechanism on the decision-making process of a reinforcement learning agent. This dissertation aims to create AI-driven systems that are more coherent, controllable, and aligned with human expectations and preferences, thereby fostering trust and safety in human-AI interactions.
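
The Story Shaping idea can be illustrated with a small sketch, assuming the exemplar story and the agent's current state are both summarized as sets of knowledge-graph triples and the intrinsic reward is the gain in overlap with the target graph; the exact reward shaping used in the dissertation may differ.

```python
def story_shaping_reward(prev_triples, curr_triples, target_triples):
    """Intrinsic reward: how much closer the agent's knowledge graph has moved
    toward the graph inferred from the exemplar story (illustrative)."""
    gain = len(curr_triples & target_triples) - len(prev_triples & target_triples)
    return max(0, gain)

# Illustrative exemplar knowledge: the hero should grab the sword, then enter the cave.
target = {("hero", "has", "sword"), ("hero", "in", "cave")}
before = {("hero", "in", "armory")}
after = before | {("hero", "has", "sword")}
print(story_shaping_reward(before, after, target))  # 1
```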

Sarah Wiegreffe. Interpreting Neural Networks for and with Natural Language. Ph.D. Dissertation, Georgia Institute of Technology, 2022.

In the past decade, natural language processing (NLP) systems have come to be built almost exclusively on a backbone of large neural models. As the landscape of feasible tasks has widened due to the capabilities of these models, the space of applications has also widened to include subfields with real-world consequences, such as fact-checking, fake news detection, and medical decision support. The increasing size and nonlinearity of these models results in an opacity that hinders efforts by machine learning practitioners and lay users alike to understand their internals and derive meaning or trust from their predictions. The fields of explainable artificial intelligence (XAI) and more specifically explainable NLP (ExNLP) have emerged as active areas for remedying this opacity and for ensuring models’ reliability and trustworthiness in high-stakes scenarios, by providing textual explanations meaningful to human users. Models that produce justifications for their individual predictions can be inspected for the purposes of debugging, quantifying bias and fairness, understanding model behavior, and ascertaining robustness and privacy. Textual explanation is a predominant form of explanation in machine learning datasets regardless of task modality. As such, this dissertation covers both explaining tasks with natural language and explaining natural language tasks. In this dissertation, I propose test suites for evaluating the quality of model explanations under two definitions of meaning: faithfulness and human acceptability. I use these evaluation methods to investigate the utility of two explanation forms and three model architectures. I finally propose two methods to improve explanation quality: one which increases the likelihood of faithful highlight explanations and one which improves the human acceptability of free-text explanations. This work strives to increase the likelihood of positive use and outcomes when AI systems are deployed in practice.
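
The abstract does not spell out the faithfulness tests; one common family of checks for highlight explanations, shown here purely as an illustration and not as the dissertation's test suite, is erasure-based: compare the model's confidence on the full input with its confidence on the highlighted tokens alone and on the input with the highlight removed.

```python
def erasure_faithfulness(predict_proba, tokens, highlight_idx, label):
    """Erasure-style faithfulness scores for a highlight explanation.
    `predict_proba(tokens) -> dict[label, prob]` is an assumed model interface."""
    full = predict_proba(tokens)[label]
    kept = [t for i, t in enumerate(tokens) if i in highlight_idx]
    removed = [t for i, t in enumerate(tokens) if i not in highlight_idx]
    sufficiency = full - predict_proba(kept)[label]          # small is good
    comprehensiveness = full - predict_proba(removed)[label]  # large is good
    return sufficiency, comprehensiveness

# Toy "model": more positive words means a higher positive probability.
def toy_model(tokens):
    pos = sum(t in {"great", "fun"} for t in tokens)
    p = min(1.0, 0.5 + 0.25 * pos)
    return {"positive": p, "negative": 1 - p}

tokens = ["the", "movie", "was", "great", "fun"]
print(erasure_faithfulness(toy_model, tokens, highlight_idx={3, 4}, label="positive"))  # (0.0, 0.5)
```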

Prithviraj Ammanabrolu. Language Learning in Interactive Environments. Ph.D. Dissertation, Georgia Institute of Technology, 2021.

Natural language communication has long been considered a defining characteristic of human intelligence. I am motivated by the question of how learning agents can understand and generate contextually relevant natural language in service of achieving a goal. In pursuit of this objective, I have been studying Interactive Narratives, or text-adventures: simulations in which an agent interacts with the world purely through natural language—“seeing” and “acting upon” the world using textual descriptions and commands. These games are usually structured as puzzles or quests in which a player must complete a sequence of actions to succeed. My work studies two closely related aspects of Interactive Narratives: operating in these environments and creating them in addition to their intersection—each presenting its own set of unique challenges. Operating in these environments presents three challenges: (1) Knowledge representation—an agent must maintain a persistent memory of what it has learned through its experiences with a partially observable world; (2) Commonsense reasoning to endow the agent with priors on how to interact with the world around it; and (3) Scaling to effectively explore sparse-reward, combinatorially-sized natural language state-action spaces. On the other hand, creating these environments can be split into two complementary considerations: (1) World generation, or the problem of creating a world that defines the limits of the actions an agent can perform; and (2) Quest generation, i.e. defining actionable objectives grounded in a given world. I will present my work thus far—showcasing how structured, interpretable data representations in the form of knowledge graphs aid in each of these tasks—in addition to proposing how exactly these two aspects of Interactive Narratives can be combined to improve language learning and generalization across this board of challenges.
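
A minimal sketch of the knowledge-graph memory idea, assuming observations arrive already reduced to (subject, relation, object) triples; the learned extraction and graph-updating machinery in the actual work is far richer.

```python
class GraphMemory:
    """Persistent knowledge-graph memory for a text-adventure agent (illustrative)."""

    def __init__(self):
        self.triples = set()

    def update(self, new_triples):
        # Facts about a partially observable world accumulate across observations.
        self.triples |= set(new_triples)

    def neighbors(self, entity):
        return {(r, o) for (s, r, o) in self.triples if s == entity}

memory = GraphMemory()
memory.update([("you", "in", "kitchen"), ("knife", "in", "kitchen")])
memory.update([("kitchen", "east_of", "hallway")])
print(memory.neighbors("you"))  # {('in', 'kitchen')}
```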

Lara Martin. Neurosymbolic Automated Story Generation. Ph.D. Dissertation, Georgia Institute of Technology, 2021.

Although we are currently riding a technological wave of personal assistants, many of these agents still struggle to communicate appropriately. Humans are natural storytellers, so it would be fitting if artificial intelligence (AI) could tell stories as well. Automated story generation is an area of AI research that aims to create agents that tell good stories. With goodness being subjective and hard to define, I focus on the perceived coherence of stories in this thesis. Previous story generation systems use planning and symbolic representations to create new stories, but these systems require a vast amount of knowledge engineering. The stories created by these systems are coherent, but only a finite set of stories can be generated. In contrast, very large neural language models have recently made the headlines in the natural language processing community. Though impressive on the surface, even the most sophisticated of these models begins to lose coherence over time. My research looks at both neural and symbolic techniques of automated story generation. In this dissertation, I created automated story generation systems that improved coherence by leveraging various symbolic approaches for neural systems. I did this through a collection of techniques: separating semantic event generation from syntactic sentence generation, manipulating neural event generation to become goal-driven, improving syntactic sentence generation to be more interesting and coherent, and creating a rule-based infrastructure to aid neural networks in causal reasoning.
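
The split between semantic event generation and syntactic sentence generation can be pictured as a small pipeline: stories are abstracted into event tuples, the next event is predicted in tuple space, and a separate realizer turns it back into a sentence. The tuple fields and the hand-written stand-ins for the learned models below are illustrative.

```python
from collections import namedtuple

# A semantic event, abstracted away from surface wording.
Event = namedtuple("Event", ["subject", "verb", "object", "modifier"])

def next_event(history):
    """Stand-in for a learned event-to-event model (e.g., a seq2seq network)."""
    return Event("princess", "flee", "castle", "at_night")  # hand-written continuation

def realize(event):
    """Stand-in for a learned event-to-sentence realizer."""
    return f"The {event.subject} {event.verb}s the {event.object} {event.modifier.replace('_', ' ')}."

history = [Event("dragon", "attack", "castle", "suddenly")]
print(realize(next_event(history)))  # "The princess flees the castle at night."
```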

Matthew Guzdial. Combinational machine learning creativity. Ph.D. Dissertation, Georgia Institute of Technology, 2019.

Computational creativity is a field focused on the study and development of behaviors in computers that an observer would deem creative. Traditionally, it has relied upon rules-based and search-based artificial intelligence. However, these types of artificial intelligence rely on human-authored knowledge that can obfuscate whether creative behavior arose due to actions from an AI agent or its developer. In this dissertation I look to instead apply machine learning to a subset of computational creativity problems. This particular area of research is called combinational creativity. Combinational creativity is the type of creativity people employ when they create new knowledge by recombining elements of existing knowledge. This dissertation examines the problem of combining combinational creativity and machine learning in two primary domains: video game design and image classification. Towards the goal of creating novel video game designs, I describe a machine-learning approach that learns a model of video game level design and rules from gameplay video, validating the accuracy of these models with a human subject study and an automated gameplaying agent, respectively. I then introduce a novel combinational creativity approach I call conceptual expansion, designed to work with machine-learned knowledge and models by default. I demonstrate conceptual expansion’s utility and limitations across both domains, through the creation of novel video games and through its application in a transfer learning framework for image classification. This dissertation seeks to validate the following hypothesis: For creativity problems that require the combination of aspects of distinct examples, conceptual expansion of generative or evaluative models can create a greater range of artifacts or behaviors, with greater measures of value, surprise, and novelty, than standard combinational approaches or approaches that do not explicitly model combination.
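
At its simplest, the combination at the heart of conceptual expansion can be read as a parameterized mixture of existing learned features; the sketch below shows that reading with hand-picked combination weights in place of the learned ones.

```python
import numpy as np

def conceptual_expansion(features, alphas):
    """Form a new feature as a weighted combination of existing learned features.
    `features`: equally-shaped arrays (e.g., conv filters or level-design stats);
    `alphas`: one combination weight per feature (learned in practice,
    hand-picked here for illustration)."""
    assert len(features) == len(alphas)
    return sum(a * f for a, f in zip(alphas, features))

# Two "existing concept" feature vectors, e.g., from two enemy types.
f_goomba = np.array([1.0, 0.0, 0.5])
f_koopa = np.array([0.2, 1.0, 0.5])
new_enemy = conceptual_expansion([f_goomba, f_koopa], alphas=[0.7, 0.3])
print(new_enemy)  # roughly [0.76, 0.3, 0.5]
```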

Alexander Zook. Automated Iterative Game Design. Ph.D. Dissertation, Georgia Institute of Technology, 2016.

Computational systems to model aspects of iterative game design were proposed, encompassing game generation, sampling behaviors in a game, analyzing game behaviors for patterns, and iteratively altering a game design. Explicit models of the actions in games as planning operators allowed an intelligent system to reason about how actions and action sequences affect gameplay and to create new mechanics. Metrics to analyze differences in player strategies were presented and were able to identify flaws in game designs. An intelligent system learned design knowledge about gameplay and was able to reduce the number of design iterations needed during playtesting to achieve a design goal. Implications for how intelligent systems augment and automate human game design practices are discussed.
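
Modeling game actions as planning operators can be pictured with a generic STRIPS-style precondition/effect structure, which is what the sketch below shows; it is an illustration of the idea, not the dissertation's exact formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    """A game mechanic as a STRIPS-style planning operator (illustrative)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset

    def applicable(self, state):
        return self.preconditions <= state

    def apply(self, state):
        return (state - self.delete_effects) | self.add_effects

jump = Operator(
    name="jump",
    preconditions=frozenset({"on_ground"}),
    add_effects=frozenset({"in_air"}),
    delete_effects=frozenset({"on_ground"}),
)
state = frozenset({"on_ground"})
print(jump.apply(state))  # frozenset({'in_air'})
```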

Hong Yu. A Data-Driven Approach for Personalized Drama Management. Ph.D. Dissertation, Georgia Institute of Technology, 2015.

An interactive narrative is a form of digital entertainment in which players can create or influence a dramatic storyline through actions, typically by assuming the role of a character in a fictional virtual world. Interactive narrative systems usually employ a drama manager (DM), an omniscient background agent that monitors the fictional world and determines what will happen next in the players’ story experience. Prevailing approaches to drama management choose successive story plot points based on a set of criteria given by the game designers. In other words, the DM is a surrogate for the game designers. In this dissertation, I create a data-driven personalized drama manager that takes players’ preferences into consideration. The personalized drama manager is capable of (1) modeling the players’ preferences over successive plot points from the players’ feedback; (2) guiding the players towards selected plot points without sacrificing the players’ agency; and (3) choosing target successive plot points that simultaneously increase the players’ story preference ratings and the probability of the players selecting those plot points. To address the first problem, I develop a collaborative filtering algorithm that takes into account the specific sequence (or history) of experienced plot points when modeling players’ preferences for future plot points. Unlike traditional collaborative filtering algorithms that make one-shot recommendations of complete story artifacts (e.g., books, movies), the collaborative filtering algorithm I develop is a sequential recommendation algorithm that makes every successive recommendation based on all previous recommendations. To address the second problem, I create a multi-option branching story graph that allows multiple options to point to each plot point. The personalized DM working in the multi-option branching story graph can influence the players to make choices that coincide with the trajectories selected by the DM, while giving the players full agency to make any selection that leads to any plot point in their own judgement. To address the third problem, the personalized DM models the probability of the players transitioning to each full-length story and selects target stories that achieve the highest expected preference ratings at every branching point in the story space. The personalized DM is implemented in an interactive narrative system built with choose-your-own-adventure stories. Human study results show that the personalized DM achieves significantly higher preference ratings than non-personalized DMs or DMs with pre-defined player types, while preserving the players’ sense of agency.
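
The third capability, choosing targets that trade off predicted preference against the chance the player actually follows the DM there, can be sketched as a simple expected-value selection at a branching point; the plot points and numbers below are made up for illustration.

```python
def choose_target(candidates):
    """Pick the successor plot point with the highest expected preference:
    predicted rating weighted by the probability the player will follow the
    DM's guidance to it (illustrative scoring)."""
    return max(candidates, key=lambda c: c["pred_rating"] * c["p_select"])

branch = [
    {"plot_point": "rescue_the_prisoner", "pred_rating": 4.5, "p_select": 0.4},
    {"plot_point": "bribe_the_guard",     "pred_rating": 3.8, "p_select": 0.7},
]
# 4.5 * 0.4 = 1.8 < 3.8 * 0.7 = 2.66, so the second option wins.
print(choose_target(branch)["plot_point"])  # bribe_the_guard
```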

Boyang Li. Learning Knowledge to Support Domain-Independent Narrative Intelligence. Ph.D. Dissertation, Georgia Institute of Technology, 2014.

Narrative Intelligence is the ability to craft, tell, understand, and respond appropriately to narratives. It has been proposed as a vital component of machines aiming to understand human activities or to communicate effectively with humans. However, most existing systems purported to demonstrate Narrative Intelligence rely on manually authored knowledge structures that require extensive expert labor. These systems are constrained to operate in a few domains where knowledge has been provided. This dissertation investigates the learning of knowledge structures to support Narrative Intelligence in any domain. I propose and build a system that, from a corpus of simple exemplar stories, learns complex knowledge structures that subsequently enable the creation, telling, and understanding of narratives. The knowledge representation balances the complexity of learning and the richness of narrative applications, so that we can (1) learn the knowledge robustly in the presence of noise, (2) generate a large variety of highly coherent stories, (3) tell them in recognizably different narration styles, and (4) understand stories efficiently. The accuracy and effectiveness of the system have been verified by a series of user studies and computational experiments. As a result, the system is able to demonstrate Narrative Intelligence in any domain where we can collect a small number of exemplar stories. This dissertation is the first step toward scaling computational narrative intelligence to meet the challenges of the real world.
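
One way to picture the learned knowledge structures is as a plot graph of events with precedence and mutual-exclusion constraints, from which stories are generated by sampling constraint-consistent event sequences. The tiny bank-robbery graph below is illustrative; the learned graphs and generation procedure in the dissertation are more involved.

```python
import random

def generate_story(events, before, exclusions, rng=random.Random(0)):
    """Sample an event sequence consistent with precedence constraints and
    mutual-exclusion pairs (illustrative plot-graph traversal)."""
    told, story, banned = set(), [], set()
    while True:
        ready = [e for e in events
                 if e not in told and e not in banned
                 and all(p in told for p in before.get(e, []))]
        if not ready:
            return story
        e = rng.choice(ready)
        story.append(e); told.add(e)
        for a, b in exclusions:  # choosing one event rules out its alternative
            if e == a: banned.add(b)
            if e == b: banned.add(a)

events = ["enter_bank", "wait_in_line", "hand_note", "pull_gun", "collect_money", "leave"]
before = {"wait_in_line": ["enter_bank"], "hand_note": ["wait_in_line"],
          "pull_gun": ["wait_in_line"], "collect_money": ["hand_note"], "leave": ["collect_money"]}
exclusions = [("hand_note", "pull_gun")]
print(generate_story(events, before, exclusions))
```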

Brian O’Neill. A Computational Model of Suspense for the Augmentation of Intelligent Story Generation. Ph.D. Dissertation, Georgia Institute of Technology, 2013.

In this dissertation, I present Dramatis, a computational human behavior model of suspense based on Gerrig and Bernardo’s definition of suspense. In this model, readers traverse a search space on behalf of the protagonist, searching for an escape from some oncoming negative outcome. As the quality or quantity of escapes available to the protagonist decreases, the level of suspense felt by the audience increases. The major components of Dramatis are a model of reader salience, used to determine what elements of the story are foregrounded in the reader’s mind, and an algorithm for determining the escape plan that a reader would perceive to be the most likely to succeed for the protagonist. I evaluate my model by comparing its ratings of suspense to the self-reported suspense ratings of human readers. Additionally, I demonstrate that the components of the suspense model are sufficient to produce these human-comparable ratings.
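
The stated relationship, suspense rising as the protagonist's perceived escapes become fewer or less likely, can be captured by a toy scoring function over the reader's perceived escape plans; this is only an illustration of the inverse relationship, not Dramatis itself.

```python
def suspense(escape_plans):
    """Toy suspense score: high when the reader perceives no good escape for the
    protagonist, low when a likely-to-succeed escape exists (illustrative)."""
    if not escape_plans:
        return 1.0  # no perceived way out: maximal suspense
    best = max(p["success_likelihood"] for p in escape_plans)
    return 1.0 - best

print(suspense([{"plan": "hide_in_cellar", "success_likelihood": 0.75}]))   # 0.25
print(suspense([{"plan": "hide_in_cellar", "success_likelihood": 0.25},
                {"plan": "beg_for_mercy",  "success_likelihood": 0.125}]))  # 0.75
```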

Mark O. Riedl. Narrative Generation: Balancing Plot and Character. Ph.D. Dissertation, North Carolina State University, 2004.

The ability to generate narrative is of importance to computer systems that wish to use story effectively for a wide range of contexts ranging from entertainment to training and education. The typical approach for incorporating narrative into a computer system is for system builders to script the narrative features at design time. A central limitation of this pre-scripting approach is its lack of flexibility – such systems cannot adapt the story to the user’s interests, preferences, or abilities. The alternative approach is for the computer systems themselves to generate narrative that is fully adapted to the user at run time.

A central challenge for systems that generate their own narrative elements is to create narratives that are readily understood as such by their users. I define two properties of narrative – plot coherence and character believability – which play a role in the success of a narrative in terms of the ability of the narrative’s audience to comprehend its structure. Plot coherence is the perception by the audience that the main events of a story have meaning and relevance to the outcome of the story. Character believability is the perception by the audience that the actions performed by characters are motivated by their beliefs, desires, and traits.

In this dissertation, I explore the use of search-based planning as a technique for generating stories that demonstrate both strong plot coherence and strong character believability. To that end, the dissertation makes three central contributions. First, I describe an extension to search-based planning that reasons about character intentions by identifying possible character goals that explain their actions in a plan and creates plan structure that explains why those characters commit to their goals. Second, I describe how a character personality model can be incorporated into planning in a way that guides the planner to choose consistent character behavior without strictly preventing characters from acting “out of character” when necessary. Finally, I present an open-world planning algorithm that extends the capabilities of conventional planning algorithms in order to support a process of story creation modeled after the process of dramatic authoring used by human authors. This open-world planning approach enables a story planner not only to search for a sequence of character actions to achieve a set of goals, but also to search for a possible world in which the story can effectively be set.
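
The first contribution, reasoning about character intentions, can be pictured as bookkeeping that ties each character action to a character goal and flags actions left unexplained; the frame structure below is a simplified illustration of that idea, not the dissertation's planning algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str
    character: str

@dataclass
class IntentionFrame:
    """Groups the steps a character takes in service of one of its goals."""
    character: str
    goal: str
    steps: list = field(default_factory=list)

def unmotivated_steps(plan, frames):
    """Return character actions not explained by any intention frame, the kind
    of gap an intention-aware planner works to eliminate (illustrative)."""
    covered = {(f.character, s.action) for f in frames for s in f.steps}
    return [s for s in plan if (s.character, s.action) not in covered]

plan = [Step("steal_horse", "bandit"), Step("ride_to_town", "bandit")]
frames = [IntentionFrame("bandit", "escape_the_sheriff",
                         steps=[Step("steal_horse", "bandit")])]
print(unmotivated_steps(plan, frames))  # [Step(action='ride_to_town', character='bandit')]
```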

The planning algorithms presented in this dissertation are used within a narrative generation system called Fabulist. Fabulist generates a story as a sequence of character actions and then recounts the story by first generating a discourse plan that specifies how the story content should be told and then realizing the discourse plan in a storytelling medium. I present the results of an empirical evaluation that demonstrates that narratives generated by Fabulist have strong plot coherence and strong character believability. The results clearly indicate how a planning approach to narrative generation that reasons about plot coherence and character believability can improve the audience’s comprehension of plot and character.