Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
For AI systems to assist scientific discovery, they must possess strong scientific reasoning abilities. While traditional scientific reasoning has been studied extensively through scientific question answering, modern large language models (LLMs) excel at this task primarily by memorizing declarative scientific knowledge during training. Their ability to tackle interactive scientific tasks that require procedural knowledge remains underexplored. This dissertation addresses this gap by using text-based simulations as benchmarks to evaluate LLMs’ scientific reasoning in interactive environments. In Chapter 2, experimental results first demonstrate that although LLMs can pass scientific question-answering tests, they struggle with interactive scientific tasks. To address this issue, this research contributes to both agent development and environment development. On the agent side, Chapter 3 enhances agent performance by introducing a neurosymbolic tool-using strategy that enables models to handle tasks on which neural networks typically struggle but which simple algorithms solve reliably. In Chapter 4, to mitigate the scarcity of human-annotated training data, a self-supervised method is proposed that automatically trains behavior-cloning agents. On the environment side, manually created text-based simulators are difficult to scale and cannot accommodate the diverse needs of scientific tasks. To address this limitation, Chapters 5 and 6 explore two methods for constructing environments automatically: having an LLM generate code for a simulator, and using an LLM directly as a simulator that predicts the next state from the current state and the agent's actions. Experimental results indicate that while these approaches show promise, LLMs still face significant challenges in serving as reliable world simulators, underscoring the need for further advances in this area.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Information