Towards Trustworthy AI: Computational Design Science Methods for AI Security and AI Alignment
Publisher
The University of Arizona
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo
Dissertation not available (per author's request)
Abstract
Artificial Intelligence (AI), particularly generative AI, has transformed many aspects of daily life. However, these advancements also bring significant challenges, including security vulnerabilities, fairness concerns, and alignment issues. On one hand, adversaries can exploit vulnerabilities in AI-enabled information systems (IS) (e.g., recommender systems) for economic gain or to undermine social fairness. On the other hand, it is crucial to align advanced AI systems with human preferences and values. Yet existing AI models, especially those based on deep learning, often require large amounts of data for effective training, making it difficult to align them with human preferences from small data. Furthermore, although generative AI (e.g., Large Language Models (LLMs)) has achieved remarkable success across a range of tasks, these general-purpose models often lack domain-specific knowledge and thus may not align well with domain-specific human preferences in real-world applications.

This dissertation presents five essays that develop novel computational design science methods to enhance the trustworthiness of AI, with a focus on AI security and AI alignment. Essay I develops a novel multi-agent reinforcement learning method to study the vulnerability of AI-enabled information systems under performance attack and further proposes a vulnerability estimator for mitigation practice. Essay II proposes a new reinforcement learning method with a graph neural network design to investigate the vulnerability of AI-enabled information systems under fairness attack. Essay III designs a novel nested optimization method to secure the fairness of AI-enabled information systems. Essay IV develops a novel context-aware offline meta-level model-based reinforcement learning method to align AI with human preferences from small data. Essay V leverages information systems theory to design a novel LLM fine-tuning method to align generative AI with domain-specific human preferences. All five essays offer valuable practical implications for cybersecurity analysts, consumers, business owners, and information systems designers. Beyond these practical implications, this dissertation provides numerous design principles to guide future trustworthy IS research.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Management Information Systems