Developing agents that can reliably act on our behalf is central to artificial intelligence (AI). These agents must interact seamlessly with tools, such as search engines and databases, and collaborate with other AI systems and humans. In this thesis, we study the abstractions, methods, and infrastructure needed to enable and support the development of AI agents in the era of large language models (LLMs). The contributions of the thesis are divided into four parts.
Part 1 examines goal-oriented collaboration between two components, at least one of which is LLM-based. For an LLM-based component to interact successfully with others, it must adhere to specified interfaces, especially when interacting with traditional software-based components exposed through an API, and steer the collaboration toward high-utility outcomes. We show that LLM decoding algorithms serve as an efficient strategy to accomplish both objectives without modifying the underlying model.
Part 2 focuses on scenarios where the underlying model's capabilities are insufficient for effective collaboration, and the training signal necessary for improving the model is not readily available. To address this challenge, we introduce the principle of exploiting asymmetry for synthetic data generation and demonstrate how it can be applied to generate useful data even for tasks that LLMs cannot solve directly. We highlight the generality of this approach by drawing connections to seminal work on self-improvement for LLMs.
Part 3 addresses collaboration among multiple AI systems, tools, and humans. We propose an abstraction that, together with an accompanying library, provides theoretical and practical infrastructure with a modular, concurrency-friendly design. This infrastructure enables the modeling, implementation, and systematic study of arbitrarily complex structured interactions. To demonstrate the potential of the framework and the accompanying library, we use them to systematically investigate the benefits of complex interactions for solving competitive coding problems.
Part 4 proposes a novel perspective called semantic decoding that allows us to systematically study the design space of structured interactions. We conclude this part by discussing the research opportunities and questions emerging from the semantic decoding perspective, enabled by the foundation laid in Parts 1, 2, and 3.