Proposal Stage

Interpreting Agent Behavior

Human-Centered Empirical Methods for Understanding Agents
COLM 2026 Workshop · October 9, 2026

Motivation

As commercial agentic systems such as Claude Code and Codex see widespread real-world deployment in early 2026, their behaviors have grown increasingly complex. These systems autonomously plan, execute, and iterate on multi-step tasks, generating vast amounts of behavioral data. Yet current evaluation relies almost exclusively on automated benchmarks: pass/fail metrics that reveal whether an agent succeeded but not how it behaved, why it failed, or how humans made sense of its actions. Bridging this gap between outcome-based metrics and behavioral understanding is essential for building agents that are not only effective but also interpretable, debuggable, and trustworthy.

IAB aims to build a community that brings together researchers across disciplines, including interpretability, evaluation, HCI, alignment, and computational social science. These researchers are actively developing analysis methods such as benchmarking, trace analysis, red-teaming, corpus analysis, error analysis, qualitative analysis, grounded theory, and think-aloud studies. IAB provides a venue to consolidate these efforts into a shared vocabulary, open datasets, and reproducible methodology.

Scope

We plan to investigate agent behavior from three complementary perspectives.

How do agents behave?

  • How do agents make decisions, propose and revise plans, and select and use tools during multi-step tasks?
  • How do agents behave when they face ambiguity, uncertainty, or incomplete information?
  • How do agents fail, recover from errors, or enter failure cascades during real-world execution?
  • What unexpected or emergent behaviors appear in single-agent and multi-agent systems?
  • How do different model backbones or coordination structures shape observable agent behavior?

How do humans respond?

  • How do users write, adapt, and refine prompts while working with agents?
  • How do humans verify agent-generated outputs and decide when more checking is needed?
  • How do people calibrate trust in agent outputs, and when does over-reliance emerge?
  • How do users form and update mental models of what agents can and cannot do?
  • How do people interpret, monitor, and make sense of ongoing agent execution?

How do they interact?

  • How do humans and agents collaborate more effectively on complex, open-ended tasks?
  • How do humans communicate intent, constraints, and goals to agents across multi-turn interactions?
  • How do agents and humans negotiate misunderstandings, breakdowns, and repair during collaboration?
  • What patterns emerge from interaction traces, tool-use logs, and human-agent conversations over time?
  • How can we systematically analyze these interactions using empirical methods from HCI and the social sciences?

Speakers

We thank the following speakers, who have confirmed or expressed interest in giving a talk.

Armando Solar-Lezama · MIT CSAIL · Program Synthesis · Confirmed
Diyi Yang · Stanford University · Human-centered NLP · Confirmed
Bowen Baker · OpenAI · Multi-Agent Systems · Confirmed
Graham Neubig · Carnegie Mellon University · Language Agents · Tentative

Schedule

Full-day workshop with keynotes, paper presentations, posters, and a panel discussion.

09:00 – 09:10  Opening Remarks
09:10 – 09:45  Keynote: Armando Solar-Lezama (30 min + 5 min Q&A)
09:45 – 10:20  Keynote: Diyi Yang (30 min + 5 min Q&A)
10:20 – 10:50  Paper Presentations (2 × 15 min)
10:50 – 11:40  Poster Session #1 + Coffee Break
11:40 – 12:10  Paper Presentations (2 × 15 min)
12:10 – 13:10  Lunch
13:10 – 13:45  Keynote: Bowen Baker (30 min + 5 min Q&A)
13:45 – 15:00  Paper Presentations (5 × 15 min)
15:00 – 15:50  Poster Session #2 + Coffee Break
15:50 – 16:35  Panel: Empirical Methods for Understanding Agent Behavior (Armando Solar-Lezama, Diyi Yang, Bowen Baker, Graham Neubig)
16:35 – 16:50  Best Paper Award + Closing Remarks

Call for Papers

We solicit two types of non-archival submissions on understanding agent behavior. We welcome empirical studies, datasets, benchmarks, methods papers, tools, and demos, and we particularly encourage negative results and methodological position papers.

Long Papers

Up to 9 pages + references. For full empirical studies, datasets, benchmarks, or comprehensive analyses.

Short Papers

Up to 4 pages + references. For position papers, tools, demos, preliminary findings, and negative results.

Review Process

Submissions follow COLM-style formatting and undergo double-anonymous review via OpenReview, with at least two reviews per submission.

Submission deadline: Jun 23, 2026
Notification: Jul 24, 2026
Workshop: Oct 9, 2026
Format: Non-archival

Organizing Committee

Jie (Sophia) Gao · Johns Hopkins University · Human-AI Collaboration
Kaiser Sun · Johns Hopkins University · LLM Interpretability
Teresa Yeo · Google DeepMind · Model Robustness
Daniel Khashabi · Johns Hopkins University · Reliable Language AI
Zhuoran Lu · Accenture · Human-AI Decision Making
Boyuan Zheng · xAI · Web Agents & Safety
Katherine Van Koevering · Johns Hopkins University · Computational Social Science
Sijie Ji · Caltech · Physical AI & CPS
Jen-tse Huang · Johns Hopkins University · LLM Evaluation

Advisory Board

We thank the faculty and senior researchers who have provided guidance and support for this workshop.

Ziang Xiao (JHU) · Soufiane Hayou (JHU) · Toby Jia-Jun Li (Notre Dame) · Hang Jiang (Northeastern) · Weiyan Shi (Northeastern) · Wei Lu (Nanyang Technological University, Singapore) · Samuel Nathanson (xAI)

Sponsors

To be announced