Understanding an unfamiliar codebase is an essential task for developers in various scenarios, such as during the onboarding process. Especially when the codebase is large and time is limited, achieving a decent level of comprehension remains challenging for both experienced and novice developers, even with the assistance of large language models (LLMs). Existing studies have shown that LLMs often fail to support users in understanding code structures or to provide user-centered, adaptive, and dynamic assistance in real-world settings.
To address this, we propose learning from code auditors, a unique role whose work often requires quickly familiarizing themselves with new code projects on a weekly or even daily basis. We recruited and interviewed 8 code auditing practitioners to understand how they achieve codebase understanding. We identified four design opportunities for an LLM-based codebase understanding system: supporting cognitive alignment through automated codebase information extraction, decomposition, and representation, and reducing manual effort and conversational distraction through interaction design.
To validate these four design opportunities, we designed a system prototype, CodeMap, that provides dynamic information extraction and representation aligned with the human cognitive flow and enables interactive switching among hierarchical codebase visualizations. We conducted a user study with 9 experienced developers and 6 novice developers. Our results demonstrate that CodeMap significantly improved users' perceived intuitiveness, ease of use, and usefulness in supporting code comprehension, while reducing their reliance on reading and interpreting LLM responses by 79% and increasing map usage time by 90% compared with a static visualization analysis tool.
RQ1. How do auditors cognitively navigate and understand an unfamiliar codebase? What information are they seeking during the process?
RQ2. What types of assistance do they seek from ChatGPT during the process? What challenges do they encounter? What suggestions do they have?
RQ3. How does our expert-inspired system affect experienced and novice developers' perceived usefulness, intuitiveness, ease of use, and related measures?
RQ4. How are experienced and novice developers' system usage behaviors affected, and why?
Semi-structured interviews with 8 code auditors to understand cognitive navigation strategies and ChatGPT usage patterns during code understanding.
Identified key cognitive support needs: information extraction, decomposition, representation, and reduced manual effort through interaction design.
Built CodeMap with dynamic information extraction, visualization via interactive DOT graphs, and natural language interaction powered by RAG + LLM.
Summative user studies with 9 experienced and 6 novice developers comparing CodeMap + VSCode, ChatGPT + VSCode, and Understand + VSCode.
CodeMap provides an interactive interface for exploring codebase structure through visualization and natural language.
CodeMap consists of three core components: a Prompt Generator that assembles task-specific prompts from templates; a Retrieval-Augmented Generation (RAG) system, backed by a vector store and an LLM, that produces responses at three levels of granularity (global, local, and detail); and a Map Interaction Client featuring a graph viewer, a textual explanation viewer, and a chatbot for interactive exploration.
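The flow among these components can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the `MultiLevelResponse` shape, and the template wording are illustrative, not the paper's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical container mirroring the three response levels (global, local, detail).
@dataclass
class MultiLevelResponse:
    global_view: str   # architecture-level summary
    local_view: str    # module/file-level explanation
    detail_view: str   # function/line-level explanation

def generate_prompt(task: str, code_context: str) -> str:
    """Prompt Generator: fill a task-specific template (illustrative wording)."""
    return (f"Task: {task}\nContext:\n{code_context}\n"
            "Answer at the global, local, and detail levels.")

def rag_answer(prompt: str, retrieve, llm) -> MultiLevelResponse:
    """RAG system: retrieve relevant chunks from the vector store, then query the LLM."""
    chunks = retrieve(prompt)                          # vector-store lookup
    answer = llm(prompt + "\n\n" + "\n".join(chunks))  # augmented generation
    # A real system would parse structured multi-level output; we reuse raw text here.
    return MultiLevelResponse(answer, answer, answer)
```

The Map Interaction Client would then render the returned levels in its graph viewer, explanation viewer, and chatbot, respectively.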
CodeMap uses carefully designed prompt templates that decompose complex code understanding tasks into step-by-step instructions. Each prompt includes a task description, content and format requirements, and few-shot examples with generic placeholders that work across languages and projects. The system outputs both DOT-format graphs for visualization and JSON-format structured textual explanations.
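To illustrate this template structure, a template might combine a task description, format requirements, and a generic few-shot example, with paired DOT and JSON outputs. The placeholder names, wording, and field names below are our own assumptions, not the paper's actual prompts.

```python
import json

# Hypothetical template with generic placeholders (<MODULE>, <FUNCTION_A>, ...)
# so one template works across languages and projects.
TEMPLATE = """Task: Explain how <MODULE> is organized.
Requirements:
1. Output a DOT graph of the call relationships.
2. Output a JSON explanation with "node" and "summary" fields.
Example:
  digraph G { "<FUNCTION_A>" -> "<FUNCTION_B>"; }
"""

def fill_template(module: str) -> str:
    """Substitute a concrete module name into the generic placeholder."""
    return TEMPLATE.replace("<MODULE>", module)

# Illustrative paired outputs: a DOT graph for the viewer, a JSON explanation.
dot_output = 'digraph G { "parse_args" -> "run_audit"; }'
json_output = json.dumps([{"node": "parse_args", "summary": "Reads CLI options."}])
```

Keeping the graph and the explanation as separate, machine-parseable outputs lets the client render the visualization and the textual viewer independently.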
We conducted user studies to validate CodeMap. In the summative evaluation, 9 experienced developers and 6 novice developers completed three code understanding tasks using different tool configurations: CodeMap + VSCode, ChatGPT + VSCode, and Understand + VSCode. Each participant spent approximately 1.5 to 2 hours in total; each task took around 25 minutes and was followed by an interview and a questionnaire.