Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt, Sarah Schwettmann
Build an automated interpreter for LLM attribution graphs. This moves circuit tracing from manual inspection to an automated pipeline, letting researchers quickly find out *why* a model generated a specific output.
Suggested repo: trace-viz
"Explain LLM computations, automatically."
Estimated effort: 100h
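To make the idea concrete, here is a minimal sketch of one stage such a pipeline might contain. Everything here is hypothetical: the toy graph format (labeled feature nodes with weighted attribution edges), the path-scoring rule (product of edge weights), and the Dallas/Austin example are illustrative stand-ins, not the actual attribution-graph format or interpretation method.

```python
from dataclasses import dataclass

# Hypothetical toy attribution graph: nodes are labeled features,
# weighted directed edges are attribution scores between them.
@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    weight: float

def top_paths(edges, source, sink, k=3):
    """Enumerate all source->sink paths and rank them by the product
    of edge attribution weights (a stand-in for path influence)."""
    adj = {}
    for e in edges:
        adj.setdefault(e.src, []).append(e)
    paths = []
    def dfs(node, path, score):
        if node == sink:
            paths.append((score, path))
            return
        for e in adj.get(node, []):
            dfs(e.dst, path + [e.dst], score * e.weight)
    dfs(source, [source], 1.0)
    return sorted(paths, reverse=True)[:k]

def explain(edges, source, sink):
    """Turn the top-ranked path into a one-line explanation --
    the kind of output an auto-interpreter stage would emit."""
    (score, path), *_ = top_paths(edges, source, sink, k=1)
    return f"{sink} is driven mainly via {' -> '.join(path)} (score {score:.2f})"

# Illustrative example (feature labels are invented):
edges = [
    Edge("prompt:'Dallas'", "feat:Texas", 0.9),
    Edge("feat:Texas", "feat:state-capital", 0.8),
    Edge("prompt:'Dallas'", "feat:city", 0.4),
    Edge("feat:city", "feat:state-capital", 0.3),
    Edge("feat:state-capital", "out:'Austin'", 0.95),
]
print(explain(edges, "prompt:'Dallas'", "out:'Austin'"))
```

In a real pipeline, the ranked paths would be handed to an LLM that labels each feature and summarizes the dominant paths in natural language; the sketch above only covers the graph-side extraction and ranking.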