Mechanistic Indicators of Understanding in Large Language Models
arXiv. With Pierre Beckmann. doi:10.48550/arXiv.2507.08017
Draws on detailed technical evidence from research on mechanistic interpretability (MI) to argue that while LLMs differ profoundly from human cognition, they do more than tally up word co-occurrences: they form internal structures that can productively be compared with different forms of human understanding, such as conceptual, factual, and principled understanding. We synthesize MI's most relevant findings to date and embed them within an integrative theoretical framework for thinking about understanding in LLMs. As the phenomenon of "parallel mechanisms" shows, however, the differences between LLMs and human cognition are as philosophically fruitful to consider as the similarities.
explainable AI, LLM, mechanistic interpretability, philosophy of AI, understanding, conceptual change