Is Multimodal Processing Essential for Theory of Mind? Insights from Non-Human Animals and AI

May 15, 2025 · Luca Di Vincenzo, Ph.D. · 2 min read

Date: May 15, 2025 — May 21, 2025
Location: University of Porto, Praça Gomes Teixeira, Porto
Inferring Theory of Mind (ToM) in non-human animals presents a significant challenge due to the absence of linguistic reports. Traditional approaches rely on behavioral paradigms whose results may be interpreted through alternative, non-mentalistic explanations. To address this, I explore the multimodal shift (the ability to switch communication channels when the current one is disrupted) as a potential indirect indicator of ToM. Unlike traditional ToM tests based on false-belief tasks, multimodal shifts offer a more ecologically valid approach, as they emerge in spontaneous communicative interactions rather than structured experimental setups.

I suggest that a multimodal shift exhibiting specific characteristics may necessitate meta-representational capacities: it is not an invariant response to a stimulus, it relies on redundant and free signals, it occurs only in the presence of a perceiver, and it involves both the production and the perceptual aspects of communication. If non-human animals demonstrate such abilities, this would suggest that ToM does not require language but rather a sufficient degree of multimodal cognitive flexibility.

This insight has philosophical and computational implications. First, it challenges human-centric models of ToM, reinforcing the idea that mental state attribution is a graded trait distributed across species rather than a uniquely human faculty. Second, it provides a framework for evaluating ToM-like capacities in AI and computational models, positing that ToM is not intrinsic to language but a product of multimodal cognition. Unlike non-human animals, current AI architectures such as large language models (LLMs) rely primarily on unimodal (text-based) processing and lack adaptive sensory integration, making them fundamentally limited in simulating genuine mind attribution. However, multimodal AI models designed to integrate and flexibly adapt across sensory modalities could approximate higher-order cognition more effectively. If ToM requires the ability to dynamically process multimodal information, then replicating this structure in AI may offer novel insights into the mechanisms underlying human mental state attribution.