Artificial General Intelligence (AGI) is typically understood as (future) systems that match or exceed human-level intelligence – at least, that is the "official" scientific and popular science narrative being shared with the public (see, for instance, this article from Google DeepMind). I am not particularly fond of this definition, but let’s stick to it for now – we will return to its alternatives later. The core properties of AGI generally include universality (the ability to handle problems across diverse domains), cognitive flexibility (the capacity to transfer abstracted patterns from familiar situations to novel ones), self-directed learning from the system’s own experience, and independence in planning multi-step strategies to solve complex, real-world tasks. We are still far from a fully realized AGI; however, the progress toward it is advancing with impressive speed, scale, and engineering ingenuity. Below, I will try to show how this race relates to the main Ergo Mentis concepts regarding the human mind and consciousness: the Umezawa–Vitiello Quantum Model of the Brain (QMB) and my own hypothesis of B Objects (external “images” and “carrier-states” of human intellect).
DISCLAIMER: This text does not claim to be a comprehensive review of the current state of AI. My goal is simply to provide "illustrative examples" relevant as of January 2026.
We will focus exclusively on “digital” AI systems – such as the widely known ChatGPT, Gemini, Grok, and others – implemented on traditional silicon microelectronics, without addressing computer-biological hybrids. To begin, let us establish several initial analogies:
From an “engineering” perspective, both AI and the human brain represent vast networks of interconnected elements that exhibit complex dynamics while processing input signals. It’s important to note that a user’s external request (prompt) is not utilized by an AI model in its original "human" form. First, it is translated into the system’s internal language, ranging from tokens (small text fragments) and vectors (numerical “semantic imprints”) to the internal states of the network itself, which is typically implemented as a multilayer Transformer (a specific type of neural network architecture). Thus, just as a living organism’s nervous system converts receptor stimulation into a sequence of nerve impulses delivered to the brain, an AI model transforms the prompt context into internal data structures that trigger the main computational processes. The user request serves as an analog of an external stimulus, while the internal dynamics of the neural network functionally correspond to the "firing" patterns of neurons in the brain. Furthermore, in both the Quantum Model of the Brain and modern AI, the processing of an incoming signal involves interaction with accumulated “cognitive experience.” This interaction can be initiated either by the signal itself or by a spontaneously generated internal state within the system.
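To make the "translation" step above concrete, here is a minimal Python sketch of the prompt-to-internal-language pipeline. The whitespace tokenizer, the tiny vocabulary, and the random embedding table are all illustrative stand-ins (real systems use learned subword tokenizers and trained embedding matrices with tens of thousands of rows), not any particular model's actual machinery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and embedding table; purely illustrative stand-ins.
vocab = {"what": 0, "is": 1, "agi": 2, "?": 3, "<unk>": 4}
EMBED_DIM = 8
embedding_table = rng.normal(size=(len(vocab), EMBED_DIM))

def tokenize(prompt: str) -> list[int]:
    """Translate the 'human' prompt into the system's internal token ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in prompt.lower().split()]

def embed(token_ids: list[int]) -> np.ndarray:
    """Turn token ids into vectors, the numerical 'semantic imprints'
    that the Transformer layers then operate on."""
    return embedding_table[token_ids]

token_ids = tokenize("What is AGI ?")
vectors = embed(token_ids)
print(token_ids, vectors.shape)   # [0, 1, 2, 3] (4, 8)
```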
In a functional sense, the Quantum Model of the Brain describes how, in an intellectual “agent,” memorization, long-term storage, and reactivation – encompassing both facts about the external world and one’s own cognitive states – may be implemented. It is fundamentally important that, within the framework of the QMB, macroscopic neural groups do not serve as "memory cells" themselves. Instead, they act as triggers for micro-scale processes that form stable "records" – quantum condensates that "encode" our thoughts and experiences. Accessing these "records" is not an "extraction of a file," but rather a specific type of resonance. When a trigger appears that resembles the one that originally created the "record," the quantum "code" is reactivated. This assists the brain in instantly returning to the appropriate dynamic mode – a complex correlation of activity across distributed regions of the neocortex. Subjectively, this is experienced as a recollection, recognition, or a "surfacing" thought.
Triggers can be categorized into two types based on their causal origin:
- External (stimulus-dependent): The signal originates from receptors (a flash of light, a familiar voice, or a distinct scent). The external environment initiates a reconfiguration of the system’s state, "nudging" it toward a comparison with an existing "archival" code.
- Internal (spontaneous): The signal arises from the brain's own dynamics. This is a result of "self-excitation": the current train of thought, emotional state, or internal fluctuations bring the system into a configuration that resonates with a previously formed condensate. In this instance, the brain functions as an autonomous generator of queries to itself, acting as the initiator of memory retrieval.
Let's also note that it is not only the nature of the signal that matters, but also the coupling density – the strength with which an emerging trigger redirects the brain’s current state toward past experience. For instance, an external stimulus might be fleeting and evoke only a minimal response, whereas internal resonance is capable of completely overriding the current cognitive mode, leading to prolonged reflection, a persistent memory, or an insight.
The boundary between external and internal triggers is not always distinct: external stimuli can initiate long chains of internal "thought cascades," while the brain's own states can provoke a search for external signals. This is quite evident in AI as well, where memory activation mechanisms often combine both types of triggers – as we will see below.
The specific relevance of the QMB to AI can be stated as follows: during the “life” and “intellectual development” of an AI system, in addition to periodically updating its parameters, we can store “encoded memory fragments” and reactivate them when signals (external or internal) similar to the original code appear. This intuitively suggests a solution to two obstacles on the path to AGI: catastrophic forgetting – since new knowledge fragments are added without interfering with existing ones – and the necessity for frequent global retraining. This logic is not merely consistent with modern AI engineering – it forms the basis of an entire class of advanced approaches that use large repositories of data and “cognitive experiences” from which the model retrieves relevant items during its operation. Furthermore, analogous to the QMB, two distinct types of signals can be identified as triggers for such retrieval:
- Signals from External Input/Context: “Keys” to memory fragments are generated based on information received from the outside – precisely like the “external” triggering prompted by receptor impulses.
- Signals from the Neural Network Itself: The “keys” are the internal states of the AI model – specifically its primary “intellectual” component responsible for generating the response. This is an analog of “internal” triggering, provoked by a familiar thought.
Note: the fundamental difference here lies not in the memory “carrier” itself (which in both cases could be the same vector database), but rather, as with the QMB, in the source of the signal that activates the retrieval – the external environment (the user query) or the system’s own internal dynamics.
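A minimal sketch may help fix this distinction. Below, one and the same vector store is queried with two different kinds of keys: one derived from the user's prompt (an external trigger) and one that is simply a hidden state of the primary network (an internal trigger). All names and the random "encoder" are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 16

# One and the same "carrier": a vector store of (key, fragment) pairs.
memory_keys = rng.normal(size=(100, DIM))
memory_values = [f"fragment_{i}" for i in range(100)]

def retrieve(key: np.ndarray, k: int = 3) -> list[str]:
    """Nearest-fragment lookup; identical regardless of the key's origin."""
    scores = memory_keys @ key
    return [memory_values[i] for i in np.argsort(-scores)[:k]]

# External trigger: the key is derived from the user's prompt
# (here via a fixed random projection standing in for an encoder).
prompt_encoder = rng.normal(size=(DIM, DIM))
prompt_embedding = rng.normal(size=DIM)
external_key = prompt_encoder @ prompt_embedding

# Internal trigger: the key IS a hidden state of the primary network,
# taken mid-generation (a random vector stands in for it here).
internal_key = rng.normal(size=DIM)

print(retrieve(external_key))   # stimulus-driven recall
print(retrieve(internal_key))   # state-driven recall, same carrier
```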
The boundary between the two types of signals in AI systems is quite fuzzy. Modern architectures that utilize “external” triggering employ iterative memory access: information retrieval – an inference step – completeness and consistency evaluation – an expanded query reformulation – another retrieval – and so on. As a result, interaction with external memory is increasingly guided by the model’s reasoning. The primary neural network is gradually drawn into managing this interaction – transitioning from the initial external prompt to a series of clarifying queries that are internal in both origin and formulation. Yet, the access “code” still represents the current context (simplistically, the initial or expanded prompt plus the partially generated response), rather than the dynamics of the neural network itself (for instance, a set of hidden representations from some layers of the Transformer).
Conversely, internal triggering can transition into external triggering if, during the reasoning process, the system detects a "cognitive problem": low confidence levels, data conflicts, a lack of progress toward a goal, and so on. For example, if the model lacks specific facts for successful inference, it may initiate a memory search itself – formulating a query to an external source not in terms of its hidden states (internal triggers), but in the language of the source, similar to a human user’s search query. However, the decision to search and the initial "formulation" of the problem remain essentially internal, arising from the dynamics of the primary network.
Beyond the origin and descriptive language of a query, several other "coordinates" can be introduced to differentiate between internal and external triggers in AI:
- Associativity / Causality: External triggering is often purely associative (the prompt resembles a certain document or is simply a part of it), whereas internal triggering almost always reflects a causal relationship: the system seeks not merely what is “similar,” but what is logically required to complete the current chain of reasoning.
- Properties / States: External triggers are typically characterized by the static features of a query (such as keywords). Internal triggers, however, represent a resonance of states: they are defined not only by what the model is thinking about, but also by how it is thinking at that specific moment – at a particular dynamic trajectory within the neural network’s internal representation space.
And, of course, as with the QMB, besides the nature of the triggering signal, coupling density matters greatly – the degree to which a retrieved memory fragment is integrated into the process of generating a response. A new fact found in a database during the processing of a user query (external triggering) may act as a mere "reference sheet" attached to the prompt, barely altering the reasoning logic. On the other hand, a resonance between the network’s current state and a past reasoning trace stored in an external database can radically reconfigure the entire cognitive strategy.
Overall, in modern AI, the line between "external" and "internal" activation of memory fragments is not sharp, but forms a continuous spectrum in which both types may be present to varying degrees. The further AI progresses toward AGI, the more persistently (and diversely) it "learns" to refine its thinking process in order to: (1) generate what is requested; (2) do so with maximum efficiency; and (3) enrich itself with new cognitive experiences.
Now, let's examine several specific generative AI architectures that illustrate the points above.
kNN-LM (k-Nearest Neighbors Language Model) [1] is a prime example of "internal triggering" – that is, the activation of memory based on the system’s internal configuration during computational ("cognitive") work. In the classic kNN-LM variant, in addition to a pre-trained neural network, an external "repository of experience" is used – a vast collection of pairs in the form: "internal state of the model -> correct continuation." Here, the internal state (simplistically) refers to a hidden state vector of one of the Transformer’s upper layers when the current context is being processed. For each such state, the system stores what came next (for example, the next generated token).
At each step of the generation process, the model forms a set of hidden representations – essentially a pattern of internal activity. The "experience store" is then searched for the most similar past internal states to find continuations typically followed in similar situations. Consequently, the final probability distribution for the next step is calculated as a superposition of the outputs produced by the neural network in real-time and those retrieved from external memory. In the language of analogies with the QMB, this appears as: "internal dynamic mode –> reactivation of cognitive code –> return to a similar thought continuation."
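For readers who prefer code, here is a minimal sketch of one kNN-LM generation step, following the interpolation scheme of [1]: the final distribution is a weighted blend of the live network's prediction and a distribution recovered from the nearest stored states. The datastore contents are synthetic, and the interpolation weight (LAM) is a tunable hyperparameter, not a canonical value:

```python
import numpy as np

rng = np.random.default_rng(2)
VOCAB, DIM, K, LAM = 50, 32, 4, 0.25   # LAM is a tunable blend weight

# Datastore of (hidden state -> next token) pairs, built offline by
# running the trained model over a corpus; here both sides are synthetic.
store_keys = rng.normal(size=(1000, DIM))
store_next_tokens = rng.integers(0, VOCAB, size=1000)

def knn_lm_step(hidden: np.ndarray, p_lm: np.ndarray) -> np.ndarray:
    """One generation step: blend the live network's distribution with
    one recovered from similar past internal states."""
    dists = np.linalg.norm(store_keys - hidden, axis=1)
    nearest = np.argsort(dists)[:K]              # most similar past states
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    p_knn = np.zeros(VOCAB)
    for w, tok in zip(weights, store_next_tokens[nearest]):
        p_knn[tok] += w                          # what usually came next
    return LAM * p_knn + (1.0 - LAM) * p_lm      # the final superposition

hidden = rng.normal(size=DIM)            # current internal activity pattern
p_lm = rng.dirichlet(np.ones(VOCAB))     # live model's output distribution
p_final = knn_lm_step(hidden, p_lm)
print(p_final.argmax(), round(p_final.sum(), 6))   # next token; sums to 1.0
```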
The strength of kNN-LM is clear: at every step, an attempt is made to restore a cognitive regime – a set of parameters and contextual associations – that previously led to a successful outcome. In a practical sense, this resembles "recognizing a familiar task": the system does not begin from scratch; instead, it reaches proven continuation patterns more quickly, avoiding erroneous trajectories and thereby conserving computational effort. Moreover, response accuracy improves without modifying the network's weight coefficients. New internal knowledge is added – including during fine-tuning or adaptation to other domains – as new entries in the external store, which functions not as a passive archive, but as an active navigator of the inference process.
All of this enhances the stability of inference and prevents "drifting" toward the most probable (based on frequency) but inappropriate continuations – which, in particular, helps address the "long-tail" problem: if the model has previously processed something similar, external memory allows it to preserve that line of processing, even if that context lies far on the periphery of the statistical landscape (in the "tail" of the probability distribution). This applies to rare facts, highly specialized technical terms and phrases, unusually structured sentences (such as in literary prose or poetry), and peculiar combinations of disparate concepts (for instance, in non-trivial philosophy). By "compressing" training data into its weights, a standard generative model becomes "accustomed" to what it encounters most frequently. Consequently, during operation, it finds it difficult to deviate from dominant probabilities – that is, from the most common, and often more primitive, textual constructions.
Furthermore, another significant advantage of kNN-LM is its interpretability. Unlike the "black box" of a purely parametric model, kNN-LM allows us to observe exactly which memory fragments influenced any part of the response generation. As a result, we can construct a statistical-associative "reasoning path," which substantially increases trust in the final output.
However, kNN-LM also faces significant limitations. The main challenges are scalability and cost: the external storage becomes massive, and the search for relevant entries is computationally expensive and time-consuming. Another critical drawback is false matches – proximity in vector space does not guarantee semantic relevance. Occasionally, the system retrieves data that is "similar but incorrect," which can lead it down incorrect reasoning paths. Finally, during inevitable model updates, older "memory keys" (vectors/embeddings) may become incompatible with new ones – a phenomenon known as representation drift.
All of this explains why the kNN-LM approach, while conceptually robust, has not yet become a universal "AGI engine." Nevertheless, interest in it remains high – it continues to attract active discussion in both engineering and scientific circles [2, 3]. Enhanced and more efficient variants are also emerging; rather than "retrieving everything always," they follow a logic of "retrieving only the highest-quality entries and/or only when the system faces difficulty":
- Adaptive Retrieval: "Only if I am uncertain." The most direct way to increase speed is to avoid accessing memory when the primary neural network is already performing well [4]. To achieve this, a system component is added to estimate confidence at each step. If the model is "confident in its own capacity," the external storage is not used (a minimal sketch follows after this list).
- Pruning and Compression of Stored Information: Instead of storing all vectors, it has been proposed [5] to remove those that barely alter the probability distribution during inference or that duplicate neighboring entries.
- Clustering: "A card catalog instead of an ocean of data." Another method to accelerate memory retrieval is to narrow the search space. In [6], vectors are grouped into clusters; the model first selects a relevant cluster and then searches for the required fragments within it.
These methodologies can dramatically reduce computational costs with virtually no loss in quality – at least in the majority of cases.
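As an illustration of the first ("only if I am uncertain") variant, here is a minimal sketch of confidence-gated retrieval. Note that [4] trains a lightweight component to make this decision; the entropy threshold below is a deliberately simplified proxy, and every name in the snippet is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p: np.ndarray) -> float:
    return float(-(p * np.log(p + 1e-12)).sum())

def retrieve_distribution(hidden: np.ndarray, vocab: int) -> np.ndarray:
    """Stand-in for a datastore lookup (see the kNN-LM sketch above)."""
    return rng.dirichlet(np.ones(vocab))

def adaptive_step(hidden: np.ndarray, p_lm: np.ndarray,
                  lam: float = 0.25, threshold: float = 1.0) -> np.ndarray:
    """Consult external memory only when the live distribution is
    'uncertain' (high entropy); otherwise trust the network alone."""
    if entropy(p_lm) < threshold:              # confident: skip retrieval
        return p_lm
    p_knn = retrieve_distribution(hidden, len(p_lm))
    return lam * p_knn + (1.0 - lam) * p_lm    # uncertain: blend with memory

hidden = np.zeros(8)                            # placeholder internal state
p_confident = np.array([0.97, 0.01, 0.01, 0.01])
p_unsure = np.full(4, 0.25)
print(adaptive_step(hidden, p_confident) is p_confident)  # True: no lookup
print(adaptive_step(hidden, p_unsure))                    # blended output
```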
[1] Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis (2020). "Generalization through Memorization: Nearest Neighbor Language Models". ICLR 2020; arXiv:1911.00172.
[2] Shangyi Geng, Wenting Zhao, Alexander M. Rush (2025). "Great Memory, Shallow Reasoning: Limits of kNN-LMs". NAACL 2025; arXiv:2408.11815.
[3] Yuto Nishida et al. (2025). "Long-Tail Crisis in Nearest Neighbor Language Models". Findings of NAACL 2025; arXiv:2503.22426.
[4] Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick (2021). "Efficient Nearest Neighbor Language Models". EMNLP 2021.
[5] Xin Zheng, Zhirui Zhang, Junliang Guo, Shujian Huang, Boxing Chen, Weihua Luo, Jiajun Chen (2021). "Adaptive Nearest Neighbor Machine Translation". ACL-IJCNLP 2021, Short Papers.
[6] Dexin Wang, Kai Fan, Boxing Chen, Deyi Xiong (2022). "Efficient Cluster-Based k-Nearest-Neighbor Machine Translation". ACL 2022, Long Papers.
Let’s now consider an approach that is closer to "external triggers," which, by analogy with the QMB, correspond to signals from receptors. RAG (Retrieval-Augmented Generation) [7] is a broad class of architectures unified by a simple concept: augmenting the user’s query with relevant fragments from an external source (such as a document corpus, knowledge base, web archive, or corporate manuals) and generating a response based on this expanded context as a whole.
As noted earlier, in AI models – including RAG systems – searches within an external database are typically conducted not by the words of the query itself, but by its internal representations (embeddings). These are generated by a separate retriever network, which may be closely linked to the primary neural network but is usually distinct from it. In other words, the retrieval of supplemental information occurs not via the "query as understood by the user," but via the "query as understood by the system": the external "stimulus" is translated into an internal language – specifically, the retriever’s vector representation. However, the trigger here still has an external, rather than internal, origin: it is the user’s query – even when rephrased into the system’s representations – that determines what will be retrieved from memory.
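Put as a minimal sketch, the classic RAG loop looks roughly like this. The retriever_encode and generate functions are crude stand-ins for the two separate networks described above (a hash-seeded random projection and a string template, respectively), so the snippet shows only the flow of the external trigger, not real model behavior:

```python
import numpy as np

rng = np.random.default_rng(4)
DIM = 16

# External knowledge base: documents plus their retriever embeddings.
documents = ["doc about turbines", "doc about tax law", "doc about corals"]
doc_embeddings = rng.normal(size=(len(documents), DIM))

def retriever_encode(text: str) -> np.ndarray:
    """Stand-in for the separate retriever network: maps the query into
    'the query as understood by the system' (an embedding)."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=DIM)

def generate(expanded_prompt: str) -> str:
    """Stand-in for the primary generator network."""
    return f"[answer conditioned on: {expanded_prompt[:60]}...]"

def rag_answer(user_query: str, k: int = 2) -> str:
    query_vec = retriever_encode(user_query)          # external trigger
    scores = doc_embeddings @ query_vec
    top = np.argsort(-scores)[:k]
    context = " | ".join(documents[i] for i in top)   # augment the prompt
    return generate(f"{user_query}\nContext: {context}")

print(rag_answer("How do turbines work?"))
```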
RAG’s strengths are especially important for "practical AGI." First, it offers direct updateability of knowledge (in contrast to the "system-mediated" updateability seen in kNN-LM): to ensure the system has new information, one simply needs to "refresh" the external database – for example, by adding new documents. Second, it provides source attribution: the system can display excerpts and citations, thereby increasing trust and verifiability. Third, it partially addresses "catastrophic forgetting": knowledge exists independently; the primary neural network becomes a tool for memory navigation rather than its sole – and sometimes not even its primary – carrier. Consequently, RAG has become the standard in fields where relevance and evidence are essential – ranging from corporate assistants to technical reference systems.
However, it also has its specific weaknesses. Some of these pertain to information retrieval: as with kNN-LM, if the system chooses the wrong fragments, generation will proceed from an improperly expanded query. Furthermore, the retrieved material must be "squeezed" into a limited context window, which means the information is filtered and compressed – a process where data loss is possible. Other disadvantages concern the generation process itself. By adding information from an external source, RAG does not guarantee a deep understanding of it – that is, it may fail to solve the complex task of selecting, reconciling, and semantically "assembling" fragments (which may be numerous, incomplete, or even contradictory). To avoid excessive "cognitive load," the model often seeks simpler paths – for instance, creating a superficial, persuasive-sounding "collage" or a smoothed-over compilation of text, rather than a causal reconstruction.
There is also the problem of "weak internal coupling" – the semantic links between response generation and external knowledge retrieval. This arises when the system needs to access an external database during the reasoning process itself, rather than via an initial user query. The root cause is that the retriever and the generator are separate neural networks. Retrievers are "tuned" for text queries, whereas the generator's hidden states exist in a different vector space and are optimized for token prediction rather than document retrieval. In the classic RAG design, supplementary information is retrieved once, at the beginning of the response generation. If the model identifies contradictions or "blank spots" at later stages and seeks assistance from its external database, it generally cannot query the retriever directly. Unlike in kNN-LM, the generator's hidden states, which describe its internal cognitive dynamics, do not serve as usable "keys" for the retriever. To initiate a search, a "translation" from one internal language to another is required – a process prone to significant quality loss.
Efforts to mitigate these losses involve aligning the vector representations of the retriever and generator, or even unifying them into a single network. Although such alignment is not always effective, modern RAG models – as previously noted – are moving away from the "retrieve-once-at-the-start" concept. Instead, they access external sources multiple times throughout the inference process, increasingly involving the primary neural network in generating query triggers. Thus, memory triggering is gradually becoming “internal” – in all the ways described earlier.
In this regard, the RETRO (Retrieval-Enhanced Transformer) architecture [8] implements a more radical idea: the use of external data becomes an integral part of the AI system's reasoning mechanism. In other words, the system is trained and operates on the assumption that it always has access to external memory, which it consults not only sporadically (when facing difficulties), but structurally and regularly.
RETRO models also use an external information corpus, but the logic for accessing it is different. The context the model works with (the current version of the response to the user query) is periodically split into segments, and for each of them, "nearest neighbors" (statistically similar fragments) are selected from external memory. These retrieved "neighbors" are then fed into the primary neural network not merely as appended text, but as a distinct information channel. The system sees them independently of the current context and, at each step, determines where to focus its attention – on the primary context or on the retrieved auxiliary fragment. This matters because, when a supplemental fragment is simply inserted into a query, it competes for the system’s attention mechanism alongside the rest of the text; consequently, the neural network may partially ignore it or "mistake" it for noise. When it appears as a separate attention path, its influence becomes more reliable: the model effectively develops an inherent habit of consulting external memory, rather than doing so "only if it notices."
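The following heavily simplified sketch conveys the idea of that separate channel: per-chunk neighbors are retrieved on a fixed schedule and mixed in through their own attention pass rather than appended to the input text. Real RETRO interleaves trained, multi-head chunked cross-attention inside Transformer blocks and respects causal offsets; none of that is modeled here, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
DIM, CHUNK = 16, 4

# External corpus of pre-embedded fragments ("neighbors").
corpus = rng.normal(size=(500, DIM))

def retrieve_neighbors(chunk_query: np.ndarray, k: int = 2) -> np.ndarray:
    """Select statistically similar fragments for the current segment."""
    scores = corpus @ chunk_query
    return corpus[np.argsort(-scores)[:k]]

def cross_attend(hidden: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """The separate information channel: token states attend to retrieved
    fragments instead of having them appended to the input text."""
    scores = hidden @ neighbors.T / np.sqrt(DIM)         # (tokens, k)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return hidden + weights @ neighbors                  # residual mix-in

# Toy "context": 8 token states, processed chunk by chunk, on a schedule.
hidden_states = rng.normal(size=(8, DIM))
for start in range(0, len(hidden_states), CHUNK):
    chunk = hidden_states[start:start + CHUNK]
    neighbors = retrieve_neighbors(chunk.mean(axis=0))   # per-chunk query
    hidden_states[start:start + CHUNK] = cross_attend(chunk, neighbors)
print(hidden_states.shape)   # unchanged shape, memory-informed content
```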
The strength of RETRO lies in applying the principle that "not everything should be stored within the primary neural network’s weight coefficients" more consistently than RAG does. In RAG systems, the generator network remains essentially self-contained: fragments from the external database are treated as occasional “helpers” that may or may not be provided. In RETRO, the model is structurally designed to rely on the presence of memory; the retrieval of external data becomes part of its normal "blood circulation." As a result, the neural network can have far fewer "neurons" while compensating through constant access to a vast external information corpus. This is an economically attractive path: enhancing the systemic core's navigation intelligence rather than infinitely increasing its parameter count.
However, RETRO also faces specific limitations [9]. First, the infrastructural complexity is higher: it requires indexing across huge datasets, ultra-fast search capabilities, and rigorous relevance control for "neighbors." Furthermore, the dependency on external fragments – which the model now treats as part of its standard computational chain – becomes more pronounced, and retrieval errors can propagate deeper into the generation process. The quality of the results depends not only on the neural network itself, but also on the specific fragments that enter its field of attention; for instance, retrieved "neighbors" might be statistically similar but causally incorrect. Finally, RETRO, like RAG, remains predominantly a mechanism of external triggering: the selection of supplemental information is determined by the current context (the initial query plus the already generated portion of the response) rather than by the system’s internal "cognitive state." It is performed on a "schedule" (every N tokens) rather than in response to an internal signal such as "I require assistance during the inference process." Both RAG and RETRO, unlike kNN-LM, do not restore the dynamic reasoning mode – they do not attempt to return the system to the same "thinking regime" that previously led – and could now lead again – to a correct conclusion, insight, or discovery.
[7] Lewis et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". NeurIPS 2020; arXiv:2005.11401.
[8] Borgeaud et al. (2021/2022). "Improving Language Models by Retrieving from Trillions of Tokens". ICML 2022, PMLR; arXiv:2112.04426.
[9] Li et al. (2025). "A Survey of RAG-Reasoning Systems in LLMs". Findings of EMNLP 2025.
The examples above concern AI systems operating exclusively on text. Let’s now widen our scope and consider two of the most significant trends in current AI practice – multimodality and "world models." These are not merely additional chatbot features; they represent a paradigm shift in how AI perceives reality and manipulates knowledge about it. Currently, these two directions constitute the most substantial steps toward AGI – and they make the separation between dynamic "neural" reasoning and stable memory not only useful, but practically inevitable. This is because a "reasoning" neural network cannot, by itself, encapsulate the entirety of the vast and expanding world with which it must interact.
Multimodality [10, 11, 12] involves utilizing diverse types of information received through multiple sensory channels – vision, hearing, tactile sensations, spatial orientation... – to construct a cohesive picture of the environment. As a result, the textually abstracted "symbolic" logic of AI becomes "grounded" in reality as we understand it. In purely textual systems, memory resembles a library of notes; multimodality transforms it into a multi-layered archive of impressions. For instance, an AI model begins to understand that the word "heavy" is not simply a statistical neighbor of the word "weight," but a specific physical property that, among other things, limits movement capabilities. The phrase "Catholic cathedral" evolves into a complex concept encompassing the building's architecture, the specific echo of organ chords, the play of light through stained-glass windows, and so on. All of this requires the system to recognize "the same thing" across a vast array of different attributes. Memory ceases to be a textual archive and becomes an organized repository of multidimensional experience, while "text-based" search evolves into associative retrieval.
With purely textual models, we could still imagine that all knowledge is encoded within the neural network’s weight coefficients – as distributed linguistic-semantic patterns – and for many tasks, this is sufficient. Multimodality, however, shatters this comfortable concept at once: there is simply too much information, it is structured too complexly, it introduces a temporal dimension, and it is exceedingly redundant (for example, adjacent video frames are virtually identical). From an engineering standpoint, storing multimodal data within a model’s parameters is impossible – a separate external memory becomes a necessity. And the primary neural network evolves from a "container of encoded knowledge" into a "generator of access codes" for this memory.
Moreover, the analogy with QMB functionality is further strengthened because these "codes" become far richer than mere text fragments. Much like the human brain, memory is activated by diverse signals (images, sounds, etc.), by their combinations (a tall man saying "no" brusquely), by their temporal characteristics (a long, piercing blast of a horn heard an hour earlier), and so on. Triggers evolve into "situational states" with their own internal associations, and the retrieved memory fragments describe the activating situation from various perspectives. Unlike textual models such as RAG, what is recalled is not a paragraph of text but a full-fledged "event" in its entirety – resulting from a "resonance" with certain aspects of it. The operation of a multimodal system much more closely resembles the actual activity of the brain in its handling of lived experiences.
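A minimal sketch of such associative, cross-modal triggering, assuming (hypothetically) that stored events are indexed by keys from several modalities projected into one shared space, might look like this:

```python
import numpy as np

rng = np.random.default_rng(6)
DIM = 16

# Stored "events": each indexed by embeddings from several modalities,
# projected (here: trivially) into one shared space.
events = ["cathedral visit", "thunderstorm on the pier", "birthday dinner"]
image_keys = rng.normal(size=(3, DIM))
sound_keys = rng.normal(size=(3, DIM))

def recall(trigger: np.ndarray) -> str:
    """Associative retrieval: resonance with ANY aspect of a stored event."""
    scores = np.maximum(image_keys @ trigger, sound_keys @ trigger)
    return events[int(np.argmax(scores))]

organ_chord = sound_keys[0] + 0.3 * rng.normal(size=DIM)   # noisy sound cue
print(recall(organ_chord))      # a sound alone can resurface a whole event

combined = 0.5 * image_keys[1] + 0.5 * sound_keys[1]       # multi-cue trigger
print(recall(combined))
```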
In modern practice, multimodality is increasingly understood as more than just "seeing and hearing"; it involves linking perception with action – that is, making decisions in accordance with the surrounding reality, rather than simply describing it. This directly relates to "world models" – system components that enable prediction of what will happen next based on what is occurring now [13, 14]. Based on past experience, the system constructs internal representations of "states of reality" and formulates for itself the "dynamics of change": how various events and factors influence the transition from one state to another. The result is a compact, causal "map of the surrounding world" that includes not only "what is where," but also "what influences what," "what is possible," "what is prohibited," "what is typical," and "what is risky." It allows an AI system to "mentally model" its actions, evaluate their consequences, and test hypotheses before any action is actually taken. Here, ‘reality’ should be understood broadly: it may refer to a robot’s physical settings (rooms, streets, objects), a social environment (people and their reactions), an information space (documents, rules, interfaces), or even the "internal world" of the system itself (goals, plans, constraints, and confidence levels).
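In code, the skeleton of a world model can be reduced to three hypothetical components: an encoder that compresses observations into a compact state, a transition function that captures the "dynamics of change," and an imagination loop that rolls a plan forward without touching reality. The random matrices below stand in for what would actually be learned:

```python
import numpy as np

rng = np.random.default_rng(7)
OBS, LATENT, ACTION = 12, 6, 3

# Stand-ins for learned components: an encoder and a transition model.
W_enc = 0.3 * rng.normal(size=(LATENT, OBS))
W_dyn = 0.3 * rng.normal(size=(LATENT, LATENT + ACTION))

def encode(observation: np.ndarray) -> np.ndarray:
    """Compress a raw observation into a compact 'state of reality'."""
    return np.tanh(W_enc @ observation)

def predict_next(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """The 'dynamics of change': how an action moves the world forward."""
    return np.tanh(W_dyn @ np.concatenate([state, action]))

def imagine(state: np.ndarray, plan: list[np.ndarray]) -> list[np.ndarray]:
    """Mentally model a plan before acting: roll the state forward in
    latent space, with no real action taken."""
    trajectory = [state]
    for action in plan:
        state = predict_next(state, action)
        trajectory.append(state)
    return trajectory

state = encode(rng.normal(size=OBS))
plan = [rng.normal(size=ACTION) for _ in range(3)]
print(len(imagine(state, plan)))   # 4 imagined states; reality untouched
```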
World models are inherently multimodal. Accordingly, they require external memory – and, besides, they further diversify its contents. A wide range of new information types must be stored: from descriptions of encountered episodes (specific events, scenes, observations) to the structure and functionality of both the external world and the model itself (stable dependencies, patterns of action and their consequences, typical errors, verification methods…). The primary neural network appears even more clearly as a generator of queries to multi-level knowledge, while external memory becomes part of a cognitive control loop. The processing of external signals inevitably leads to the model’s own internal queries, which may pertain to the surrounding reality as well as to its own capabilities and limitations – including previously attempted "thinking methods" and their results.
[10] Deng, Z. et al. (2025). "A Survey of Multimodal Models on Language and Vision: A Unified Modeling Perspective". Data Mining and Machine Learning 2025, 1 (1), 100001.
[11] Jin, Y. et al. (2025). "Efficient multimodal large language models: a survey". Vis. Intell. 3: 27.
[12] Junlin, X. et al. (2025). "Large multimodal agents: a survey". Vis. Intell. 3: 24.
[13] Ding, J. et al. (2024). "Understanding World or Predicting Future? A Comprehensive Survey of World Models". arXiv:2411.14499.
[14] Li, X. et al. (2025). "A Comprehensive Survey on World Models for Embodied AI". arXiv:2510.16732.
When discussing multimodality and world models, it is essential to mention recent advances in robotics. Embodied AI does not merely interpret text or images; it must perform actions while simultaneously analyzing the consequences and maintaining stability in the surrounding chaos. Furthermore, unlike a chatbot, a robot does not receive easily understood text queries that allow for prolonged deliberation. It operates within rigid time constraints and under conditions of brutal multimodality – being bombarded by streams of heterogeneous, incomplete, and often contradictory sensory signals. Its "memory" cannot be a simple archive of data – it stores structured fragments of situations, identified through a multitude of factors. All of this makes robotics a uniquely demanding stress test for the ideas and concepts described in the previous sections.
The main current trend in the development of all types of robots is the adaptability of the "robot-brain." The goal is to ensure that a robot, using a base AI model as its intellectual foundation, can acquire new skills and knowledge about the world without constantly recalculating its parameters. This aligns directly with the core concept of the architectural solutions discussed above: the primary neural network is increasingly becoming a "skillful manager" of information stored externally to it.
A prominent example is the "Skild Brain" project by Skild AI [15]. Its ambitious goal is to create a single multimodal AI model for all robotic types: from industrial manipulators and mobile platforms to quadrupeds and humanoids. The core idea is to define skills and knowledge – recognition of situations and their mapping to correct action patterns – at a level of abstraction that allows for transfer between different hardware "embodiments." This creates a unified "intellectual field" where diverse robots of various types learn from one another's experiences. The foundational AI model does not require comprehensive retraining for every new task; instead, it manipulates a constantly updated and expanding information corpus. This corpus is built using both video training simulators and real-world feedback from participating agents operating in the field.
According to public statements from Skild AI (as of January 2026), the project is in its early deployment stage. A number of robots are already active, collectively contributing to the centralized "brain." The focus has now shifted to scaling and expanding the pool of participants.
Another example that explicitly utilizes the "world model" concept is the "video-to-action" technology from 1X [16]. It is primarily designed for home robots that interact with families in domestic environments. To be both useful and safe, they must possess a refined "behavioral common sense," which requires a vast quantity of guiding examples. Training such robots on real-life situations is an exceedingly long and costly process. As an alternative, 1X proposes training an AI model on video footage to predict plausible continuations of video sequences, and then applying that model to analyze robots’ actions in specific situations. In this context, adaptation is not a "constant rewriting of an individual robot-brain" from scratch, but rather its synchronization with a simulator – a kind of "general intelligence for home robots" – that "knows" what will happen next. Crucially, training occurs predominantly through video observation (which is fast and relatively inexpensive), with only a small fraction of time devoted to fine-tuning for the robot's specific "physical embodiment."
In October 2025, 1X opened pre-orders for the NEO home humanoid, and in January 2026, it publicly unveiled the "1X World Model" as a key update, enabling NEO to acquire new skills almost autonomously. "Almost" – because human participation in the training process still partially persists. 1X maintains that this share will decrease rapidly as the robot "becomes smarter" – accumulating more and more knowledge about the world.
There are other projects (see, for example, [17]) aimed at the adaptability of the robot-brain. From an architectural perspective, they all reflect a picture familiar from previous sections: intelligence is treated not as "one giant model that knows everything in the world," but as a combination of two components – a dynamic and relatively small "thinking" neural network that makes decisions, and a large memory space that stabilizes its operation and constantly feeds it with relevant context. For embodied AI, such a structure is a strict requirement imposed by reality. A robot operates in an environment that is too diverse, dynamic, and vast to rely solely on the parameters of the primary network or to undergo regular retraining from scratch. This is typical of multimodality in general, but in robotics, it manifests in an extreme form.
[15] Skild AI Team (2026). "Announcing Series C – The Skild Brain / Omni-bodied Intelligence."
[16] 1X AI Team (2026). "1X World Model | From Video to Action: A New Way Robots Learn."
[17] Shang (2025). "A Survey of Embodied World Models."
Overall, the structural-functional correlation between the Quantum Model of the Brain and modern AI architecture is evident – and it becomes increasingly pronounced over time. The demands of emerging AI technologies effectively compel a transition from a single large neural network that "stores" all knowledge within its parameters toward a separation between a relatively compact but "intelligent" network and an extensive, stable, multi-level memory. As in the QMB, AI neurons are becoming primarily responsible for generating the trigger-signals that activate knowledge from the external base. In each reasoning cycle of such an AI model, one can observe a logic similar to that of the QMB (a minimal code sketch follows the list below):
- The user query (plus the current context) and/or the model’s own internal requirements (such as data conflict or a lack of progress) serve as an analog of boundary conditions.
- As a result, the neural network shifts into a specific internal state (a distribution of neuron activations), effectively "breaking its symmetry" – in a functional, rather than physical, sense – from a state in which all configurations were equally probable.
- This state leads to the formation of triggering vectors (analogous to dipole waves) used to access the repository of stable knowledge (analogous to the aggregate of quantum condensates in the brain).
- The triggers "resonate" – according to some similarity metric – with the content of that repository, thereby facilitating retrieval of the necessary knowledge fragments (analogous to the activation of quantum condensates). This allows the neural network to reconfigure its cognitive dynamics.
- In more advanced agentic AI systems, the internal "need signal" can trigger not only access to its own memory but also external web searches, simulations within a world model, or the execution of auxiliary tools, such as software scripts...
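The promised sketch of this cycle is below. Every component is a schematic stand-in: the "symmetry breaking" is just a nonlinearity applied to the summed boundary conditions, and "resonance" is similarity scoring against stored keys. The point is the shape of the loop, not any specific implementation:

```python
import numpy as np

rng = np.random.default_rng(8)
DIM = 16

# Stable knowledge repository (analog of the aggregate of condensates).
memory_keys = rng.normal(size=(200, DIM))
memory_fragments = [f"knowledge_{i}" for i in range(200)]

def network_state(query_emb: np.ndarray, need: np.ndarray) -> np.ndarray:
    """Boundary conditions (external query + internal need) push the network
    into a specific activation pattern: its functional 'symmetry breaking'."""
    return np.tanh(query_emb + need)

def trigger_vector(state: np.ndarray) -> np.ndarray:
    """The state emits a trigger (analog of dipole waves) toward memory."""
    return state / np.linalg.norm(state)

def resonate(trigger: np.ndarray, k: int = 3) -> list[str]:
    """Similarity 'resonance' reactivates matching knowledge fragments."""
    scores = memory_keys @ trigger
    return [memory_fragments[i] for i in np.argsort(-scores)[:k]]

query = rng.normal(size=DIM)         # external boundary condition
need = 0.5 * rng.normal(size=DIM)    # internal one (e.g. low confidence)
recalled = resonate(trigger_vector(network_state(query, need)))
print(recalled)    # fragments that now reconfigure the cognitive dynamics
```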
Of course, this similarity serves neither as direct confirmation of the QMB – which remains outside the scientific mainstream – nor as proof that current AI is marching directly toward AGI. Nevertheless, the conceptual proximity of approaches formalizing a vision of both "natural" and "artificial" intelligence helps, even if indirectly, to reduce speculation and enhance the persuasiveness of this perspective. Theoretical physicists and AI developers consider the problem of "intelligence" from entirely different angles, yet their findings indicate much the same thing. This suggests that both our minds and advanced AI are shaped by similar fundamental requirements/questions that dictate similar solutions/answers – and this can lead to the mutual refinement and enrichment of hypotheses/practices.
I should add that in the previous sections, while describing technologies and architectures, I tried to maintain the analogy between memory activation types in AI systems and the external/internal trigger-signals in the QMB. In general, this analogy is quite clear, although, as previously noted, the external/internal separation in AI is becoming increasingly vague. Nonetheless, it can be considered valid. An "external" trigger is the initial query (the user prompt or an environmental signal) plus the current context (the intermediate results of the system's operation triggered by that query), while an "internal" one is a subset of the hidden states of the primary "thinking" network (as opposed to auxiliary retrievers and translators that handle the query and context) describing the system's internal operational process. Clearly, these do not replace one another but are complementary. As we approach AGI, both mechanisms will be required – a developed intelligence must be capable of "recollecting" because of both external prompting and its own internal dynamics. It is not sufficient merely to acquire relevant facts; one must also be able to enter an optimal cognitive mode appropriate to the current situation – particularly when the system encounters uncertainty or recognizes its own typical errors.
The combination of "reminders from the outside" and "internal associations" clearly resonates with the plasticity of the human brain, which enables it not only to adapt to new information but also to reconfigure its thinking style for the most relevant tasks. In a similar vein, an AGI system must constantly relate both to the surrounding world and to its own needs and plans. Obviously, this cannot be achieved by constantly recalculating trillions of weight coefficients: frequent global updates are costly, difficult to control, and may nullify previously acquired capabilities. A QMB-like separation between the neural network itself and an external repository – comprising both new facts about the world and effective thinking patterns – appears far more realistic: the system can evolve through the accumulation of memory fragments, allowing it to both "know" more and "reason" better.
I should also note: when discussing AI "plasticity" (the system's ability to continuously adapt and "become smarter"), the separation into memory and a core neural network is, of course, not the only practical path, but one of several. Modern models already utilize:
- Multi-module routing: The system determines which internal module to use in a specific situation.
- Local update mechanisms: "Fine-tuning" specific fragments of the primary neural network to consolidate a new skill or correct a typical error without disrupting the rest of the architecture.
- Tool-use: The previously mentioned integration of external tools and software.
And so on. AI "plasticity" is not limited solely to the use of an external repository; it requires the ability to properly manage the full set of instruments and functions available to the model. Such "meta-control" naturally necessitates an additional architectural component that leverages the model's "knowledge" of its own functional capabilities and characteristics. Currently, various terms for such meta-modules appear in the literature – orchestrators, controllers, etc.
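As a final sketch, here is a toy version of such a meta-control loop: an orchestrator that routes an internal "need signal" to whichever instrument declares itself applicable. Real orchestrators are far more elaborate (learned routing, planning, feedback loops); the matching rule and the tool set here are purely illustrative:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    handles: Callable[[str], bool]   # can this instrument address the need?
    run: Callable[[str], str]

def orchestrate(need: str, tools: list[Tool]) -> str:
    """Minimal meta-control: route an internal 'need signal' to the first
    instrument whose self-description matches it."""
    for tool in tools:
        if tool.handles(need):
            return f"{tool.name}: {tool.run(need)}"
    return "fall back to the primary network alone"

tools = [
    Tool("memory", lambda n: "recall" in n, lambda n: "retrieved fragments"),
    Tool("web_search", lambda n: "fresh" in n, lambda n: "search results"),
    Tool("world_model", lambda n: "predict" in n, lambda n: "simulated rollout"),
]
print(orchestrate("recall similar past reasoning", tools))   # -> memory
print(orchestrate("predict consequences of a plan", tools))  # -> world_model
```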
All of these engineering solutions enhance AI models, making them "smarter," more flexible, and more universal. But to what extent do they bring us closer to AGI? I will attempt to present my perspective in the following sections.