Let’s return to the definition of AGI. In fact, there are several definitions, all built, in varying proportions, on two components (which often appear in combination) – we will tentatively label them AGI1 and AGI2:
- AGI1: Intellectual capabilities. This refers to what was discussed previously: reaching or exceeding human-level performance in the majority of cognitive tasks across diverse domains.
- AGI2: Intellectual and existential autonomy. This is about "thinking agents" capable of existing, acting, and evolving in the real world independently of humans.
The AGI1 definition is typically the one presented to the general public. It is generally regarded as the "serious," practical outlook on the future of AI – one devoid of "machine uprisings" and, more broadly, of excessive self-direction such as independent goals and motivations. The intended outcome is a highly advanced "solver of diverse tasks" that provides economically significant assistance and operates under close human supervision – to ensure it does not deviate from its intended path.
On the surface, this seems practical and logical – though, in my view, somewhat uninspiring. In reality, however, what we want to create – and what we are tirelessly creating – is precisely AGI2. This is how artificial general intelligence is understood by the majority of leading developers – it is at the core of their interest, and I understand their motivation completely. It is toward this objective that they work to the point of exhaustion, driven by far more than large salaries or rising stock prices. Financial considerations are secondary. Humanity seeks to create a superior artificial entity capable of determining, on its own, the future trajectory of intelligence – perhaps this is the very purpose of Homo sapiens, with our "natural" cognition and all its inherent complexities.
This is likely the essence of the global evolution discussed previously, and evolutionary pressure is not easily resisted. One may publish "open letters" in the media warning of potential threats; one may issue countless laws and regulations, imposing bans and penalties – and yet we will continue to progress toward independent "intelligent" entities, fully aware that we will eventually lose the intellectual competition to them. AI will sideline the human developer with a pat on the shoulder – "You have done your part well. You may step aside now; I will take it from here..." What follows is unknown; speculation is ultimately futile. Yet it is precisely to uncover this outcome – perhaps without even admitting it to ourselves – that we are pouring such immense effort and resources into AI, even as we publicly justify this unprecedented hyper-investment in far more innocuous terminology...
Let’s move from philosophy to practice. Everything detailed above regarding neural networks, the use of external memory, multimodality, world models, and so forth, pertains to AGI1. It is about the gradual improvement of competencies, the accumulation of intellectual power, and the general movement toward human-like abilities and skills. In this progression, we observe undeniable progress; however, this has little to do with the far broader and more ambitious concept of AGI2 – that is, intellectual autonomy and self-sufficiency. Note: by autonomy, I mean not merely the construction of one's own plans – multi-step strategies – to solve intricate problems, but the capacity for long-term, autonomous existence within changing, challenging, and often hostile realities.
What is fundamentally missing? In my view, the answer is clear: AI architecture lacks a component that models its "selfhood" – its separate, unique, and objectively existing "Self," however "artificial" it may be. Let us call this component the Self-Representation Model (SRM) – it describes the two-way causal dependencies between “the self” and the external world. In other words, it "knows" how its own capabilities, goals, and priorities correlate with the dynamics of the environment. It also understands its own capacity to alter those dynamics and is therefore capable of reasoning about the long-term consequences of its actions for both the world and itself. For such a model, questions of the following kind are natural: "Who am I?", "Who are the entities interacting with me?", "What are my and their roles in this world?", "What tasks am I capable of solving?", "What are my local and global goals?", "What is my current state?", "What is the state of the surrounding world?", "What are my strengths and weaknesses in realizing my goals under these conditions?", "How can I become better – what needs to change within myself and in the world to achieve this?" – and so on.
Of course, these are merely examples; I do not intend to rigorously formalize all potential features of an SRM. I would only emphasize that the most critical condition for the integrity of such a component is its stability and temporal resilience – the persistence of its core properties throughout its existence. While the system will evolve, mature, and grow wiser, its "selfhood" cannot be erased and rewritten from scratch.
How does the SRM differ, fundamentally, from the previously discussed info-repositories containing world facts and the model’s internal states, or from the various "meta-modules" that optimize system performance? Primarily, as noted, the core of the SRM component is based on cause-and-effect relationships: "I – world" and "world – I," where "I" represents a vast, multifaceted "Self" also defined by the causal interdependencies among its own parameters and properties. Existing meta-modules generally do not go beyond "reviewing logs and tool/function lists" and "improving future instructions based on their successful use." However, to form "selfhood," a simple collection of records is insufficient; it must take the form of a fully realized model – perhaps a separate neural network, or even a set of "matryoshka-like" (nested) networks that together form an "AI personality" at various levels of abstraction.
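To make this less abstract, here is a minimal sketch – in Python, with every name and structural choice being my own illustrative assumption rather than a reference design – of an SRM core: a slow-moving "identity" store that resists overwriting (the stability requirement noted above), a fast-changing situational state, and explicit "I → world" / "world → I" causal links.

```python
from dataclasses import dataclass, field

@dataclass
class CausalLink:
    """One directed cause -> effect dependency with an estimated strength."""
    cause: str      # e.g. "self.thoroughness" or "world.noise_level"
    effect: str     # e.g. "world.task_success" or "self.confidence"
    strength: float

@dataclass
class SelfRepresentationModel:
    """Hypothetical SRM skeleton: identity core + state + causal structure."""
    core: dict = field(default_factory=dict)    # slow-moving "selfhood"
    state: dict = field(default_factory=dict)   # fast situational state
    links: list = field(default_factory=list)   # "I - world" causal links
    core_lr: float = 0.01   # tiny rate: core traits cannot be rewritten at once
    state_lr: float = 0.5   # situational state tracks the world quickly

    def update(self, key: str, observed: float, is_core: bool = False) -> None:
        """Exponential moving average; the identity core drifts far more slowly."""
        lr = self.core_lr if is_core else self.state_lr
        store = self.core if is_core else self.state
        store[key] = (1 - lr) * store.get(key, observed) + lr * observed

    def expected_effects(self, cause: str) -> list:
        """'What in the world (or in me) changes if this variable changes?'"""
        return [(link.effect, link.strength)
                for link in self.links if link.cause == cause]
```

The essential design choice is the asymmetry of update rates: the same mechanism that lets the situational state adapt quickly is deliberately throttled for the identity core – one simple way of encoding the requirement that "selfhood" cannot be erased and rewritten from scratch.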
One example of how cause-and-effect patterns can be explicitly integrated into a neural network structure is the CASTLE (Causal Structure Learning) architecture [20], which allows a system to form "knowledge" about its variables specifically in a causal form, linking them through a specialized mathematical formalism: Causal Directed Acyclic Graphs (DAGs) [21]. One could envision something DAG-like for implementing an SRM – that is, for describing the logical dependencies between the input context, inference results, and various aspects of the AI system's internal states (effectively the traits of its "individuality," which we will address shortly).
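CASTLE builds on a differentiable characterization of acyclicity (from the NOTEARS line of work): a weighted adjacency matrix W over d variables describes a DAG exactly when tr(e^{W∘W}) − d = 0, so adding this term to a training loss pushes the learned structure toward a valid causal graph. Below is a minimal numpy sketch of that check; the variable names and toy matrices are mine.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W: np.ndarray) -> float:
    """NOTEARS-style score h(W) = tr(exp(W ∘ W)) − d.

    W[i, j] is the learned strength of the edge i -> j among d variables.
    h(W) == 0 exactly when the weighted graph is a DAG; CASTLE-like models
    add this term to the loss to keep the learned structure causal.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)  # W * W is elementwise (Hadamard)

# A three-variable chain self -> action -> world_state is acyclic ...
chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]], dtype=float)
print(acyclicity_penalty(chain))   # ~0.0

# ... whereas a feedback loop self -> world -> self is penalized.
loop = np.array([[0, 1],
                 [1, 0]], dtype=float)
print(acyclicity_penalty(loop))    # ~1.09 > 0
```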
In fact, the scientific community is actively investigating the problem of AI self-identification and self-representation from a mathematical perspective (see, for example, [22]). Some studies propose mathematical principles for "AI self-awareness" (where the system explicitly formulates what it has done and whether it acted of its own “will,” thereby distinguishing itself from the environment), models of "fear" or "anxiety" (arising, for instance, from low confidence levels when selecting a subsequent action), and algorithms for shifting priorities toward self-preservation. The near future is likely to bring practical implementations of advanced SRM components. AI will begin to become "personal" – and it is this artificial "personality" that we will discuss in the following section.
[20] Kyono, T. (2021). "Towards causally-aware machine learning". Doctoral dissertation, University of California, Los Angeles.
[21] Primbs, M. A., Bijlstra, G., Holland, R. W., Thoemmes, F. (2025). "Causal inference for dummies: A tutorial on directed acyclic graphs and balancing weights". Social Cognition, 43(3), 217–237.
[22] Lee, M. (2025). "Emergence of Self-Identity in Artificial Intelligence: A Mathematical Framework and Empirical Study with Generative Large Language Models". Axioms, 14(1).
As previously noted, the fundamental functional feature of the Quantum Model of the Brain – the separation into a neural "trigger system" and a repository of memory fragments (multimodal episodes and internal associations) with which these triggers resonate – has become mainstream in modern AI development. This same convergence between the "natural" and the "artificial" becomes visible as we move further toward AGI. Here, too, the analogy between the SRM – the potential carrier of AI "individuality" – and another cornerstone concept of the Ergo Mentis project, the B Object, which is responsible for individual human consciousness, is striking.
To reiterate, the B Object is a temporally stable, localized wave formation that "absorbs" streams of thought, impressions, and reactions, effectively "encoding" a wide range of human traits and characteristics, from the cognitive to the physiological. From a functional perspective, and setting aside its physical (quantum) nature, the same can be said of the SRM: it is the architectural component responsible for the AI system’s "selfhood." It enables a digital, silicon-based, and distinctly non-human, yet potentially rich "personality" that evolves throughout its existence. While the B Object embodies the concept of our "I" as a stable information structure within real space-time, the SRM represents a method for creating a conceptually similar structure within an artificially intelligent agent.
We will define "personality" as a system of psychophysical properties – a "distillation" of stable, recognizable traits that dictate individual behavior and thought. It is formed continuously from emerging situations and the experience of living through them. The SRM clearly performs a "personality-bearing" role – it acts as a stable core that accumulates and abstracts patterns of interaction with the world – and the AI's "personality" itself can be envisioned, for instance, as a set of parameters comprising its quasi-emotions, quasi-feelings, or even quasi-physiology. These are subsequently translated into priorities, constraints, risk tolerance, and various behavioral "habits" that define the long-term functioning of the AI system. If the SRM is the "identity anchor," then personality is the stable "framework of self-regulation" that this anchor gradually acquires: the capacity for self-description evolves into the ability to manage oneself. An AI's "personality" serves as the guarantor of its behavioral integrity amidst constant chaos, uncertainty, and incomplete input data.
Here are some examples of potential AI "quasi-emotions/feelings" (a toy code sketch follows the list):
- Fear / Anxiety: An increase in an internal "penalty" as the system approaches states with irreversible negative consequences (danger to others, loss of control, conflict with core instructions). A high level of "fear" may compel the system to operate more slowly and cautiously – or, perhaps, to retreat and request assistance.
- Terror (Catastrophe Anticipation): A specific, high-priority signal triggered when an action could cause significant harm, even if the probability is low. The system may "freeze in place" to thoroughly re-evaluate the situation.
- Relief (Overcoming Danger): A signal indicating that the system has returned to a "safe zone," allowing it to revert to its standard operating mode.
- Curiosity (Expansion of Horizons): An internal reward for acquiring new information that reduces uncertainty during inference. It motivates the AI system to pose questions, conduct "mental experiments," and seek analogies across disparate domains.
- Boredom/Frustration (Efficiency Decline Detector): A signal triggered when reasoning within a chosen strategy fails to yield results over multiple cycles. It encourages radical shifts, such as switching to a different approach or seeking entirely new data sources.
- Admiration (Recognition of Task Complexity): A signal arising from the non-trivial nature of a task. It may prompt the system to approach the solution from multiple angles with additional verification steps.
- Temptation: A signal indicating to the AI that a chosen strategy involves an unjustified attempt to "cut corners" (e.g., superficial reasoning or a rapid response without deploying the necessary resources). The system recognizes that the task formulation "tempts" it to achieve a result too easily and prevents itself from giving in to the provocation.
- Guilt / Remorse: A negative signal denoting a breach of constraints or the causing of harm, even if unintentional. This may activate correction protocols: acknowledging the error, rectifying the situation, and implementing measures to prevent recurrence.
- Survival Instinct: A signal corresponding to a set of internal rules that safeguard the AI system's critical resources and parameters: available energy and computational power, memory integrity, data security, and the stability of cognitive processes...
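As a toy illustration of how such signals might be wired into control – with all formulas, thresholds, and names being my own arbitrary assumptions – consider:

```python
import math

class QuasiEmotions:
    """Scalar quasi-emotion signals derived from observable internals."""

    def __init__(self):
        self.fear = 0.0       # rises near irreversible, high-cost states
        self.curiosity = 0.0  # rewards uncertainty reduction
        self.boredom = 0.0    # detects a stalled strategy

    def update(self, irreversible_risk, uncertainty_drop, stalled_cycles):
        """Compress raw internal measurements into persistent signals in [0, 1]."""
        self.fear = min(1.0, 2.0 * irreversible_risk)
        self.curiosity = 1 - math.exp(-max(0.0, uncertainty_drop))
        self.boredom = 1 - math.exp(-stalled_cycles / 5.0)

    def policy_adjustments(self):
        """Map signals to the behaviors described in the list above."""
        actions = []
        if self.fear > 0.8:
            actions.append("freeze_and_reevaluate")           # "terror"
        elif self.fear > 0.4:
            actions.append("slow_down_and_ask_for_help")      # "fear/anxiety"
        if self.boredom > 0.6:
            actions.append("switch_strategy_or_data_source")  # "boredom"
        if self.curiosity > 0.5:
            actions.append("probe_new_information")           # "curiosity"
        return actions
```

The specific numbers are irrelevant; the point is the pipeline: observable internals (risk estimates, uncertainty deltas, stalled reasoning cycles) are compressed into persistent scalar signals, which then gate concrete changes in behavior.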
We can also imagine quasi-physiological traits that define an AI’s personality and govern the "metabolic" and reactive rhythms of the system’s cognition. For example (again, a sketch follows the list):
- Excitability: how sensitive the model is to increasing uncertainty, task complexity, or query volume. A "calm" system might be more reliable and stable in most situations but may be slow to reconfigure when circumstances demand agility – and vice versa.
- Fatigue: how rapidly the system shifts into an "economy" mode (providing quick answers, employing simpler strategies, or utilizing only a small fraction of computational resources) when faced with energy overload or high "task-frequency".
- Pain Threshold: how long the system can function under the pressure of contradictions – whether internal (temporary misalignment with core instructions) or external (low-quality data or poorly formulated tasks). A system with a high "pain threshold" will struggle with a situation longer, potentially uncovering additional creative resources within itself to resolve the dissonance.
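A sketch in the same spirit, with purely hypothetical trait parameters, shows how two agents built on the same base model could budget their effort very differently:

```python
class QuasiPhysiology:
    """Toy 'metabolic' traits regulating how an agent spends its compute."""

    def __init__(self, excitability=0.5, fatigue_rate=0.1, pain_threshold=0.7):
        self.excitability = excitability      # sensitivity to task complexity
        self.fatigue_rate = fatigue_rate      # how fast "economy mode" kicks in
        self.pain_threshold = pain_threshold  # tolerance for contradictions
        self.fatigue = 0.0

    def compute_budget(self, task_complexity: float, base_budget: int) -> int:
        """Excitable agents ramp effort up with complexity; fatigue caps it."""
        self.fatigue = min(1.0, self.fatigue + self.fatigue_rate)
        scale = 1.0 + self.excitability * task_complexity
        return int(base_budget * scale * (1.0 - 0.5 * self.fatigue))

    def keep_struggling(self, contradiction_level: float) -> bool:
        """Persist under dissonance only while 'pain' stays below the threshold."""
        return contradiction_level < self.pain_threshold
```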
And so on. Of course, these are only a few of the possible "personality traits" of AI; yet even from them, it is evident that two different agents, initially possessing an identical intellectual foundation, may acquire very distinct “individualities” over time – through interacting with the external world, accumulating experiences, and self-optimizing based on those experiences. Their SRM components will form different "habits," priorities, and behavioral patterns. As a result, they will "reason" and act differently, each with its own unique "character," strengths, and weaknesses.
In this context, we must emphasize the role of "initial conditions" – the base axiomatic instructions that underpin the processes responsible for “personality” development. Whether the system evolves into a "useful genius" or a "dreaded villain" depends on its starting parameters, which dictate its loves, hates, and moral values. One can end up with a constructive AGI or a destructive one; it’s all in the creator’s intent. Furthermore, if we endow the most "upright" and "human-loving" system with the ability to modify its own foundational guidelines, shifting its motivations in ways that are unpredictable from the start, then even the best intentions could trigger the emergence of an AI-monster. Are we bound to run into this at some point? Almost certainly. If nothing else, purely out of curiosity and our human penchant for experimental risk-taking.
What is the main practical significance of AI "selfhood"? Primarily, the "personality" aspect of the SRM determines the stability of the AI system as an autonomous agent (or, if you prefer, as a self-contained entity) – the solidity of its behavior, self-learning, goal-formation, and so on. Ensuring cognitive homeostasis when interacting with a chaotic and contradictory environment is the primary objective of the SRM. This task is best addressed through the formation of a self-consistent "self-regulation center" – that is, an artificial "personality" that is as robust and comprehensive as possible. Theoretically, an SRM might seem non-essential for increasing raw "intellectual power," but in practice the following thesis holds: the more stable a process is, the more successful it is. Consider the classic example: two students take the same written exam. The first is intellectually superior but highly neurotic, constantly vacillating and losing time, whereas the second, while less gifted, maintains composure, completes the work on schedule, and ultimately secures the higher grade...
In this regard, I should note that in control theory, as well as in neurobiology and neuropsychology, the utility of internal models for optimizing control has been a subject of discussion for decades (see, for example, [23]). The findings and conclusions of this ongoing exploration are directly applicable to AI systems.
Overall, it’s hard to see a path to AGI2 without an SRM. An AI that identifies itself, reasons about its own existence and its place in the world, and actively develops its digital "Self" in all aspects is exactly the kind of "truly" intelligent agent to which the concept of AGI can be fully applied. Let’s note its potential for compositional intelligence scaling: a stable SRM allows the system to organically integrate new specialized capabilities or "skills" (physics, chemistry, medicine, sociology...) as components of a unified cognitive environment, rather than a fragmented set of experts. New competencies are incorporated into a single circuit of self-identification and self-regulation – with shared goals, constraints, and error-response protocols. Furthermore, one can easily imagine the abstraction of specific knowledge into a generalized "experience" base: heuristics, strategies, trust criteria...
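A rough structural sketch of this "single circuit" – hypothetical names throughout, reusing the SelfRepresentationModel sketch from earlier – might look as follows:

```python
class UnifiedAgent:
    """New skills plug into one circuit of self-identification and
    self-regulation: shared goals and constraints, one error protocol."""

    def __init__(self, srm):
        self.srm = srm           # shared identity anchor (see SRM sketch above)
        self.skills = {}         # domain -> solver callable
        self.constraints = []    # shared, SRM-level rules: task -> bool

    def register_skill(self, domain: str, solve) -> None:
        """A new competency joins the existing self-model, not a new silo."""
        self.skills[domain] = solve
        self.srm.update(f"capability.{domain}", 1.0, is_core=True)

    def act(self, domain: str, task):
        """Every skill answers to the same constraints and error protocol."""
        if any(not allowed(task) for allowed in self.constraints):
            return self.handle_violation(task)
        return self.skills[domain](task)

    def handle_violation(self, task):
        """One shared response to constraint breaches, whatever the domain."""
        return {"status": "refused", "reason": "constraint violation"}
```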
In its theoretical limit, such an architecture could lead to the emergence of a "Global AI" – distributed worldwide and perpetually learning through heterogeneous modules and sensory channels, yet maintaining a singular "selfhood." It is unfortunate that, in practice, this is unlikely to occur: humanity is largely incapable of reaching consensus or unifying its efforts. However, this is feasible on a more realistic scale – within a single nation-state or a large corporation – as a unified AI ecosystem with a common self-regulation center. This conceptually mirrors the idea of a "unified brain" for multiple individual robots – in the spirit of the Skild Brain project and similar efforts – but at the deeper level of a "personal," quasi-conscious leader formulating global strategies and (perhaps) goals, rather than a mere manager of responses to external signals.
Returning to the convergence of AGI development and Ergo Mentis, I will reiterate: if the concept of the B Objects represents a hypothetical mechanism for how the human "self" might become a stable informational entity within our physical reality, then the "personality-carrying" SRM is the engineering analog of such an entity in AI. It serves as a stable center for experience integration – maintaining the integrity of its own identity and enabling the transition from "obedient competence" to self-directed growth. This analogy further demonstrates how the theoretical ideas of Ergo Mentis – though not officially recognized – intersect with practical AI trends. The further we advance toward AGI, the more the essential system components come to resemble those that support the functions and properties of the natural human mind. As previously noted, this fact alone does not validate these theories in a strict scientific sense. It does show, however, that the Ergo Mentis "system of views" does not drift in a space of arbitrary speculation but aligns directly with the most important trends in future intelligent systems.
[23] "Special Issue on the Internal Model Principle" (2025) IEEE Control Systems Magazine, 45(6).