Friday, June 12, 2026

Building Proto-Affective Agents with Active Inference

For a list of all posts go here.

Most artificial agents today are built around a simple idea: maximize reward.

Whether in reinforcement learning, game-playing systems, or autonomous robotics, the agent is typically given a predefined objective and learns policies that optimize it. While this approach has produced impressive results, it often leaves open a deeper question:

How can behavior emerge from the need to maintain oneself in the world, rather than from externally defined rewards?

This question led me to explore Active Inference — a framework originating in theoretical neuroscience that models biological organisms not as reward maximizers, but as systems that continuously minimize prediction errors while maintaining preferred states.

Instead of explicitly coding emotions, drives, or goals, I wanted to investigate whether simple emotional and motivational dynamics could emerge naturally from:

  • self-maintenance,

  • uncertainty,

  • prediction errors,

  • and preferred states.

The result so far is a series of minimal simulations that progressively evolve from a simple persistent self-model toward exploratory, proto-affective behavior.

This post describes the first two scenarios.


Scenario 1 — The Minimal Self

The first experiment began with the simplest possible Active Inference organism.

The agent exists in a two-dimensional grid world and possesses only:

  • beliefs about its own position,

  • noisy observations generated from those beliefs,

  • and a weak prior that its inferred state should persist.

At this stage:

  • there are no targets,

  • no threats,

  • no rewards,

  • and no external objectives.

The entire system revolves around minimizing prediction errors between:

  • inferred hidden states,

  • and noisy sensory consequences.

The core loop is extremely simple:

  1. Generate noisy observations from current beliefs.

  2. Compute prediction errors.

  3. Compute Variational Free Energy (VFE).

  4. Update beliefs to reduce prediction errors.

  5. Repeat.
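The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the figures: the attenuation factor, noise level, learning rate, the quadratic form of the free energy, and the names `vfe` and `step` are all assumptions.

```python
import random

ATTENUATION = 0.9    # assumed: observations are attenuated copies of beliefs
NOISE_STD = 0.05     # assumed sensory noise level
LEARNING_RATE = 0.2  # assumed belief update rate


def vfe(errors):
    """Step 3: free energy as half the summed squared prediction error."""
    return 0.5 * sum(e * e for e in errors)


def step(belief, rng, noise_std=NOISE_STD):
    """Run steps 1-4 once and return (updated belief, free energy)."""
    # Step 1: noisy, attenuated observation generated from the current belief.
    obs = [ATTENUATION * b + rng.gauss(0.0, noise_std) for b in belief]
    # Step 2: prediction error between observation and belief.
    errors = [o - b for o, b in zip(obs, belief)]
    # Step 3: free energy of the current belief.
    energy = vfe(errors)
    # Step 4: gradient step on the belief, which shrinks the error.
    updated = [b + LEARNING_RATE * e for b, e in zip(belief, errors)]
    return updated, energy


rng = random.Random(0)
belief = [3.0, -2.0]  # inferred (x, y) position
history = []
for _ in range(200):  # Step 5: repeat.
    belief, energy = step(belief, rng)
    history.append(energy)
```

With these assumed settings each update multiplies the belief by roughly 0.98, so after 200 steps it has drifted most of the way toward zero.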



The interesting part emerged when experimenting with priors.

Without a prior on self-state persistence, the system naturally collapses toward zero. Since the observations are attenuated versions of beliefs, minimizing prediction errors alone gradually dampens all internal dynamics.

To prevent this collapse, I introduced a weak self prior:

  • a preferred inferred state toward which the system is softly attracted.

This simple modification creates a minimal form of self-maintenance.

Importantly, the “self” here is not a symbolic entity or explicit representation. It is simply:

a persistent inferred state stabilized by priors.

One distinction turned out to matter more than I expected.

The system only begins to exhibit coherent persistence once there is a meaningful separation between:

  • the current inferred state,

  • and the preferred inferred state.

Without that distinction, there is no homeostasis, no regulation, and no meaningful self-maintenance.
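A small noise-free sketch makes the role of the prior concrete. It assumes a one-dimensional belief, a quadratic pull toward the preferred state, and illustrative values for `ATTENUATION`, `LEARNING_RATE`, and `prior_precision`; none of these come from the original simulation.

```python
ATTENUATION = 0.9    # assumed attenuation of observations relative to beliefs
LEARNING_RATE = 0.2  # assumed belief update rate


def settle(belief, preferred, prior_precision, steps=500):
    """Iterate the noise-free belief update and return the final belief."""
    for _ in range(steps):
        sensory_error = ATTENUATION * belief - belief  # observation minus belief
        prior_error = preferred - belief               # pull toward preferred state
        belief += LEARNING_RATE * (sensory_error + prior_precision * prior_error)
    return belief


without_prior = settle(belief=3.0, preferred=2.0, prior_precision=0.0)
with_prior = settle(belief=3.0, preferred=2.0, prior_precision=0.1)
```

Without the prior the belief decays to essentially zero. With it, the belief settles where the two pulls balance (here at 1.0, halfway between zero and the preferred value of 2.0), so the current and preferred states stay distinct and there is always something left to regulate.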






Figure 1 — Scenario 1: trajectory and Variational Free Energy over time. The weak self prior prevents collapse of inferred self-state, allowing persistent self-maintenance dynamics to emerge.


Scenario 2 — Hunger, Uncertainty, and Proto-Exploration

Once the minimal self became stable, the next step was introducing motivation.

Rather than hardcoding rewards or explicit goals, I introduced the idea of “hunger” as:

increased probability of preferred states associated with nutrition.

At this stage, “nutrition” remained abstract. It could represent:

  • food,

  • water,

  • information,

  • or knowledge.

The important point was that the agent now possessed uncertain beliefs about meaningful external states.

Initially, the target’s position was completely unknown.

Instead of assigning a fixed target location, target beliefs were initialized as noise, representing uncertainty about where meaningful states might exist.

The agent now maintained:

  • beliefs about itself,

  • beliefs about targets,

  • separate prediction errors,

  • and separate Variational Free Energies.

The total agent free energy became:
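Assuming the self and target channels simply add, which is the most direct reading of "separate Variational Free Energies", the total can be written as below. The additive form, the quadratic per-channel energy, and the symbols are assumptions made for illustration.

```latex
F_{\text{total}} = F_{\text{self}} + F_{\text{target}},
\qquad
F_{k} = \tfrac{1}{2}\,\pi_{k}\,\lVert \varepsilon_{k} \rVert^{2},
\quad k \in \{\text{self}, \text{target}\}
```

Here \varepsilon_{k} is the prediction error of channel k and \pi_{k} its precision.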


An important design choice was avoiding fixed target locations. Instead, the system sampled distributed target priors across the world, encoding:

“meaningful states may exist somewhere.”
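One way to picture distributed target priors is a target belief that is periodically re-drawn from a flat prior over the world, with the agent moving to reduce its target prediction error. The sketch below is a hypothetical illustration under those assumptions; the world size, the re-sampling schedule, and the step gain are invented for the example and may differ from the original simulation.

```python
import random

WORLD = 10.0         # assumed half-width of the square world
RESAMPLE_EVERY = 25  # assumed number of steps between new target-prior samples
STEP_GAIN = 0.1      # assumed fraction of the remaining gap closed per step


def sample_target(rng):
    """Draw a candidate target location from a flat prior over the world."""
    return [rng.uniform(-WORLD, WORLD), rng.uniform(-WORLD, WORLD)]


def wander(steps, rng):
    """Move toward a target belief that is periodically re-drawn."""
    agent = [0.0, 0.0]
    target_belief = sample_target(rng)
    path = [tuple(agent)]
    for t in range(1, steps + 1):
        if t % RESAMPLE_EVERY == 0:
            # "Meaningful states may exist somewhere": try a new candidate.
            target_belief = sample_target(rng)
        # The target prediction error drives movement and never fully resolves.
        agent = [a + STEP_GAIN * (g - a) for a, g in zip(agent, target_belief)]
        path.append(tuple(agent))
    return path


path = wander(steps=200, rng=random.Random(1))
```

Because the belief keeps shifting before the error is resolved, the path never settles, which is one simple route to the wandering described here.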

This produced surprisingly rich dynamics.

The agent began exhibiting:

  • wandering,

  • oscillatory movement,

  • switching motivational tendencies,

  • and persistent exploration-like behavior.

At this point, I deliberately avoided labeling emotions explicitly.

One particularly interesting observation was that unresolved prediction errors could plausibly correspond to multiple emotional interpretations simultaneously.

For example:

  • unresolved uncertainty may resemble curiosity,

  • but also anxiety,

  • vigilance,

  • anticipation,

  • or exploratory tension.

This suggested a more interesting possibility:

emotions may emerge as modes of uncertainty regulation, rather than predefined symbolic states.

The same underlying dynamical signal can produce very different affective behaviors depending on:

  • precision weighting,

  • temporal persistence,

  • controllability,

  • and stability of the self-model.

To track these dynamics, I introduced several internal indicators:

  • unresolved target prediction errors,

  • motivational drive strength,

  • and surprise.

Surprise was defined as temporal change in free energy:
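A minimal sketch of this definition, assuming discrete time steps and taking the absolute difference S_t = |F_t - F_(t-1)|. The absolute value and the zero assigned to the first step are assumptions; a signed difference would be an equally plausible reading.

```python
def surprise(free_energy):
    """Absolute step-to-step change in free energy; the first step gets 0.0."""
    if not free_energy:
        return []
    return [0.0] + [abs(b - a) for a, b in zip(free_energy, free_energy[1:])]


trace = [1.0, 1.0, 3.0, 2.5]  # hypothetical free energy values
spikes = surprise(trace)
```

In this toy trace the jump from 1.0 to 3.0 yields the largest value, which is the kind of spike described next.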


This produced clear spikes whenever internal beliefs became suddenly invalidated.

Although the system still lacks true environmental sensing, it already exhibits a form of internally generated exploratory pressure:

  • unresolved uncertainty itself destabilizes the system,

  • driving persistent wandering even in the absence of visible targets.

This is particularly interesting because biological organisms rarely remain perfectly still when no explicit target or threat is visible. Exploration itself appears to be an intrinsic regulatory process.





Figure 2 — Scenario 2: exploratory trajectories generated by uncertain target beliefs and distributed motivational priors. Persistent wandering emerges without explicit reward maximization or path-planning routines.


Toward Epistemic Behavior

The next scenario will introduce a major transition:

  • actual hidden targets,

  • visibility radius,

  • and position-dependent observations.

At that point:

  • movement will change information,

  • uncertainty reduction will become spatially meaningful,

  • and exploration will become genuinely epistemic.

This is where curiosity may begin to emerge not merely as unresolved internal tension, but as active information-seeking behavior.

What makes this direction especially fascinating is that none of these behaviors are explicitly programmed as emotions.

Instead, they emerge progressively from:

  • prediction error minimization,

  • preferred states,

  • uncertainty regulation,

  • and self-maintaining inference dynamics.

The long-term goal is not to build an “emotional AI” in the conventional sense, but to explore whether affective organization itself can emerge from the mathematics of self-maintaining inference.

I suspect we are only at the very beginning of that exploration.

Sunday, May 10, 2026

From Simulation to Embodiment: Toward Artificial Motivation, Curiosity, and Fear

Introduction


In the previous posts, we explored how core aspects of subjective experience — motivation, addiction-like behavior, and fear — can emerge from simple computational principles. By modeling dopaminergic and noradrenergic dynamics within an active inference framework, we saw how agents can be driven to pursue goals, become sensitized to cues, or withdraw from perceived threats.

These models were deliberately minimal.

They operated in abstract environments, with simple state spaces and clearly defined targets. Their purpose was not realism, but clarity: to isolate the mechanisms by which internal drives shape behavior.

But real agents do not live in grid worlds.

They perceive complex, high-dimensional environments. They process language, images, and sound. They act continuously, not in discrete steps. And their internal states are not directly observable, but must be inferred from rich sensory streams.

This raises a natural question:

👉 What happens when these computational principles are brought into contact with the real world?

Friday, April 24, 2026

Fear as the Counterpart of Motivation: A Computational Exploration

Introduction

In previous posts, we explored how motivation emerges from dopaminergic dynamics. Tonic dopamine modulated behavioral vigor, determining whether an agent explored, persisted, or disengaged. Phasic dopamine, in turn, enabled cue learning and sensitization, allowing neutral stimuli to acquire motivational power and, under certain conditions, produce addiction-like behavior.

In all those cases, behavior was organized around approach.

But adaptive agents must do more than pursue what is rewarding. They must also avoid what is harmful. They must detect danger, respond to uncertainty, and sometimes refrain from acting altogether.

This post explores fear as the computational counterpart of motivation: a process not of energizing action, but of constraining it. While motivation pulls the agent toward goals, fear shapes the space of actions that are considered safe.


Conceptual framing

If dopamine answers the question:

“Is it worth acting?”

then noradrenaline answers:

“Is it safe to act?”

In this framework, fear is not treated as an emotion in the psychological sense, but as a computational modulation of precision over threat and avoidance.

We introduce two variables:

  • Phasic noradrenaline (NAPhasic): a transient signal triggered by unexpected threat or volatility.

  • Tonic noradrenaline (NATonic): a slower, accumulating state representing sustained arousal and threat expectation.

Together, they determine how strongly the agent weights interoceptive discomfort and how readily it shifts away from current strategies.
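The two variables can be sketched as a rectified surprise signal feeding a leaky accumulator. This is a hypothetical illustration: the function name, the `rise`, `decay`, and `track` rates, and the rule for updating expected discomfort are assumptions, not the model's actual equations.

```python
def update_noradrenaline(discomfort, expected, tonic,
                         rise=0.1, decay=0.02, track=0.2):
    """Return (phasic, tonic, expected) after one observation of discomfort."""
    # Phasic NA: only discomfort above expectation counts as unexpected threat.
    phasic = max(0.0, discomfort - expected)
    # Tonic NA: leaky accumulation of phasic bursts.
    tonic = tonic + rise * phasic - decay * tonic
    # Expectation tracks discomfort slowly, so repeated threat surprises less.
    expected = expected + track * (discomfort - expected)
    return phasic, tonic, expected


tonic, expected = 0.0, 0.0
trace = []
for discomfort in [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]:
    phasic, tonic, expected = update_noradrenaline(discomfort, expected, tonic)
    trace.append((phasic, tonic))
```

In this toy run the phasic signal is largest at the first unexpected threat and shrinks as the threat becomes expected, while the tonic level keeps building and stays elevated after the threat is gone.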


From reward prediction to threat prediction

In the motivational model, phasic dopamine was driven by reward prediction error — the difference between expected and received reward.

Here, phasic noradrenaline is driven by unexpected threat.

This can be formalized in two closely related ways:

  • As unexpected increases in interoceptive discomfort, or

  • As volatility in prediction error, indicating that the environment is less predictable than expected.

In both cases, the key idea is the same:

Noradrenaline signals that the world is not behaving as expected, and that current assumptions may be unsafe.


Tonic arousal and the amplification of discomfort

Phasic noradrenaline does not act alone. Repeated exposure to unexpected threat leads to a sustained increase in tonic noradrenaline.

This has a crucial consequence:

  • The same external situation produces greater internal discomfort.

In computational terms, tonic noradrenaline scales the gain of interoceptive signals. The environment does not need to become more dangerous; it only needs to be perceived as such.

This creates a shift from objective threat to subjective threat sensitivity.


Fear as avoidance precision

In the motivational model, dopamine increased the precision of policies leading to reward.

In the fear model, noradrenaline increases the precision of policies that avoid harm.

Action selection becomes a balance between:

  • Expected reward (approach), and

  • Expected discomfort (avoidance), amplified by tonic arousal.

When noradrenaline is low:

  • The agent tolerates risk.

  • Exploration and goal pursuit dominate.

When noradrenaline is high:

  • Avoidance dominates.

  • The agent becomes cautious, then inhibited, and eventually unable to act.
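The balance between approach and avoidance can be illustrated with a softmax over two policies. This is a simplified sketch: it assumes tonic noradrenaline enters as a gain of `1 + tonic_na` on expected discomfort, and the reward, discomfort, and precision values are made up for the example.

```python
import math


def policy_probabilities(reward, discomfort, tonic_na, precision=2.0):
    """Softmax over policies scored by reward minus amplified discomfort."""
    gain = 1.0 + tonic_na  # assumed: tonic NA scales the weight of discomfort
    scores = [precision * (r - gain * d) for r, d in zip(reward, discomfort)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two policies: approach the goal (rewarding but risky) or stay put (neutral).
reward = [1.0, 0.0]
discomfort = [0.4, 0.0]
calm = policy_probabilities(reward, discomfort, tonic_na=0.0)
aroused = policy_probabilities(reward, discomfort, tonic_na=3.0)
```

With low arousal the approach policy is the more probable one. With high arousal the same external situation favors staying put.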


Threat learning and generalization

A key feature of fear is that it extends beyond the original source of harm.

When unexpected discomfort occurs in a given state, the representation of that state — and nearby states — becomes associated with threat.

Over time, this produces:

  • Generalization: safe contexts are treated as dangerous.

  • Persistence: threat remains even after the original cause is gone.

This is the avoidance counterpart of cue sensitization in addiction.

If phasic dopamine turns neutral cues into objects of desire,
phasic noradrenaline turns neutral contexts into objects of avoidance.
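Generalization to nearby states can be sketched as a bump of threat spread around the state where discomfort occurred. The Gaussian kernel, the one-dimensional state line, and the `width` parameter are assumptions chosen for illustration.

```python
import math


def spread_threat(threat, location, strength, width):
    """Add a Gaussian bump of threat centred on the state where harm occurred."""
    return [
        value + strength * math.exp(-((i - location) ** 2) / (2.0 * width ** 2))
        for i, value in enumerate(threat)
    ]


threat = [0.0] * 11  # a line of 11 states, all initially safe
narrow = spread_threat(threat, location=5, strength=1.0, width=0.5)
broad = spread_threat(threat, location=5, strength=1.0, width=3.0)
```

A narrow kernel keeps the threat tied to the original state. A broad kernel marks distant, objectively safe states as threatening, which is the generalization described above.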


Emergent behavioral regimes

As in the motivational models, different regimes emerge from the interaction between tonic and phasic dynamics.

(a) Low tonic noradrenaline

  • Low sensitivity to threat

  • Risk-taking behavior

  • Weak avoidance learning

Interpretation: under-reactivity or emotional blunting.


(b) Moderate tonic noradrenaline

  • Adaptive fear responses

  • Flexible avoidance

  • Balanced exploration and caution

Interpretation: healthy behavior.


(c) Moderate tonic noradrenaline with strong sensitization

  • Specific avoidance patterns

  • Persistent fear of particular contexts

Interpretation: phobia-like behavior.


(d) High tonic noradrenaline with strong sensitization

  • Generalized avoidance

  • Failure to approach goals

  • Persistent discomfort even in safe conditions

Interpretation: trauma-like regime.


Comparison with motivation

The symmetry with the dopaminergic system is striking:

Motivation (Dopamine) | Fear (Noradrenaline)
Approach behavior | Avoidance behavior
Reward prediction error | Threat / volatility signal
Cue sensitization | Threat generalization
Addiction (wanting without liking) | Trauma (fear without danger)
Increased policy precision (approach) | Increased policy precision (avoidance)

Motivation expands the space of action.
Fear constrains it.

Both are necessary. Both can become pathological.


What this model does not yet explain

This minimal formulation does not yet include:

  • Interactions with dopaminergic motivation

  • Social or contextual modulation of fear

  • Long-term recovery or extinction mechanisms

  • Multimodal perception and real-world complexity

These limitations are deliberate. As in previous posts, the goal is to isolate the core computational principles before integrating them into richer systems.


Looking ahead

The present model treats fear as a computational counterpart to motivation, emerging from precision over threat and avoidance, and it remains deliberately minimal. It does not yet engage with rich sensory input, language, or real-world interaction.

The next phase of this project will extend these mechanisms beyond abstract environments, integrating artificial motivation, curiosity, and fear into agents grounded in vision and voice, powered by systems such as Gemini and OpenAI models. The goal is to explore how these fundamental drives operate when perception becomes high-dimensional, when interaction becomes continuous, and when internal states must be inferred from complex sensory streams.

In doing so, the project moves from isolated simulations toward embodied, multimodal agents, where the dynamics of approach and avoidance are no longer theoretical constructs but active forces shaping behavior in real time.


Conclusion

Fear is not simply the absence of motivation. It is an active process that shapes behavior by increasing sensitivity to threat and constraining the space of possible actions.

By introducing phasic and tonic noradrenaline into the computational framework, we see how adaptive mechanisms for detecting uncertainty and avoiding harm can give rise to persistent avoidance, generalization, and trauma-like dynamics.

If dopamine determines what we pursue,
noradrenaline determines what we avoid.

Together, they define the boundaries of behavior.

Wednesday, March 4, 2026

Phasic Dopamine, Cue Sensitization, and the Emergence of Addiction: A Computational Exploration

Introduction

In the previous post, we examined how tonic dopamine shapes motivation, curiosity, and behavioral vigor. We saw that different motivational regimes — apathy, healthy goal pursuit, compulsion, and oscillatory instability — emerge from the interaction between tonic dopamine and hedonic adaptation.

However, none of those regimes produced addiction.

The agent never pursued the target in the absence of pleasure.
It never developed persistent attraction to neutral stimuli.
It never exhibited craving.

This limitation was intentional.

To understand addiction-like behavior, we must introduce a second dopaminergic dynamic: phasic dopamine.

While tonic dopamine energizes behavior globally, phasic dopamine encodes rapid, transient bursts linked to reward prediction errors. In this post, we extend the previous model by adding phasic dopamine and cue sensitization — and examine how addiction-like dynamics emerge.


Conceptual framing

The extended model preserves all assumptions from the tonic dopamine simulation, including:

  • An interoceptive hedonic target.

  • Hedonic adaptation via exponential decay of target gain.

  • Tonic dopamine (DATonic) scaling global motivational vigor.

The critical additions are:

  1. A cue colocated with the target.

  2. A phasic dopamine burst (DAPhasic) triggered upon unexpected reward.

  3. A sensitization mechanism, whereby repeated phasic bursts increase the gain of the cue representation.

In this framework:

  • DATonic scales overall motivation.

  • DAPhasic updates the salience of specific states.

  • Addiction emerges when cue salience persists despite declining hedonic value.


The extended simulated environment

The environment is identical to the previous simulation, with one addition:

  • A neutral cue is colocated with the hedonic target.

Initially:

  • The cue has little or no intrinsic value.

  • It merely marks the spatial location of reward.

Over time:

  • Phasic dopamine bursts strengthen the cue’s salience.

This creates a divergence between:

  • Hedonic value (which decays),

  • Cue salience (which can increase).


Phasic dopamine as reward prediction error

Phasic dopamine bursts are modeled as transient increases triggered when:
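Reading this condition as a positive reward prediction error, with received reward r_t exceeding expected reward \hat{r}_t, and taking \alpha to be the cue sensitization rate listed among the key dimensions later in this post, one consistent way to write the burst and its effect on cue gain is shown below. The rectification and the additive gain update are assumptions.

```latex
\delta_t = r_t - \hat{r}_t,
\qquad
\mathrm{DA}_{\text{phasic}}(t) = \max(0,\ \delta_t),
\qquad
g_{\text{cue}} \leftarrow g_{\text{cue}} + \alpha\,\mathrm{DA}_{\text{phasic}}(t)
```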



This mechanism produces incentive sensitization.

Crucially, hedonic adaptation continues independently.


Dissociation between “liking” and “wanting”

As the agent repeatedly reaches the target:

  • Hedonic value decays exponentially.

  • Cue salience increases incrementally.

Eventually:

  • Pleasure decreases.

  • Cue-driven motivation increases.

This produces a state where:

The agent strongly “wants” what it increasingly fails to “like.”

This is the computational signature of addiction.
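The divergence can be reproduced in a toy loop over target visits. This is a sketch under stated assumptions: hedonic value decays exponentially with visit count, bursts are rectified prediction errors, cue gain only accumulates, and the 0.5 learning rate for expected reward is invented for the example.

```python
import math


def simulate_visits(visits, adaptation_rate, sensitization_rate):
    """Track hedonic value ("liking") and cue gain ("wanting") across visits."""
    expected_reward, cue_gain = 0.0, 0.0
    liking, wanting = [], []
    for n in range(visits):
        hedonic = math.exp(-adaptation_rate * n)     # hedonic adaptation
        burst = max(0.0, hedonic - expected_reward)  # phasic DA on positive surprise
        cue_gain += sensitization_rate * burst       # cue sensitization
        expected_reward += 0.5 * (hedonic - expected_reward)  # assumed learning rate
        liking.append(hedonic)
        wanting.append(cue_gain)
    return liking, wanting


liking, wanting = simulate_visits(visits=20, adaptation_rate=0.3,
                                  sensitization_rate=2.0)
```

By the last visit the hedonic value has nearly vanished while the cue gain built up during the early bursts is still in place.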


Emergent behavioral regimes

As in the tonic dopamine model, distinct regimes emerge from parameter interactions.

Here, the key dimensions are:

  1. DATonic (global motivation)

  2. Cue sensitization rate (α)

  3. Hedonic adaptation rate


(a) Low sensitization

If cue gain increases but remains low:

  • Behavior resembles the tonic dopamine regimes.

  • No addiction occurs.

  • Hedonic decay eventually extinguishes pursuit.

This replicates the previous post’s findings.


(b) Strong sensitization

Observed behavior:

  • Agent continues to approach the cue even after hedonic value declines to negligible values.

  • Seeking becomes decoupled from pleasure.

  • Reduced behavioral flexibility.

Interpretation:

  • Full addiction-like behavior.

  • Cue-driven motivation dominates.

  • Cue salience overrides hedonic feedback.









Figure 1. Divergence of hedonic value and cue gain.

Caption:
The hedonic target (blue) is colocated with a neutral cue. Phasic dopamine bursts occur when the agent unexpectedly receives reward, increasing cue salience. Hedonic value decays with repeated consumption, while cue gain increases through phasic dopamine bursts, producing dissociation between liking and wanting. However, since cue salience remains low, no addiction is observed.









Figure 2. Addiction-like regime.

Caption:
Persistent approach behavior driven by cue salience, despite declining hedonic value.


Comparison with tonic dopamine regimes

The contrast with the previous post is illuminating:

Tonic-only model | Phasic-extended model
Motivation tied to hedonic value | Motivation can decouple from hedonic value
No persistent seeking without pleasure | Persistent seeking despite low pleasure
Compulsion without addiction | Cue-driven addiction-like behavior


Tonic dopamine energizes behavior.

Phasic dopamine assigns salience.

Addiction requires both.


Why tonic dopamine alone cannot produce addiction

In the tonic-only model:

  • Motivation collapses when hedonic value decays.

  • No cue becomes intrinsically attractive.

  • Behavior remains tied to interoceptive reward.

Therefore:

Tonic dopamine explains vigor, not craving.

Only when phasic dopamine strengthens cue representations does persistent seeking emerge.


Addiction as maladaptive precision

In active inference terms, addiction can be interpreted as:

  • Excessively precise priors over cue-related policies.

  • Overweighting of exteroceptive salience.

  • Reduced influence of interoceptive negative feedback.

This reframes addiction as a disorder of precision weighting rather than simple reward excess.


What this model does not explain

This minimal model does not include:

  • Withdrawal states,

  • Stress modulation,

  • Trauma interactions,

  • Noradrenergic arousal systems.

These elements are critical in real addiction and will require further extensions.


Looking ahead

In the next step of this research, we will explore how threat, arousal, and avoidance systems interact with dopaminergic motivation.

If phasic dopamine explains why we pursue cues despite declining pleasure, understanding trauma will require modeling how certain cues become associated not with reward, but with threat — and how avoidance becomes compulsive.

Addiction and trauma may ultimately emerge as dual pathologies of precision in opposite motivational directions.


Conclusion

By extending the tonic dopamine model to include phasic bursts and cue sensitization, we observe the emergence of addiction-like dynamics.

The critical mechanism is not increased pleasure, but the divergence between:

  • hedonic value (which declines),

  • and cue salience (which increases).

This dissociation between liking and wanting transforms motivated behavior into persistent, inflexible seeking.

Tonic dopamine energizes behavior.
Phasic dopamine reshapes what behavior is directed toward.

Together, they form the computational backbone of addiction.



Monday, December 29, 2025

Tonic Dopamine as the Engine of Motivation: A Computational Exploration


Introduction

Motivation is often conflated with reward, pleasure, or learning. Yet everyday experience — and clinical observation — shows that these concepts can dissociate. One can desperately want without enjoying, explore without committing, or remain apathetic despite available rewards.
In neuroscience, these distinctions are often discussed in terms of dopamine, but dopamine itself is not a unitary signal. In particular, tonic dopamine (DATonic) has been proposed to regulate the energization and vigor of behavior, rather than learning or reward prediction per se.
In this post, I present a minimal computational model designed to isolate the role of tonic dopamine in motivation, curiosity, and goal-directed behavior. Importantly, this model does not include cues or phasic dopamine, and therefore does not model addiction. Instead, it focuses on how different motivational regimes emerge from the interaction between tonic dopamine and hedonic adaptation.
The next post will build on this foundation by introducing phasic dopamine, cue sensitization, and addiction-like dynamics.

Conceptual framing

The guiding assumptions of the model are:

• DATonic modulates motivation globally, scaling the probability that actions are selected and executed.
• DATonic also modulates curiosity, determining whether the agent explores or remains inert.
• Hedonic value is interoceptive, tied to a target that produces pleasure when reached.
• Hedonic adaptation causes the subjective value of the target to decay with repeated consumption.
• No cue is present at the target location, ensuring that motivation remains hedonic rather than cue-driven.
Within an active inference perspective, DATonic can be interpreted as a form of policy precision or gain: higher values increase behavioral confidence and vigor, while lower values produce indecision and apathy.
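The policy-precision reading can be illustrated with a softmax in which DATonic acts as the inverse temperature. This is a minimal sketch: the action values and the direct use of DATonic as the precision parameter are assumptions.

```python
import math


def select_probabilities(values, da_tonic):
    """Softmax over action values with tonic dopamine as inverse temperature."""
    scaled = [da_tonic * v for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


values = [1.0, 0.0, 0.0]  # one promising action, two neutral ones
apathetic = select_probabilities(values, da_tonic=0.0)
vigorous = select_probabilities(values, da_tonic=5.0)
```

At zero precision every action is equally likely, which reads as indecision. At high precision the promising action is chosen almost every time.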

The simulated environment

The agent operates in a simple two-dimensional environment, introduced in a previous post:

• A target produces interoceptive hedonic feedback when reached.
• A barrier blocks direct access to the target, with a narrow opening.
• The agent must explore to discover the opening before it can repeatedly reach the target.

Crucially:

• The target does not move.
• There is no explicit reward prediction error.
• There is no cue colocated with the target.
This ensures that all observed behaviors emerge from motivation and adaptation, not learning or habit formation.

Two key parameters

After running extensive simulations, it became clear that motivational regimes are best characterized not by DATonic alone, but by the interaction of two parameters:

• DATonic level after target detection
This determines how strongly motivation and curiosity are energized once the agent knows the target exists.
• Hedonic adaptation rate
Implemented as an exponential decay of target gain with repeated visits, governed by a multiplier on the number of target encounters.

Two simple equations characterize the motivational regimes:

a) Subjective Motivation is proportional to (distance to target) * (DATonic) * (Exteroceptive Gain of Target) * (Interoceptive Subjective Hedonic Value of target),

b) where (Interoceptive Subjective Hedonic Value of target) = exp(-(hedonic adaptation decay rate) * (number of visits to target)).

Together, these parameters define a two-dimensional motivational phase space.
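Equations (a) and (b) translate directly into code. In this sketch the proportionality constant is set to 1 and the parameter values are illustrative; both are assumptions.

```python
import math


def hedonic_value(visits, decay_rate):
    """Equation (b): subjective hedonic value after a number of target visits."""
    return math.exp(-decay_rate * visits)


def motivation(distance, da_tonic, target_gain, visits, decay_rate):
    """Equation (a), with the proportionality constant assumed to be 1."""
    return distance * da_tonic * target_gain * hedonic_value(visits, decay_rate)


fresh = motivation(distance=4.0, da_tonic=1.0, target_gain=1.0,
                   visits=0, decay_rate=0.5)
sated = motivation(distance=4.0, da_tonic=1.0, target_gain=1.0,
                   visits=6, decay_rate=0.5)
slow = motivation(distance=4.0, da_tonic=1.0, target_gain=1.0,
                  visits=6, decay_rate=0.05)
```

After six visits, a fast adaptation rate leaves almost no motivation, while a slow one leaves most of it intact. That difference is what separates disengagement from prolonged engagement in the regimes below.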

Emergent motivational regimes

Four distinct behavioral regimes emerge naturally from the simulations.

(a) Low DATonic, any adaptation rate

Observed behavior:
The agent shows minimal exploration, fails to overcome the barrier, and never reaches the target. Hedonic value remains constant because it is never experienced.

Phenomenological interpretation:
This regime closely resembles apathy, as seen in Parkinson’s disease, severe depression, or negative symptoms of schizophrenia.
Importantly, the failure here is not due to lack of reward value, but to insufficient motivational energy to act upon it.




Figure 1. Apathy regime.
Caption: Low DATonic produces minimal exploration and failure to reach the target, regardless of hedonic adaptation rate. Motivation collapses before goal-directed behavior can emerge.

 

(b) Medium DATonic, medium adaptation rate

Observed behavior:
The agent explores, finds the opening, reaches the target a few times, and then disengages as hedonic value decays.

Phenomenological interpretation:
This regime corresponds to healthy goal-directed behavior: curiosity-driven exploration, successful pursuit, and flexible disengagement once interest wanes.
This regime reflects a balance between motivational drive and hedonic adaptation.





Figure 2. Healthy motivation regime.
Caption: With moderate DATonic and moderate hedonic adaptation, the agent reaches the target a few times before disengaging as subjective value diminishes.


(c) Medium DATonic, low adaptation rate

Observed behavior:
The agent repeatedly reaches the target. Hedonic value diminishes slowly, sustaining prolonged engagement.

Phenomenological interpretation:
This regime resembles compulsive behavior, such as that seen in mania or stimulant intoxication. Motivation remains high despite diminishing novelty.
Crucially, this is not addiction. There are no cues, no sensitization, and no persistence in the absence of hedonic value.





Figure 3. Compulsion without addiction.
Caption: Slow hedonic adaptation combined with sufficient DATonic sustains repeated target engagement, producing compulsive-like behavior without cue dependence.


A unifying view: motivation lies in a Goldilocks zone

Taken together, these simulations highlight a key principle:
Motivation is optimal within a narrow range of tonic dopamine.
• Too little → apathy
• Too much → instability or compulsion
• Balanced → flexible, goal-directed behavior
Importantly, hedonic adaptation modulates whether motivation extinguishes or persists, but it cannot generate addiction on its own.

What this model does not explain

This model deliberately excludes:
• cue-driven behavior,
• sensitization,
• craving,
• persistence in the absence of reward.
As a result, it cannot explain addiction.
This limitation is intentional.

Looking ahead: phasic dopamine and cue sensitization

None of the regimes described above explain why neutral cues can acquire overwhelming motivational power, or why seeking persists even when pleasure fades.
To address those phenomena, we must introduce phasic dopamine, reward prediction errors, and cue sensitization.
That is the focus of the next post.

Conclusion

By isolating tonic dopamine and hedonic adaptation, this model demonstrates how diverse motivational phenotypes can emerge without learning, cues, or addiction. It provides a computational bridge between dopamine theory, active inference, and clinical phenomenology — and sets the stage for understanding how phasic dopamine reshapes motivation in far more pathological ways.

Tuesday, November 11, 2025

Curiosity, Motivation and Discomfort

Curiosity Helps Agents Find Their Way: Targets, Barriers, and Active Inference

Imagine an agent moving through a 2D world. In this world, some points are targets — places the agent wants to reach — and some structures are barriers — obstacles that push the agent away. Both can be thought of as invisible “fields”: targets attract, barriers repel. The agent’s task is to navigate this space.

How does the agent decide what to do?

The agent doesn’t see the full map in advance. Instead, it relies on Active Inference, a framework from neuroscience and AI. In Active Inference, the agent constantly predicts what should happen if it acts a certain way, compares this to what actually happens, and adjusts its beliefs and actions to minimize the difference (prediction errors).

This process is described mathematically by the minimization of Free Energy:

  • Variational Free Energy (VFE): an upper bound on surprise about present observations.

  • Expected Free Energy (EFE): looks ahead to evaluate future actions.

EFE has two parts:

  • A pragmatic term: “Will this action get me closer to my goal?”

  • An epistemic term (curiosity): “Will this action help me learn something new about the environment?”


The Barrier with an Opening

Let’s take a concrete example.

  • The agent starts on one side of the map.

  • A fixed target is placed on the other side.

  • Between them lies a wall-like barrier, with a small opening at the top.

The agent has to find that opening to reach the target.

  • Without curiosity: The agent focuses only on the pragmatic term of EFE (minimizing present prediction errors). It feels the attraction of the target, but the barrier blocks the way. Usually, the agent just presses against the wall and fails to reach the goal.

  • With curiosity: The agent balances the pragmatic and epistemic terms. It tries different strategies, explores new directions, and eventually discovers the opening. Once it passes through, the pragmatic drive takes over, and the agent reaches the target.
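The difference between the two agents can be sketched as a choice between two actions scored by expected free energy. This is a toy illustration: the action names, the numbers, and the linear combination with a single `curiosity` weight are all hypothetical.

```python
def expected_free_energy(goal_gain, info_gain, curiosity):
    """Lower is better: negative pragmatic value minus weighted epistemic value."""
    return -goal_gain - curiosity * info_gain


def choose(actions, curiosity):
    """Return the name of the action with the lowest expected free energy."""
    return min(actions,
               key=lambda name: expected_free_energy(*actions[name], curiosity))


# (goal_gain, info_gain) per action, for an agent pressed against the wall.
actions = {
    "push_at_wall": (0.2, 0.0),     # seems closer to the target, reveals nothing
    "move_along_wall": (0.0, 0.5),  # no immediate progress, explores new ground
}
pragmatic_choice = choose(actions, curiosity=0.0)
curious_choice = choose(actions, curiosity=1.0)
```

With the curiosity weight at zero the agent keeps pushing at the wall. With the weight at one it moves along the wall instead, which is how it can come across the opening.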


                                Figure 1: Without curiosity.


                               Figure 2: With curiosity.





                                

Motivation and Discomfort

To make things more interesting, we modeled two extra subjective-like signals:

  • Motivation: linked to the distance to the target. The further away, the stronger the pull.

  • Discomfort: linked to the distance to the barrier. The closer the agent gets to the wall, the stronger the push.

Both of these influence prediction errors, meaning the agent doesn’t just navigate mechanically; it feels motivated to move forward and uncomfortable when too close to obstacles. These signals are shown in the figures above.
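
One simple way to picture these two signals is as functions of distance. The post does not specify their functional form, so the linear pull, the exponential push, and every name and parameter below (`motivation`, `discomfort`, `gain`, `strength`, `length_scale`) are assumptions chosen only to match the qualitative description above.

```python
import math

def motivation(agent, target, gain=1.0):
    """Pull toward the target; assumed to grow linearly with distance."""
    return gain * math.dist(agent, target)

def discomfort(agent, barrier_points, strength=1.0, length_scale=1.0):
    """Push away from the barrier; assumed to decay exponentially
    with distance to the nearest barrier point."""
    nearest = min(math.dist(agent, b) for b in barrier_points)
    return strength * math.exp(-nearest / length_scale)

# Hypothetical layout: target on the right, vertical barrier at x = 5.
target = (10.0, 0.0)
barrier = [(5.0, float(y)) for y in range(-3, 3)]

far_from_target = motivation((0.0, 0.0), target)
near_target = motivation((8.0, 0.0), target)
near_wall = discomfort((4.5, 0.0), barrier)
far_from_wall = discomfort((0.0, 0.0), barrier)
```

In this sketch motivation is largest at the start and shrinks as the agent approaches the target, while discomfort is negligible in open space and rises steeply within about one `length_scale` of the wall.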


A Twist: Sensitization to Motivation

We also added a mechanism we call Sensitization to Motivation. This is like a gain control that determines how strongly the agent reacts to motivational drives. Interestingly, once the target is reached, this gain drops close to zero — meaning the agent stops feeling driven once it achieves its goal.
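
A gain of this kind can be sketched as a slowly relaxing variable. The update rule below is an assumption, not the rule used in the simulation: `update_gain`, `reach_radius`, and `rate` are hypothetical names, and the gain simply relaxes toward 1 while the target is still out of reach and toward 0 once the agent is within `reach_radius` of it.

```python
def update_gain(gain, dist_to_target, reach_radius=0.5, rate=0.3):
    """Relax the motivational gain toward a set point.

    The set point is 1 while the target is out of reach and 0 once the
    agent is within reach_radius of it (both values are assumptions).
    """
    set_point = 0.0 if dist_to_target <= reach_radius else 1.0
    return gain + rate * (set_point - gain)

# While the agent is far from the target the gain stays high.
gain_far = 1.0
for _ in range(30):
    gain_far = update_gain(gain_far, dist_to_target=5.0)

# After the target is reached the gain decays toward zero.
gain_reached = 1.0
for _ in range(30):
    gain_reached = update_gain(gain_reached, dist_to_target=0.0)
```

Multiplying a motivational signal by this gain would make the drive fade once the goal is achieved, which is the qualitative behavior described above.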

This loosely resembles dopamine signaling in biological brains: activity rises with cues that predict a reward, and there is little response to a reward that is already fully predicted (Schultz et al., 1997).


                               Figure 3: Metrics for Fig. 2.





Why Does This Matter?

This simple simulation shows why curiosity matters. A purely pragmatic agent gets stuck. But an agent that values both pragmatic and epistemic terms — in other words, one that allows curiosity to guide exploration — finds the solution.

The framework also lets us connect navigation to subjective states like motivation, discomfort, and curiosity, and even to neurobiology through concepts like dopamine.


What’s Next?

  • Adding more targets and barriers to see how agents handle complex maps.

  • Exploring how different levels of curiosity affect efficiency.

  • Connecting the model more explicitly to neurotransmitter systems, like dopamine (motivation), acetylcholine (uncertainty), and serotonin (risk).


References

  • Friston, K., et al. (2017). Active Inference: A Process Theory. Neural Computation, 29(1), 1–49.

  • Schwartenbeck, P., et al. (2019). Computational mechanisms of curiosity and goal-directed exploration. eLife, 8, e41703.

  • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.

Wednesday, October 15, 2025

Curiosity, Motivation, and Discomfort

 Simplified stages of motivation processing in healthy individuals:

  1. The cerebellum uses glutamate to activate circuits that compute the discrepancy between the target position and the agent’s current position.

  2. The target can be either a physical object or a social interaction.

  3. These circuits send signals to midbrain areas that modulate dopamine, such as the ventral tegmental area (VTA).

  4. Dopamine, in turn, regulates motivation in the prefrontal cortex and in other areas associated with decision-making and reward.

➡️ Functional flow:
Cerebellum (glutamate) → VTA → Dopamine → Prefrontal cortex / motivational areas


In individuals with cerebellar glutamatergic hyperactivity:

  1. Excess glutamate can produce noisy or imprecise signals.

  2. These signals become less useful for determining the discrepancy between target and agent.

  3. As a result, they become ineffective at inducing adequate dopamine release.

  4. The reduced dopaminergic modulation impairs the motivation to seek and maintain social interactions.


Figures:

  • Figure 1: Normal situation.
    The agent actively seeks the target, passing through the opening at the top of the barrier.


  • Figure 2: Situation with attenuated prediction.
    When the ability to predict the discrepancy between target and agent is reduced, the motivation to reach the target also decreases sharply.

