ANNIE V2: Diffusion-Inspired Cognition for a “Self-Thinking” Local LLM

ANNIE V2 started as a simple question: what if an LLM weren’t just a chat box? What if it had a running internal state, could keep thinking in the background, and could build long-term memory the way people do?

The goal wasn’t to claim AGI. The goal was to prototype the mechanics that make cognition feel real: memory retrieval, a compact belief state, intrinsic motivation signals, and iterative refinement loops, all while staying local-first and observable.

What ANNIE V2 Does

Local chat with persistent memory: conversations are stored and can be recalled by semantic similarity, not just keywords.
Belief state: a compact “working model” of goals + constraints that updates over time in the background.
Background Thinker: an always-on loop that keeps exploring your goal space, generating “ticks” and pivots as it learns.
Latent Future / forecasting mode: structured scenario thinking where the system organizes possibilities into actionable paths.
Self-review signals: the system can surface internal notes (e.g., what it’s uncertain about, what it’s trying next) without dumping raw chain-of-thought.

How It Works (High Level)

ANNIE V2 is built around a small set of “cognitive primitives” that cooperate:

1) Encode experience — Every new message becomes an embedding and is added to a vector index so it can be recalled later.
2) Retrieve context — Instead of stuffing the whole chat into a prompt, the system searches memory for the most relevant anchors.
3) Update beliefs — Periodically, ANNIE generates a short belief summary (private, compact) and stores a latent vector representation.
4) Keep thinking — A background loop continues to “walk” memory, track novelty vs progress, and steer exploration toward what matters.
5) Answer cleanly — The user only sees the final response, while internal work stays internal.
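The first two primitives, encode and retrieve, can be sketched as a toy in-memory vector index. This is an illustrative sketch, not ANNIE’s actual code: the `embed` function is a stub standing in for a real local embedding model, and all names are hypothetical.

```python
import numpy as np

# Stub embedding: hash-seeded random unit vector. A real system would call
# a local embedding model here; this only makes the sketch runnable.
def embed(text: str, dim: int = 8) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class MemoryIndex:
    """Minimal vector memory: encode experience, recall by semantic similarity."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        # 1) Encode experience: every message becomes an embedding in the index.
        self.texts.append(text)
        self.vectors.append(embed(text))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # 2) Retrieve context: rank stored memories by cosine similarity
        # (dot product of unit vectors) instead of stuffing the whole chat in.
        q = embed(query)
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]

memory = MemoryIndex()
memory.add("user wants a local-first assistant")
memory.add("belief: keep internal notes private")
anchors = memory.retrieve("what does the user want?", k=1)
```

A production version would swap the linear scan for an approximate-nearest-neighbor index, but the shape of the loop, embed on write and rank by similarity on read, stays the same.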

The result is a system that behaves less like a one-shot chatbot and more like a tiny workspace: memory, state, and direction keep evolving even between messages.

The Diffusion Process (Planning in Latent Space)

One of the more experimental parts of ANNIE V2 was a diffusion-inspired approach to planning. The idea was to treat a “plan” not as text at first, but as a sequence of latent vectors—a rough internal trajectory.

Conceptually, it looks like this:

• Start from noise (an unformed internal trajectory).
• Condition that trajectory on a blended signal: the current goal, the belief vector, and retrieved memory anchors.
• Run an iterative denoise process that nudges the latent trajectory toward coherence over multiple steps.
• Finally, decode each latent step by mapping it back to human-readable text using nearest-neighbor retrieval from memory.

In plain English: ANNIE tries to “dream” a plan into existence by repeatedly refining an internal representation—then translating that internal structure into a compact set of actionable steps.
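The four steps above can be sketched numerically. Everything here is a stand-in: the goal, belief, and anchor vectors are random placeholders, and the blend weights, step counts, and update rule are illustrative assumptions, not ANNIE’s actual denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)
STEPS, DIM = 4, 8  # a 4-step plan, each step a latent vector (toy sizes)

goal_vec = rng.standard_normal(DIM)          # current goal (placeholder)
belief_vec = rng.standard_normal(DIM)        # belief vector (placeholder)
anchors = rng.standard_normal((5, DIM))      # retrieved memory anchors (placeholders)

# Blended conditioning signal: goal + belief + mean of memory anchors.
# The 0.5/0.3/0.2 weights are arbitrary for the sketch.
condition = 0.5 * goal_vec + 0.3 * belief_vec + 0.2 * anchors.mean(axis=0)

# Start from noise: an unformed latent trajectory.
start = rng.standard_normal((STEPS, DIM))
trajectory = start.copy()

# Iterative "denoise": pull each latent step toward the conditioning signal,
# then smooth adjacent steps toward each other so the trajectory coheres.
for _ in range(50):
    trajectory = trajectory + 0.1 * (condition - trajectory)
    trajectory[1:-1] += 0.1 * (
        trajectory[:-2] + trajectory[2:] - 2 * trajectory[1:-1]
    )

# Decode: map each latent step back to text via nearest-neighbor retrieval
# over the memory anchors (labels are dummy texts paired with the anchors).
labels = ["note A", "note B", "note C", "note D", "note E"]
plan = [labels[int(np.argmax(anchors @ step))] for step in trajectory]
```

The pull-toward-condition term plays the role of the denoiser; a learned model would replace it, but the overall loop, noise in, iterative refinement, nearest-neighbor decode out, matches the conceptual steps above.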

Why This Was “AGI-Inspired” (Without the Hype)

ANNIE V2 was never positioned as AGI. It was a sandbox for testing whether adding a few cognitive components makes an LLM feel more like an agent:

Continuity — memory that persists, not just context windows.
Self-modeling — belief summaries that compress goals and constraints into a usable internal state.
Intrinsic motivation — curiosity signals that sometimes explore instead of always exploiting the top memory hit.
Iterative refinement — background loops that keep improving a direction rather than stopping at one answer.
Observability — a UI that exposes what’s happening (ticks, pivots, state) so the system isn’t a black box.
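The intrinsic-motivation bullet can be approximated with a simple epsilon-greedy rule: usually exploit the most relevant memory hit, but occasionally follow the most novel one instead. This is a hypothetical sketch; the scores, field names, and curiosity probability are all illustrative.

```python
import random

random.seed(7)

# Hypothetical memory hits, each scored for relevance (exploit)
# and novelty (explore). Numbers are made up for the sketch.
hits = [
    {"text": "top relevance hit", "relevance": 0.92, "novelty": 0.10},
    {"text": "moderately relevant, very novel", "relevance": 0.61, "novelty": 0.85},
    {"text": "low relevance, low novelty", "relevance": 0.30, "novelty": 0.20},
]

def pick_memory(hits, curiosity=0.2):
    """With probability `curiosity`, explore the most novel hit;
    otherwise exploit the most relevant one."""
    if random.random() < curiosity:
        return max(hits, key=lambda h: h["novelty"])    # explore
    return max(hits, key=lambda h: h["relevance"])      # exploit

choices = [pick_memory(hits)["text"] for _ in range(1000)]
```

Over many ticks, the system mostly follows relevance but regularly wanders toward novelty, which is what keeps the background loop from fixating on one memory.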

Where This Goes Next

ANNIE V2 proved the architecture: long-term memory, background cognition, and diffusion-inspired planning can live together in a local-first system. The next step is simplifying and repurposing these primitives into a single sharp end product, where “self-thinking” isn’t a demo feature but a practical advantage.
