How it works — Statefulai

01 What does Statefulai do for a coding agent?

Statefulai sits between your AI coding agents and the unstructured chaos of your project — turning sessions, commits, and IDE signals into a queryable long-term memory.

01 · sources
Agent transcripts: claude · codex · cursor
Git history: commits · branches
IDE telemetry: opens · edits · LSP

02 · ingest
Stream router: Kafka
Workflow engine: Temporal
Embed + chunk: voyage · openai

03 · memory
Context engine: graph + vector + ts
Episodic memory: events · append-only
Semantic memory: graph · embeddings
Procedural memory: rules · workflows

04 · retrieve
Hybrid search: vector + bm25 + graph
Reranker: cross-encoder
Compression: summary-v2

05 · serve
MCP server: stdio · ws
REST / SDK: python · ts · go
Agent context: ≤1.2k tok / hit

Design principle. Context is not a prompt. Prompts are ephemeral and per-agent; Statefulai is durable and per-project. Agents query it; they don't own it.

02 What does Statefulai ingest from your repo?

Anything that signals intent or change about your codebase can become memory. The ingestion layer accepts streaming and batch signals from agents, version control, and your editor.

Ingestion is intentionally permissive on input and strict on classification. We accept raw streams and assign types downstream, so the same data point can show up as both episodic and semantic memory if it carries both kinds of signal.

Agent transcripts

Every Claude Code / Codex / Cursor session is streamed in via MCP. Tool calls, file reads, and accepted diffs all become events.

stream: mcp:transcript

Git history

Commits, branches, PR titles and descriptions, review threads. Webhook-based; history is replayed on connect.

webhook: github · gitlab

IDE telemetry

File open/close, jump-to-def, LSP diagnostics, debug sessions. Builds a "what the developer was looking at" signal.

extension: vscode · jetbrains

Prompts & outputs

The actual prompts agents send and what they accept back — the strongest "what worked" signal we have.

sdk: ingest.prompt

Decision docs

ADRs, design docs, and Slack channels you allowlist. Each is linked back to the code that resulted from it.

connector: slack · notion · md

Runtime signals

CI failures, error traces, deploy outcomes. Closes the loop between "what we wrote" and "what actually worked".

connector: sentry · gha · datadog

Every ingested event lands on Kafka, gets a stable hash, and is replayable. A Temporal workflow handles deduplication, chunking, embedding (Voyage / OpenAI), and routing to the correct memory layer.

$ ingest pipeline · trace

# a single PR merge fans out into several memory writes
[09:14:01] webhook  github · PR #482 merged
[09:14:01] route    ▸ episodic.event(commit)
[09:14:01] route    ▸ semantic.update(BillingService)
[09:14:02] embed    32 chunks (voyage-code-2)
[09:14:02] graph    +3 nodes · +7 edges
[09:14:02] summary  v2 · 412 → 86 tokens
[09:14:02] ok       commit fanout complete · 1.1s

03 Context engine

The classifier and writer. Ingested events get parsed, embedded, and placed in the right memory layer with the right relationships.

The engine answers three questions for every event:

Is this an event, a fact, or a habit? — drives memory-type routing.
What does it relate to? — extracts entities and updates the architectural graph.
How long is it useful? — assigns a recency profile so the retrieval layer can decay it correctly.

Classification uses a small fine-tuned model per project; entity extraction uses a typed schema you can extend. Both run in the Temporal workflow with retries and idempotency keys, so the same commit re-ingested ten times produces exactly one write.
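
The idempotency behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not Statefulai's implementation: `IngestEvent`, `InMemoryStore`, and `contentKey` are hypothetical names, and the FNV-1a hash is only a stand-in for whatever stable hash the pipeline actually uses.

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
interface IngestEvent {
  source: string;  // e.g. "github"
  kind: string;    // e.g. "commit"
  payload: string; // raw body of the event
}

// FNV-1a (32-bit), used here only as a stand-in for a stable content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Derive the idempotency key from content only, never from arrival time.
function contentKey(ev: IngestEvent): string {
  return fnv1a(ev.source + "\n" + ev.kind + "\n" + ev.payload);
}

class InMemoryStore {
  private writes = new Map<string, IngestEvent>();

  // Returns true if a write happened, false if the key was already seen.
  ingest(ev: IngestEvent): boolean {
    const key = contentKey(ev);
    if (this.writes.has(key)) return false;
    this.writes.set(key, ev);
    return true;
  }

  get size(): number {
    return this.writes.size;
  }
}

// Re-ingest the same commit ten times and count how many writes land.
const demoStore = new InMemoryStore();
const demoCommit: IngestEvent = { source: "github", kind: "commit", payload: "a1b" };
let acceptedWrites = 0;
for (let i = 0; i < 10; i++) {
  if (demoStore.ingest(demoCommit)) acceptedWrites++;
}
console.log(acceptedWrites, demoStore.size);
```

Keying on content rather than on delivery metadata is what makes retries and webhook replays safe: a redelivered event maps to the same key and is dropped before it reaches a memory layer.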

04 Memory layer

Three layers, three storage shapes, one queryable surface.

The memory layer is split by access pattern rather than by content — episodic data is time-ranged, semantic data is graph-walked, procedural data is rule-matched. You don't write to a "layer" directly; the context engine routes for you.

For a deep dive on each layer, see Memory model.

EP · Episodic: time-series of events, branch-scoped (postgres)
SM · Semantic: typed graph + vector index (neo4j · qdrant)
PR · Procedural: rules, workflows, preferences (redis · pg)

Memory write trace

event: commit#a1b · BillingService.flush()
  → EP log( ts, branch, actor )
  → SM graph( BillingService → flush )
  → SM embed( body, ctx )
  → PR rule( idempotent-writes )

05 How does memory retrieval work?

Retrieval is hybrid by default: vector for semantic similarity, BM25 for symbol & identifier match, graph walks for architectural reachability, and a recency prior for "what changed yesterday."

The retriever runs four candidate generators in parallel, unions and dedupes the result set, then reranks with a small cross-encoder tuned on accepted-vs-rejected agent context. Total budget for a default retrieval is 12 ms at p50, 40 ms at p99.
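
The union, dedupe, and rerank steps can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: the `Candidate` shape, the `mergeCandidates` and `rerank` helpers, and the scoring callback (a stand-in for the cross-encoder) are all hypothetical, and the candidate lists are passed in precomputed rather than generated in parallel.

```typescript
// Hypothetical candidate shape; ids and generator names are illustrative.
interface Candidate {
  id: string;
  generator: "vector" | "bm25" | "graph" | "recency";
}

// Union the per-generator lists and keep one entry per memory id,
// remembering every generator that surfaced it.
function mergeCandidates(lists: Candidate[][]): Map<string, string[]> {
  const merged = new Map<string, string[]>();
  for (const list of lists) {
    for (const c of list) {
      const seen = merged.get(c.id) || [];
      if (seen.indexOf(c.generator) === -1) seen.push(c.generator);
      merged.set(c.id, seen);
    }
  }
  return merged;
}

// Rerank the deduped ids with a scoring function (standing in for the
// cross-encoder) and return the best `limit` ids, highest score first.
function rerank(ids: string[], scoreOf: (id: string) => number, limit: number): string[] {
  return ids.slice().sort((a, b) => scoreOf(b) - scoreOf(a)).slice(0, limit);
}

const vectorHits: Candidate[] = [
  { id: "BillingService", generator: "vector" },
  { id: "InvoiceQueue", generator: "vector" },
];
const bm25Hits: Candidate[] = [
  { id: "BillingService", generator: "bm25" },
  { id: "StripeUsageClient", generator: "bm25" },
];

const mergedHits = mergeCandidates([vectorHits, bm25Hits]);

// Illustrative scores only; a real reranker would compute these per query.
const demoScores = new Map<string, number>([
  ["BillingService", 0.84],
  ["StripeUsageClient", 0.79],
  ["InvoiceQueue", 0.55],
]);
const rankedIds = rerank(Array.from(mergedHits.keys()), (id) => demoScores.get(id) || 0, 2);
console.log(rankedIds);
```

Deduping before reranking keeps the expensive scoring pass proportional to the number of unique memories rather than the number of generator hits.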

Retrieval pipeline

Query
  refactor billing.ts user
  branch: feat/usage-meter
  agent: claude-code
  budget: 1200 tok

Result · 6 nodes
  SM BillingService .84
  SM StripeUsageClient .79
  EP Stripe usage switch · 04-12 .71
  PR idempotent-writes rule .68
  SM InvoiceQueue .55
  EP billing.ts last edit .48

06 How does Statefulai decide what to recall?

Retrieval is only as good as its weights. Statefulai combines six signals and tunes them per project from your agents' own accept/reject feedback.

01 Semantic similarity: vector cosine to the active query (w 0.32)
02 Architectural relevance: graph distance to active symbols (w 0.24)
03 Recency: half-life decay tuned per layer (w 0.18)
04 Branch relevance: same branch > same repo > org (w 0.12)
05 Team importance: nodes referenced by many teammates (w 0.08)
06 User context: files the active dev opened today (w 0.06)

Weights are learned online. When an agent accepts a piece of context (uses it in a tool call or diff), we positive-sample. When it ignores or contradicts it, we negative-sample. The cross-encoder is retrained nightly per workspace.

You can override every weight. Pin a memory to "always retrieve" or set a hard floor for branch-scoped recall.
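
As a worked illustration of how the six weights might combine, the sketch below computes a weighted sum of per-signal scores. The default weights are the ones listed above; everything else is an assumption made for illustration: the `Signals` shape, the idea that each signal is normalised to the range 0 to 1, the exponential form of the recency decay, and the `overrides` parameter as a model of per-project overrides.

```typescript
type SignalName = "semantic" | "architectural" | "recency" | "branch" | "team" | "user";

type Signals = { [K in SignalName]: number };

// Default weights as listed in this section; they sum to 1.
const DEFAULT_WEIGHTS: Signals = {
  semantic: 0.32,
  architectural: 0.24,
  recency: 0.18,
  branch: 0.12,
  team: 0.08,
  user: 0.06,
};

// Assumed form of "half-life decay": the signal halves every `halfLifeDays`.
function recencySignal(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted sum of signals; `overrides` replaces individual default weights.
function recallScore(signals: Signals, overrides: Partial<Signals> = {}): number {
  const weights: Signals = { ...DEFAULT_WEIGHTS, ...overrides };
  let total = 0;
  for (const key of Object.keys(weights) as SignalName[]) {
    total += weights[key] * signals[key];
  }
  return total;
}

// A memory that matches the query well but is one half-life old
// (hypothetical 7-day half-life) and lives on another branch.
const staleButRelevant = recallScore({
  semantic: 0.9,
  architectural: 0.6,
  recency: recencySignal(7, 7),
  branch: 0,
  team: 0.5,
  user: 0,
});
console.log(staleButRelevant.toFixed(3));
```

Because the default weights sum to 1, a memory that maxes out every signal scores about 1, which keeps scores comparable across projects until a weight is overridden.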

07 How does Statefulai keep context windows lean?

Raw retrieved memory is too large to fit in a context window. The compression engine turns 50 kB of source into a 1 kB action-ready summary without losing the parts the agent will need to call.

Compression runs in three modes:

Structural. Pull function signatures, types, and call graphs; drop bodies.
Lossy summary. An LLM pass with a strict template — purpose, inputs, side effects, gotchas.
Reference. Just an ID + a hash; the agent can pull the full source on demand via a tool call.

Compression · summary-v2

Source · 412 lines

class BillingService {
  async flush(invoiceId: string) {
    const lock = await this.locks.acquire(invoiceId);
    try {
      const usage = await this.stripe.usage(invoiceId);
      // ...410 more lines
  }
}

Summary · 86 tokens

purpose idempotent invoice flush via Stripe usage API
inputs invoiceId : string
sideFx writes usage_records, emits BillingFlushed
rule must hold per-invoice lock
refs StripeUsageClient · InvoiceQueue
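
The summary fields shown above map naturally onto a small typed record. The sketch below is one hypothetical way to model and render such a summary; the `MemorySummary` type and `renderSummary` helper are assumptions for illustration, and only the field names (purpose, inputs, sideFx, rule, refs) and the example values come from the summary above.

```typescript
// Hypothetical record for the strict summary template.
interface MemorySummary {
  purpose: string;
  inputs: string[];
  sideFx: string[];
  rule: string;
  refs: string[];
}

// Render one field per line, in a fixed order, so summaries stay comparable.
function renderSummary(s: MemorySummary): string {
  return [
    "purpose " + s.purpose,
    "inputs " + s.inputs.join(", "),
    "sideFx " + s.sideFx.join(", "),
    "rule " + s.rule,
    "refs " + s.refs.join(" · "),
  ].join("\n");
}

const flushSummary: MemorySummary = {
  purpose: "idempotent invoice flush via Stripe usage API",
  inputs: ["invoiceId : string"],
  sideFx: ["writes usage_records", "emits BillingFlushed"],
  rule: "must hold per-invoice lock",
  refs: ["StripeUsageClient", "InvoiceQueue"],
};

console.log(renderSummary(flushSummary));
```

A fixed field order and a typed shape make it straightforward to validate that an LLM pass filled every slot before the summary is stored.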

08 Storage

Boring, durable, replicated. The interesting work happens above.

postgres · 16

Source of truth

Events, references, ADRs, audit log. JSONB columns for typed payloads, pgvector for inline embeddings on small tables.

qdrant

Vector index

Per-workspace collections, HNSW with M=32 / efConstruction=200. Hybrid search via payload filters for branch / actor / repo.

neo4j

Architecture graph

Typed nodes (Module, Symbol, Decision, Person) and relationships (CALLS, REFERS_TO, DECIDED). Cypher traversals power semantic recall.

redis

Hot cache

Reranked retrieval results per active branch + agent. Sub-millisecond cache hits for the "same query 30 seconds later" pattern.

Raw archive

Immutable event stream, snapshots, and compressed history. Restore-from-source-of-truth in < 10 minutes.

temporal

Workflow durability

Every ingest / retrain is a durable workflow with replayable history. No "lost commits" on infra blips.

09 What does Statefulai guarantee?

Memory is only useful if you can trust it.

Append-only

Episodic memory is never overwritten. Updates produce new versions; the audit log is the source of truth. Procedural rules can be retired but not erased.

Branch-scoped

Writes carry their git branch. Retrieval prefers same-branch hits, then main, then sibling branches. Switching branches doesn't poison your agent with stale context.

Idempotent

Every ingest carries a content hash. Re-running a webhook 10 times produces exactly one memory write.

Replayable

Every event is replayable from S3. You can rewind a project to any point in time and re-derive the entire memory layer.

Visibility

Every retrieved memory carries its sources, recency, and confidence. Agents can cite their memory; you can audit it.

Right to forget

Per-project tombstones honor deletes within 24h across primary, vector, graph, and cache. Security details →
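
The branch-scoped preference order described above (same branch, then main, then sibling branches) can be sketched as a simple tiering function. This is an illustration only: the `BranchHit` shape, the tier numbers, the hard-coded default branch name `main`, and the tie-break on score are assumptions, not the documented retriever.

```typescript
// Hypothetical retrieval hit carrying the branch it was written on.
interface BranchHit {
  id: string;
  branch: string;
  score: number;
}

// Lower tier is preferred: same branch, then main, then sibling branches.
function branchTier(hitBranch: string, activeBranch: string): number {
  if (hitBranch === activeBranch) return 0;
  if (hitBranch === "main") return 1;
  return 2;
}

// Sort by tier first, then by descending score within a tier.
function orderByBranch(hits: BranchHit[], activeBranch: string): BranchHit[] {
  return hits.slice().sort((a, b) => {
    const tierDiff = branchTier(a.branch, activeBranch) - branchTier(b.branch, activeBranch);
    return tierDiff !== 0 ? tierDiff : b.score - a.score;
  });
}

const orderedHits = orderByBranch(
  [
    { id: "sibling-note", branch: "feat/other", score: 0.9 },
    { id: "main-note", branch: "main", score: 0.7 },
    { id: "local-note", branch: "feat/usage-meter", score: 0.5 },
  ],
  "feat/usage-meter"
);
console.log(orderedHits.map((h) => h.id));
```

In this strict form a same-branch hit always outranks a higher-scoring hit from a sibling branch, which is the property that keeps stale context from another branch out of the agent's window.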

10 How fast is memory retrieval?

Numbers from production beta workloads across 30+ engineering teams.

p50 retrieve: 12 ms (hybrid + rerank)
p99 retrieve: 40 ms (cold cache)
ingest fanout: 1.1 s (commit → indexed)
tok per hit: ≤1.2k (post-compression)

Next → Memory model: three kinds of memory, in depth. Or → Integrations: drop in over MCP or SDK.
