01 Overview
Statefulai sits between your AI coding agents and the unstructured chaos of your project — turning sessions, commits, and IDE signals into a queryable long-term memory.
Pipeline stages: 01 sources · 02 ingest · 03 memory · 04 retrieve · 05 serve
02 Ingestion
Anything that signals intent or change about your codebase can become memory. The ingestion layer accepts streaming and batch signals from agents, version control, and your editor.
Ingestion is intentionally permissive on input, strict on classification. We accept raw streams and assign them types downstream so the same data point can show up as episodic and semantic memory if it carries both kinds of signal.
Agent transcripts
Every Claude Code / Codex / Cursor session is streamed in via MCP. Tool calls, file reads, and accepted diffs all become events.
stream: mcp:transcript
Git history
Commits, branches, PR titles + descriptions, review threads. Webhook-based, replays on connect.
webhook: github · gitlab
IDE telemetry
File open/close, jump-to-def, LSP diagnostics, debug sessions. Builds "what the developer was looking at" signal.
extension: vscode · jetbrains
Prompts & outputs
The actual prompts agents send and what they accept back — the strongest "what worked" signal we have.
sdk: ingest.prompt
Decision docs
ADRs, design docs, and the Slack channels you allowlist. Linked back to the code that resulted from them.
connector: slack · notion · md
Runtime signals
CI failures, error traces, deploy outcomes. Closes the loop between "what we wrote" and "what actually worked".
connector: sentry · gha · datadog
Every ingested event lands on Kafka, gets a stable hash, and is replayable. A Temporal workflow handles deduplication, chunking, embedding (Voyage / OpenAI), and routing to the correct memory layer.
# a single PR merge fans out into several memory writes
[09:14:01] webhook github · PR #482 merged
[09:14:01] route ▸ episodic.event(commit)
[09:14:01] route ▸ semantic.update(BillingService)
[09:14:02] embed 32 chunks (voyage-code-2)
[09:14:02] graph +3 nodes · +7 edges
[09:14:02] summary v2 · 412 → 86 tokens
[09:14:02] ok commit fanout complete · 1.1s
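The stable hash is what makes replay and deduplication safe: two deliveries of the same event must hash identically regardless of key order. The hash function and event schema are not specified here, so the following is only a minimal sketch. It assumes a key-sorted JSON serialization and 32-bit FNV-1a (picked because it needs no dependencies); `IngestEvent`, `canonicalize`, and `stableHash` are hypothetical names.

```typescript
// Hypothetical event shape; field names are illustrative, not Statefulai's schema.
interface IngestEvent {
  source: string; // e.g. "github"
  kind: string; // e.g. "pr.merged"
  payload: Record<string, unknown>;
}

// Serialize with sorted object keys so logically equal events
// always produce the same string.
function canonicalize(value: unknown): string {
  if (value === undefined) {
    return "null";
  }
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
  return "{" + parts.join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// An assumption for illustration; a production system would likely use
// a cryptographic hash such as SHA-256.
function stableHash(event: IngestEvent): string {
  const text = canonicalize(event);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash = Math.imul(hash ^ text.charCodeAt(i), 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}
```

With a hash like this, a replayed webhook can be recognized by a simple lookup before any chunking or embedding work is done.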
03 Context engine
The classifier and writer. Ingested events get parsed, embedded, and placed in the right memory layer with the right relationships.
The engine answers three questions for every event:
- Is this an event, a fact, or a habit? — drives memory-type routing.
- What does it relate to? — extracts entities and updates the architectural graph.
- How long is it useful? — assigns a recency profile so the retrieval layer can decay it correctly.
Classification uses a small fine-tuned model per project; entity extraction uses a typed schema you can extend. Both run in the Temporal workflow with retries and idempotency keys, so the same commit re-ingested ten times produces exactly one write.
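The "ten re-ingests, one write" property can be illustrated with a small in-memory sketch. It assumes that event / fact / habit map onto the episodic / semantic / procedural layers and that the idempotency key is the event hash plus the target layer; neither detail is confirmed above, and `planWrites` and `IdempotentStore` are hypothetical names.

```typescript
type SignalKind = "event" | "fact" | "habit";
type MemoryType = "episodic" | "semantic" | "procedural";

// Assumed mapping from classifier output to memory layer.
const ROUTES: Record<SignalKind, MemoryType> = {
  event: "episodic",
  fact: "semantic",
  habit: "procedural",
};

interface MemoryWrite {
  idempotencyKey: string; // assumed format: "<event hash>:<memory type>"
  type: MemoryType;
  body: string;
}

// One ingested event may carry several kinds of signal,
// so it can fan out into several writes.
function planWrites(eventHash: string, kinds: SignalKind[], body: string): MemoryWrite[] {
  return kinds.map((kind) => ({
    idempotencyKey: eventHash + ":" + ROUTES[kind],
    type: ROUTES[kind],
    body,
  }));
}

class IdempotentStore {
  private applied = new Map<string, MemoryWrite>();

  // Returns true only the first time a key is seen; retries are no-ops.
  apply(write: MemoryWrite): boolean {
    if (this.applied.has(write.idempotencyKey)) {
      return false;
    }
    this.applied.set(write.idempotencyKey, write);
    return true;
  }

  count(): number {
    return this.applied.size;
  }
}
```

Because the key is derived from the event rather than from the attempt, workflow retries and full replays converge on the same stored state.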
04 Memory layer
Three layers, three storage shapes, one queryable surface.
The memory layer is split by access pattern rather than by content — episodic data is time-ranged, semantic data is graph-walked, procedural data is rule-matched. You don't write to a "layer" directly; the context engine routes for you.
For a deep dive on each layer, see Memory model.
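To make the three access patterns concrete, here is a rough in-memory sketch of each query shape. The types and function names (`Episode`, `Graph`, `Rule`, `episodesBetween`, `reachable`, `matchingRules`) are hypothetical and say nothing about how the real stores are implemented.

```typescript
// Episodic: time-ranged lookup over timestamped events.
interface Episode {
  at: number; // epoch milliseconds
  text: string;
}

function episodesBetween(log: Episode[], from: number, to: number): Episode[] {
  return log.filter((e) => e.at >= from && e.at <= to);
}

// Semantic: breadth-first graph walk, bounded by hop count.
type Graph = Record<string, string[]>;

function reachable(graph: Graph, start: string, maxHops: number): string[] {
  const seen = new Set<string>([start]);
  const found: string[] = [];
  let frontier: string[] = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph[node] || []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          found.push(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return found;
}

// Procedural: rule-matched lookup against the file being edited.
interface Rule {
  appliesTo: RegExp;
  advice: string;
}

function matchingRules(rules: Rule[], filePath: string): string[] {
  return rules.filter((r) => r.appliesTo.test(filePath)).map((r) => r.advice);
}
```

Each function answers a different kind of question (when, what is connected, what usually applies), which is the reason for splitting by access pattern rather than by content.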
05 Retrieval
Retrieval is hybrid by default: vector for semantic similarity, BM25 for symbol & identifier match, graph walks for architectural reachability, and a recency prior for "what changed yesterday."
The retriever runs four candidate generators in parallel, unions and dedupes the result set, then reranks with a small cross-encoder tuned on accepted-vs-rejected agent context. Total budget for a default retrieval is 12 ms at p50, 40 ms at p99.
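The union, dedupe, and rerank steps can be sketched as follows. This is a simplified illustration only: the generators are represented by their already-computed ID lists (in practice they would run concurrently), and the `score` callback is a stand-in for the cross-encoder. `Merged`, `unionAndDedupe`, and `rerank` are hypothetical names.

```typescript
interface Merged {
  id: string;
  sources: string[]; // which generators proposed this candidate
}

// Union the candidate lists and collapse duplicates, remembering every
// generator that surfaced each ID. Output keeps first-seen order.
function unionAndDedupe(generated: Record<string, string[]>): Merged[] {
  const byId = new Map<string, Merged>();
  for (const generator of Object.keys(generated)) {
    for (const id of generated[generator]) {
      const existing = byId.get(id);
      if (existing === undefined) {
        byId.set(id, { id, sources: [generator] });
      } else if (existing.sources.indexOf(generator) === -1) {
        existing.sources.push(generator);
      }
    }
  }
  const merged: Merged[] = [];
  byId.forEach((m) => merged.push(m));
  return merged;
}

// Rerank with an arbitrary scoring function and keep the top k.
// A real deployment would call the cross-encoder here.
function rerank(candidates: Merged[], score: (c: Merged) => number, k: number): Merged[] {
  return candidates
    .slice()
    .sort((a, b) => score(b) - score(a))
    .slice(0, k);
}
```

Tracking which generators agreed on a candidate is optional, but it gives the reranker a cheap extra feature alongside the raw text.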
06 Ranking signals
Retrieval is only as good as its weights. Statefulai combines six signals and tunes them per project from your agents' own accept/reject feedback.
Weights are learned online. When an agent accepts a piece of context (uses it in a tool call or diff), we positive-sample. When it ignores or contradicts it, we negative-sample. The cross-encoder is retrained nightly per workspace.
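As an illustration of learning from accept/reject feedback, the sketch below applies one stochastic-gradient step of logistic regression per feedback event. This is a stand-in technique, not the cross-encoder retraining described above, and since the six signals are not enumerated here they are treated as an anonymous numeric vector. `predict`, `update`, and the learning rate are assumptions.

```typescript
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Predicted probability that the agent accepts a piece of context,
// given its signal values and the current weights.
function predict(weights: number[], signals: number[]): number {
  let z = 0;
  for (let i = 0; i < weights.length; i++) {
    z += weights[i] * signals[i];
  }
  return sigmoid(z);
}

// One SGD step on log loss. accepted = true is a positive sample,
// accepted = false (ignored or contradicted) is a negative sample.
function update(
  weights: number[],
  signals: number[],
  accepted: boolean,
  learningRate: number = 0.1
): number[] {
  const error = (accepted ? 1 : 0) - predict(weights, signals);
  return weights.map((w, i) => w + learningRate * error * signals[i]);
}
```

Starting from zero weights, an accepted item whose first signal is 1 nudges that weight up by `0.1 * 0.5 = 0.05`, and a rejected one nudges it down by the same amount.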
07 Compression
Raw retrieved memory is usually too large to fit in a context window. The compression engine turns 50 kB of source into a 1 kB action-ready summary without losing the parts the agent will need to call.
Compression runs in three modes:
- Structural. Pull function signatures, types, and call graphs; drop bodies.
- Lossy summary. An LLM pass with a strict template — purpose, inputs, side effects, gotchas.
- Reference. Just an ID + a hash; the agent can pull the full source on demand via tool.
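The structural mode can be approximated with a deliberately naive brace-depth scan: keep every top-level line that opens a block and drop everything nested inside it. This is only a sketch under that assumption. A real implementation would presumably use a parser, and this version is fooled by braces inside strings or comments. `structuralSummary` is a hypothetical name.

```typescript
// Return the top-level signatures found in `source`, with bodies dropped.
function structuralSummary(source: string): string[] {
  const signatures: string[] = [];
  let depth = 0;
  for (const rawLine of source.split("\n")) {
    const line = rawLine.trim();
    // A top-level line ending in "{" is treated as a signature.
    if (depth === 0 && line.charAt(line.length - 1) === "{") {
      signatures.push(line.slice(0, -1).trim());
    }
    // Track nesting so that inner blocks (try, if, loops) are skipped.
    for (let i = 0; i < line.length; i++) {
      const ch = line.charAt(i);
      if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
      }
    }
  }
  return signatures;
}
```

Applied to a method like `flush` in the sample that follows, a scan of this kind would keep only the `async flush(invoiceId: string)` line.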
Source · 412 lines
async flush(invoiceId: string) {
  const lock = await this.locks.acquire(invoiceId);
  try {
    const usage = await this.stripe.usage(invoiceId);
    // ...410 more lines
  }
}
Summary · 86 tokens
08 Storage
Boring, durable, replicated. The interesting work happens above.
Source of truth
Events, references, ADRs, audit log. JSONB columns for typed payloads, pgvector for inline embeddings on small tables.
Vector index
Per-workspace collections, HNSW with M=32 / efConstruction=200. Hybrid search via payload filters for branch / actor / repo.
Architecture graph
Typed nodes (Module, Symbol, Decision, Person) and relationships (CALLS, REFERS_TO, DECIDED). Cypher traversals power semantic recall.
Hot cache
Reranked retrieval results per active branch + agent. Sub-millisecond cache hits for the "same query 30 seconds later" pattern.
Raw archive
Immutable event stream, snapshots, and compressed history. Restore-from-source-of-truth in < 10 minutes.
Workflow durability
Every ingest / retrain is a durable workflow with replayable history. No "lost commits" on infra blips.
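The hot-cache behaviour ("same query 30 seconds later") can be sketched as a TTL cache keyed by branch, agent, and query. The backing store, the TTL value, and the key format are not stated above, so everything here is an assumption; `HotCache` and `cacheKey` are hypothetical names, and the clock is injectable so expiry can be exercised without waiting.

```typescript
// Build a cache key that cannot collide across branch / agent / query.
function cacheKey(branch: string, agent: string, query: string): string {
  return JSON.stringify([branch, agent, query]);
}

class HotCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or undefined on a miss or after expiry.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) {
      return undefined;
    }
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }
}
```

Scoping the key to the active branch keeps results from one branch from leaking into another after a checkout.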
09 Guarantees
Memory is only useful if you can trust it.
10 Performance
Numbers from production beta workloads across 30+ engineering teams.