01 Overview

Statefulai sits between your AI coding agents and the unstructured chaos of your project — turning sessions, commits, and IDE signals into a queryable long-term memory.

01 · sources
  Agent transcripts: claude · codex · cursor
  Git history: commits · branches
  IDE telemetry: opens · edits · LSP
02 · ingest
  Stream router: Kafka
  Workflow engine: Temporal
  Embed + chunk: voyage · openai
03 · memory
  Context engine: graph + vector + ts
  Episodic memory: events · append-only
  Semantic memory: graph · embeddings
  Procedural memory: rules · workflows
04 · retrieve
  Hybrid search: vector + bm25 + graph
  Reranker: cross-encoder
  Compression: summary-v2
05 · serve
  MCP server: stdio · ws
  REST / SDK: python · ts · go
  Agent context: ≤1.2k tok / hit

Design principle. Context is not a prompt. Prompts are ephemeral and per-agent; Statefulai is durable and per-project. Agents query it; they don't own it.

02 Ingestion

Anything that signals intent or change about your codebase can become memory. The ingestion layer accepts streaming and batch signals from agents, version control, and your editor.

Ingestion is intentionally permissive on input and strict on classification. We accept raw streams and assign them types downstream, so the same data point can show up as both episodic and semantic memory if it carries both kinds of signal.

Agent transcripts

Every Claude Code / Codex / Cursor session is streamed in via MCP. Tool calls, file reads, and accepted diffs all become events.

stream: mcp:transcript
Git history

Commits, branches, PR titles + descriptions, review threads. Webhook-based, replays on connect.

webhook: github · gitlab
IDE telemetry

File open/close, jump-to-def, LSP diagnostics, debug sessions. Builds "what the developer was looking at" signal.

extension: vscode · jetbrains
Prompts & outputs

The actual prompts agents send and what they accept back — the strongest "what worked" signal we have.

sdk: ingest.prompt
Decision docs

ADRs, design docs, and Slack channels you whitelist. Linked back to the code that resulted from them.

connector: slack · notion · md
Runtime signals

CI failures, error traces, deploy outcomes. Closes the loop between "what we wrote" and "what actually worked".

connector: sentry · gha · datadog

Every ingested event lands on Kafka, gets a stable hash, and is replayable. A Temporal workflow handles deduplication, chunking, embedding (Voyage / OpenAI), and routing to the correct memory layer.

$ ingest pipeline · trace
# a single PR merge fans out into several memory writes
[09:14:01] webhook  github · PR #482 merged
[09:14:01] route    ▸ episodic.event(commit)
[09:14:01] route    ▸ semantic.update(BillingService)
[09:14:02] embed    32 chunks (voyage-code-2)
[09:14:02] graph    +3 nodes · +7 edges
[09:14:02] summary  v2 · 412 → 86 tokens
[09:14:02] ok       commit fanout complete · 1.1s
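
The fan-out in the trace above can be pictured as a router that maps one event to several layer writes. The following is a minimal Python sketch of that idea only: `Event`, `route`, and the hard-coded rule are hypothetical names invented for illustration, not Statefulai's API, and the rule stands in for the per-project classifier described in the Context engine section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical ingested event; field names are illustrative only."""
    kind: str              # e.g. "commit"
    subject: str           # e.g. "PR #482 merged"
    symbols: tuple = ()    # code symbols the event touches


def route(event: Event) -> list[tuple[str, str]]:
    """Return (layer, operation) pairs for one event.

    Stand-in rule (an assumption): every event is logged episodically,
    and each touched symbol also triggers a semantic update.
    """
    writes = [("episodic", f"event({event.kind})")]
    for symbol in event.symbols:
        writes.append(("semantic", f"update({symbol})"))
    return writes


merge = Event(kind="commit", subject="PR #482 merged", symbols=("BillingService",))
writes = route(merge)
for layer, op in writes:
    print(f"route ▸ {layer}.{op}")
```

One input produces two writes here, which is the "same data point as episodic and semantic memory" behavior in miniature.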

03 Context engine

The classifier and writer. Ingested events get parsed, embedded, and placed in the right memory layer with the right relationships.

The engine answers three questions for every event:

Classification uses a small fine-tuned model per project; entity extraction uses a typed schema you can extend. Both run in the Temporal workflow with retries and idempotency keys, so the same commit re-ingested ten times produces exactly one write.
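
The "ten re-ingests, one write" behavior can be sketched with a content-derived idempotency key. This is an illustrative Python sketch under stated assumptions: the key is a SHA-256 over canonical JSON, and `MemoryStore` is a hypothetical in-memory stand-in, not the actual Temporal workflow or storage layer.

```python
import hashlib
import json


def idempotency_key(event: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MemoryStore:
    """Hypothetical in-memory stand-in for a memory layer."""

    def __init__(self) -> None:
        self._writes: dict[str, dict] = {}

    def write_once(self, event: dict) -> bool:
        """Write the event unless its key was already seen; True if written."""
        key = idempotency_key(event)
        if key in self._writes:
            return False
        self._writes[key] = event
        return True

    def __len__(self) -> int:
        return len(self._writes)


store = MemoryStore()
commit = {"type": "commit", "sha": "a1b", "branch": "feat/usage-meter"}
# Re-ingest the same commit ten times; only the first attempt writes.
results = [store.write_once(dict(commit)) for _ in range(10)]
print(len(store), results.count(True))
```

Sorting keys before hashing matters: two payloads with the same fields in a different order should map to the same key.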

04 Memory layer

Three layers, three storage shapes, one queryable surface.

The memory layer is split by access pattern rather than by content — episodic data is time-ranged, semantic data is graph-walked, procedural data is rule-matched. You don't write to a "layer" directly; the context engine routes for you.

For a deep dive on each layer, see Memory model.

Layer           Contents                                Storage
EP  Episodic    time-series of events, branch-scoped    postgres
SM  Semantic    typed graph + vector index              neo4j · qdrant
PR  Procedural  rules, workflows, preferences           redis · pg

Memory write trace
event: commit#a1b · BillingService.flush()
EP log( ts, branch, actor )
SM graph( BillingService → flush )
SM embed( body, ctx )
PR rule( idempotent-writes )

05 Retrieval

Retrieval is hybrid by default: vector for semantic similarity, BM25 for symbol & identifier match, graph walks for architectural reachability, and a recency prior for "what changed yesterday."

The retriever runs four candidate generators in parallel, unions and dedupes the result set, then reranks with a small cross-encoder tuned on accepted-vs-rejected agent context. A default retrieval takes 12 ms at p50 and 40 ms at p99.

Retrieval pipeline
Query
refactor billing.ts user
branch: feat/usage-meter
agent: claude-code
budget: 1200 tok
Result · 6 nodes
SM BillingService .84
SM StripeUsageClient .79
EP Stripe usage switch · 04-12 .71
PR idempotent-writes rule .68
SM InvoiceQueue .55
EP billing.ts last edit .48
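
The generate, union, dedupe, rerank flow described in this section can be sketched as below. This is a Python illustration under assumptions: `hybrid_retrieve`, the toy generators, and the score table are hypothetical, and a dictionary lookup stands in for the cross-encoder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Generator = Callable[[str], list[str]]   # query -> candidate node ids


def hybrid_retrieve(query: str, generators: list[Generator],
                    rerank: Callable[[str, str], float], top_k: int = 6) -> list[str]:
    # Run every candidate generator concurrently.
    with ThreadPoolExecutor(max_workers=len(generators)) as pool:
        batches = list(pool.map(lambda generate: generate(query), generators))
    # Union and dedupe while keeping first-seen order.
    candidates = list(dict.fromkeys(node for batch in batches for node in batch))
    # Rerank: highest score first, then cut to top_k.
    ranked = sorted(candidates, key=lambda node: rerank(query, node), reverse=True)
    return ranked[:top_k]


# Toy stand-ins for the four generators and the reranker scores.
TOY_SCORES = {"BillingService": 0.84, "StripeUsageClient": 0.79, "InvoiceQueue": 0.55}
generators = [
    lambda q: ["BillingService", "StripeUsageClient"],   # vector
    lambda q: ["BillingService"],                        # bm25
    lambda q: ["InvoiceQueue", "StripeUsageClient"],     # graph walk
    lambda q: ["BillingService"],                        # recency prior
]
hits = hybrid_retrieve("refactor billing.ts", generators,
                       lambda q, node: TOY_SCORES[node])
print(hits)
```

Deduping before the rerank keeps the expensive scoring step proportional to the number of distinct candidates rather than the number of generator hits.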

06 Ranking signals

Retrieval is only as good as its weights. Statefulai combines six signals and tunes them per project from your agents' own accept/reject feedback.

#   Signal                  Description                             Weight
01  Semantic similarity     vector cosine to the active query       0.32
02  Architectural relevance graph distance to active symbols        0.24
03  Recency                 half-life decay tuned per layer         0.18
04  Branch relevance        same branch > same repo > org           0.12
05  Team importance         nodes referenced by many teammates      0.08
06  User context            files the active dev opened today       0.06

Weights are learned online. When an agent accepts a piece of context (uses it in a tool call or diff), we positive-sample. When it ignores or contradicts it, we negative-sample. The cross-encoder is retrained nightly per workspace.

You can override every weight. Pin a memory to "always retrieve" or set a hard floor for branch-scoped recall.
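
As a worked illustration of how six weighted signals might combine, the Python sketch below assumes a plain linear weighted sum and an exponential half-life for recency. Both are assumptions: the source lists the weights and mentions half-life decay but does not state the combination formula, and the function names and the 7-day half-life are hypothetical.

```python
# Default weights as listed in the table above; they sum to 1.0.
WEIGHTS = {
    "semantic": 0.32,
    "architectural": 0.24,
    "recency": 0.18,
    "branch": 0.12,
    "team": 0.08,
    "user": 0.06,
}


def recency_signal(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(signals: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum over the signals; a missing signal counts as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


signals = {
    "semantic": 0.9,
    "architectural": 0.5,
    "recency": recency_signal(age_days=7, half_life_days=7),  # assumed half-life
    "branch": 1.0,
}
print(round(combined_score(signals), 3))

# Overriding a weight (per "You can override every weight"):
boosted = {**WEIGHTS, "branch": 0.5}
print(round(combined_score(signals, boosted), 3))
```

With the default weights the example evaluates 0.32·0.9 + 0.24·0.5 + 0.18·0.5 + 0.12·1.0; raising the branch weight lifts same-branch hits without touching the other signals.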

07 Compression

Raw retrieved memory is too big to fit in a context window. The compression engine turns 50 kB of source into a 1 kB action-ready summary, without losing the parts the agent will need to call.

Compression runs in three modes:

Compression · summary-v2
Source · 412 lines
class BillingService {
  async flush(invoiceId: string) {
    const lock = await this.locks.acquire(invoiceId);
    try {
      const usage = await this.stripe.usage(invoiceId);
      // ...410 more lines
  }
}
Summary · 86 tokens
purpose idempotent invoice flush via Stripe usage API
inputs invoiceId : string
sideFx writes usage_records, emits BillingFlushed
rule must hold per-invoice lock
refs StripeUsageClient · InvoiceQueue

08 Storage

Boring, durable, replicated. The interesting work happens above.

postgres · 16
Source of truth

Events, references, ADRs, audit log. JSONB columns for typed payloads, pgvector for inline embeddings on small tables.

qdrant
Vector index

Per-workspace collections, HNSW with M=32 / efConstruction=200. Hybrid search via payload filters for branch / actor / repo.

neo4j
Architecture graph

Typed nodes (Module, Symbol, Decision, Person) and relationships (CALLS, REFERS_TO, DECIDED). Cypher traversals power semantic recall.

redis
Hot cache

Reranked retrieval results per active branch + agent. Sub-millisecond cache hits for the "same query 30 seconds later" pattern.

s3
Raw archive

Immutable event stream, snapshots, and compressed history. Restore-from-source-of-truth in < 10 minutes.

temporal
Workflow durability

Every ingest / retrain is a durable workflow with replayable history. No "lost commits" on infra blips.
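
The hot-cache pattern described above ("same query 30 seconds later", keyed per active branch and agent) can be sketched as a small TTL cache. This Python version is an in-process stand-in for Redis; `HotCache`, the key shape, and the 60-second TTL are assumptions for illustration.

```python
import time


class HotCache:
    """TTL cache keyed by (branch, agent, query); in-process stand-in for Redis."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: dict[tuple, tuple[float, object]] = {}

    def put(self, branch: str, agent: str, query: str, value: object) -> None:
        self._entries[(branch, agent, query)] = (self._clock() + self._ttl, value)

    def get(self, branch: str, agent: str, query: str):
        key = (branch, agent, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: evict and report a miss
            return None
        return value


now = [0.0]                                   # fake clock for the demo
cache = HotCache(ttl_seconds=60.0, clock=lambda: now[0])
cache.put("feat/usage-meter", "claude-code", "refactor billing.ts", ["BillingService"])

now[0] = 30.0
hit = cache.get("feat/usage-meter", "claude-code", "refactor billing.ts")
miss_other_branch = cache.get("main", "claude-code", "refactor billing.ts")
now[0] = 61.0
expired = cache.get("feat/usage-meter", "claude-code", "refactor billing.ts")
print(hit, miss_other_branch, expired)
```

Including the branch in the key means a branch switch naturally misses the cache instead of serving results ranked for another branch.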

09 Guarantees

Memory is only useful if you can trust it.

Append-only
Episodic memory is never overwritten. Updates produce new versions; the audit log is the source of truth. Procedural rules can be retired but not erased.
Branch-scoped
Writes carry their git branch. Retrieval prefers same-branch hits, then main, then sibling branches. Switching branches doesn't poison your agent with stale context.
Idempotent
Every ingest carries a content hash. Re-running a webhook 10 times produces exactly one memory write.
Replayable
Every event is replayable from S3. You can rewind a project to any point in time and re-derive the entire memory layer.
Visibility
Every retrieved memory carries its sources, recency, and confidence. Agents can cite their memory; you can audit it.
Right to forget
Per-project tombstones honor deletes within 24h across primary, vector, graph, and cache.
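
The branch-scoped preference above (same branch, then main, then sibling branches) can be sketched as a sort key. This Python sketch assumes strict tiering, which is one possible reading: the source says retrieval "prefers" same-branch hits and elsewhere treats branch relevance as a weighted signal, so the real ordering may be softer. `branch_tier` and `order_hits` are hypothetical names.

```python
def branch_tier(hit_branch: str, active_branch: str, main_branch: str = "main") -> int:
    """0 = same branch, 1 = main, 2 = any sibling branch."""
    if hit_branch == active_branch:
        return 0
    if hit_branch == main_branch:
        return 1
    return 2


def order_hits(hits: list[dict], active_branch: str) -> list[dict]:
    """Sort by branch tier first, then by descending score within a tier."""
    return sorted(
        hits,
        key=lambda h: (branch_tier(h["branch"], active_branch), -h["score"]),
    )


hits = [
    {"id": "a", "branch": "feat/other", "score": 0.9},
    {"id": "b", "branch": "main", "score": 0.7},
    {"id": "c", "branch": "feat/usage-meter", "score": 0.6},
]
ordered = order_hits(hits, active_branch="feat/usage-meter")
print([h["id"] for h in ordered])
```

Under strict tiering a lower-scored same-branch hit outranks a higher-scored sibling-branch hit, which is the "doesn't poison your agent with stale context" property taken to its extreme.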

10 Performance

Numbers from production beta workloads across 30+ engineering teams.

p50 retrieve: 12 ms (hybrid + rerank)
p99 retrieve: 40 ms (cold cache)
ingest fanout: 1.1 s (commit → indexed)
tok per hit: ≤1.2k (post-compression)

Ready to plug in?

Bring memory online in 60 seconds.

One MCP server. One SDK call. Your agent stops forgetting.
