Production AI agents need two memory layers, not one. A context graph for structured decision intelligence, answering “why was this decided, under what policy, with what evidence.” And an agent memory system for self-improving conversational recall, answering “what patterns have we learned over time.”
At Lifesight, we ran into this problem while building our marketing intelligence agents (MIA). The agents needed to reason about budget decisions, measurement evidence, and business policies, not just recall past conversations. That experience led us to the three-layer architecture described here (the two memory layers plus short-term session state), which applies well beyond marketing to any domain where agents need governed, explainable decision-making.
This article breaks down the architecture, integration patterns, and production cost trade-offs across competing technology stacks.
The Problem: AI Agents Have Two Memory Gaps
Every production AI agent faces the same bottleneck: statelessness. LLMs process each request independently, with no durable memory of prior interactions or accumulated institutional knowledge. This manifests as two distinct failures.
The Conversational Memory Gap
When a user asks an AI assistant to “use the same format as last time,” the agent has no idea what “last time” means. Preferences, interaction history, and session context vanish between conversations. This is what agent memory systems like Mem0, Hindsight, and Vertex AI Memory Bank solve — persistent storage and semantic retrieval of facts learned from conversations.
The Decision Intelligence Gap
The deeper problem is structural. When a CFO asks “Why did we increase the Paid Search budget by 28% in Q4 2025?”, the system needs to traverse a chain of structured relationships:
- The BudgetDecision node linked to the Policy that governed it (BudgetPolicy v2.1, requiring VP approval for changes above 30%)
- The Evidence that supported it (an MMM model showing 4.2x ROAS)
- The Person who approved it (VP Marketing)
- The Precedent from Q4 2024 that informed the decision
No existing agent memory system can answer this. Some systems store flat facts; others form opinions from experience. Neither kind can traverse typed entity-relationship chains with temporal filtering.
This requires a context graph – a knowledge graph augmented with decision traces, provenance, bitemporal modeling, and causal chain links.
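To make the traversal concrete, here is a minimal in-memory sketch of the CFO question above. It is not Spanner Graph code: `ContextGraph`, `trace_decision`, and the node IDs are hypothetical names chosen for illustration, and the edge types are borrowed from the relationship names used later in this article.

```python
from collections import defaultdict


class ContextGraph:
    """Tiny in-memory stand-in for a graph database with typed edges."""

    def __init__(self):
        self.nodes = {}                 # node_id -> properties
        self.edges = defaultdict(list)  # (source_id, edge_type) -> [target_id]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, edge_type, target):
        self.edges[(source, edge_type)].append(target)

    def follow(self, source, edge_type):
        """Return the nodes reached from source over one edge of the given type."""
        return list(self.edges.get((source, edge_type), []))


def trace_decision(graph, decision_id):
    """Answer "why was this decided?" by following each typed edge once."""
    return {
        "decision": graph.nodes[decision_id],
        "policy": graph.follow(decision_id, "APPLIED_POLICY"),
        "evidence": graph.follow(decision_id, "SUPPORTED_BY"),
        "approver": graph.follow(decision_id, "APPROVED_BY"),
        "precedent": graph.follow(decision_id, "PRECEDENT_FOR"),
    }


# Hypothetical IDs; the property values come from the example above.
graph = ContextGraph()
graph.add_node("decision:paid-search-q4-2025", change_pct=28)
graph.add_node("policy:budget-v2.1", vp_approval_above_pct=30)
graph.add_node("evidence:mmm", roas=4.2)
graph.add_node("person:vp-marketing")
graph.add_node("decision:q4-2024")
graph.add_edge("decision:paid-search-q4-2025", "APPLIED_POLICY", "policy:budget-v2.1")
graph.add_edge("decision:paid-search-q4-2025", "SUPPORTED_BY", "evidence:mmm")
graph.add_edge("decision:paid-search-q4-2025", "APPROVED_BY", "person:vp-marketing")
graph.add_edge("decision:paid-search-q4-2025", "PRECEDENT_FOR", "decision:q4-2024")

trace = trace_decision(graph, "decision:paid-search-q4-2025")
```

A flat fact store could hold each of these five facts separately, but the answer to "why?" lives in the typed edges between them.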
The Thesis
Production AI agent systems need two complementary layers: a context graph as the brain (structured decision intelligence) and an agent memory system as the personality (self-improving conversational recall). Conflating these produces architectures that solve neither well.
This isn’t limited to marketing. Supply chain planning, clinical decision support, financial compliance, security operations — any domain where agents must make auditable decisions under governance constraints faces the same architectural split.
The Agent Memory Landscape in 2026
The space has matured quickly. Here’s how the major systems compare:
| System | Architecture | Graph Support | Temporal | Differentiator |
|---|---|---|---|---|
| Mem0 | Vector + optional graph | Pro only ($249/mo) | No | Largest ecosystem (50K+ stars) |
| Hindsight | 4-strategy parallel retrieval | All tiers (free) | Yes | Self-improving: retain/recall/reflect |
| Zep / Graphiti | Temporal knowledge graph | Yes (Neo4j) | Core strength | Best-in-class temporal reasoning |
| Letta | OS-inspired paging | No | No | LLM manages its own memory |
| Vertex AI Memory Bank | Gemini-powered extraction | No | TTL only | GCP-native, fully managed |
| Cognee | Knowledge-graph-first RAG | Yes | Partial | Graph + multimodal ingestion |
The critical insight: all of these store flat facts or loosely connected entity graphs. None supports custom typed schemas, multi-hop GQL pattern matching, bitemporal filtering, or structured explanation packets. Those require a purpose-built graph database.
Context Graphs: The Missing Layer
A context graph, as documented by Adnan Masood (January 2026), is a knowledge graph augmented with contextual metadata — decision records, policies, temporal validity, and provenance — to capture not just what facts are true, but why, how, when, and under what conditions they became true.
Context Graph vs. Knowledge Graph vs. Agent Memory
| Dimension | Knowledge Graph | Agent Memory | Context Graph |
|---|---|---|---|
| Core question | What things are? | What does the agent remember? | Why/how/when things happened? |
| Data model | Entities + static triples | Flat facts + embeddings | Entities + decisions + policies + evidence |
| Temporal | Usually static | Timestamps on facts | Bitemporal (valid time + transaction time) |
| Provenance | Often absent | Source attribution | First-class (W3C PROV, confidence scores) |
| Query model | SPARQL / Cypher | Semantic similarity | GQL pattern matching + vector hybrid |
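The bitemporal row in the table is easier to see with a small example. The sketch below assumes a simplified record layout (`PolicyRecord` and its field names are hypothetical) in which each row carries a valid-time interval and a transaction-time interval. The 30% threshold for v2.1 comes from the earlier example; the 25% threshold for v2.0 and all dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyRecord:
    version: str
    threshold_pct: float
    valid_from: date               # valid time: when the rule applies in the real world
    valid_to: Optional[date]       # None means still in force
    recorded_on: date              # transaction time: when the system stored this row
    superseded_on: Optional[date]  # None means this row is the current belief


def as_of(records, valid_on, known_on):
    """Rows in force on valid_on, as the system believed them on known_on."""
    return [
        r for r in records
        if r.recorded_on <= known_on
        and (r.superseded_on is None or known_on < r.superseded_on)
        and r.valid_from <= valid_on
        and (r.valid_to is None or valid_on < r.valid_to)
    ]


# On 2025-11-15 the system learns that v2.1 took effect back on 2025-10-01.
records = [
    PolicyRecord("v2.0", 25.0, date(2025, 1, 1), None, date(2025, 1, 1), date(2025, 11, 15)),
    PolicyRecord("v2.0", 25.0, date(2025, 1, 1), date(2025, 10, 1), date(2025, 11, 15), None),
    PolicyRecord("v2.1", 30.0, date(2025, 10, 1), None, date(2025, 11, 15), None),
]

believed_then = as_of(records, valid_on=date(2025, 10, 20), known_on=date(2025, 10, 20))
believed_now = as_of(records, valid_on=date(2025, 10, 20), known_on=date(2025, 12, 1))
```

The same question about 20 October yields v2.0 when asked from that day's knowledge and v2.1 when asked after the correction. An auditor needs exactly this distinction to judge a decision against what was known when it was made.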
A Concrete Schema Example
To make this tangible, here’s how a context graph captures a decision lifecycle. While this example uses marketing measurement, the same pattern applies to any governed domain — swap “Campaign” for “Clinical Trial” or “Trade Order” and the structure holds:
- Entity nodes: Campaign, Channel, Creative, Audience, Product, Market
- Measurement nodes: MetricSnapshot (with ROAS, CAC, confidence intervals), Model (MMM, MTA, with version and methodology)
- Decision nodes: BudgetDecision (action, rationale, confidence, approver)
- Governance nodes: Policy (valid_from/valid_to, escalation thresholds)
- Evidence nodes: Document, Alert, Report
Connected through typed relationships: RESOLVED_B, APPLIED_POLICY, SUPPORTED_BY, APPROVED_BY, PRECEDENT_FOR, GENERATED_BY.
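As a rough illustration of how two of these node types could look in application code, the sketch below models the Policy and BudgetDecision nodes as Python dataclasses and adds a simple escalation check. The field names and `requires_escalation` are assumptions drawn from the bullet list above, not the actual Spanner Graph schema, and the date and confidence value are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    version: str
    escalation_threshold_pct: float  # changes above this size need escalation
    valid_from: date
    valid_to: Optional[date] = None  # None means the policy is still in force


@dataclass(frozen=True)
class BudgetDecision:
    channel: str
    change_pct: float
    rationale: str
    confidence: float
    approver: Optional[str] = None


def requires_escalation(decision: BudgetDecision, policy: Policy) -> bool:
    """True when the size of the change exceeds the policy's escalation threshold."""
    return abs(decision.change_pct) > policy.escalation_threshold_pct


# Threshold and change size follow the earlier Paid Search example.
policy = Policy("BudgetPolicy", "v2.1", 30.0, valid_from=date(2025, 1, 1))
decision = BudgetDecision(
    channel="Paid Search",
    change_pct=28.0,
    rationale="MMM model showing 4.2x ROAS",
    confidence=0.8,
    approver="VP Marketing",
)
needs_vp = requires_escalation(decision, policy)
```

Because the threshold is a typed property on a governance node, an agent can check it mechanically instead of inferring it from free text.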
The Three-Layer Architecture
| Layer | Role | Questions it answers |
|---|---|---|
| Layer 1: Context Graph (Spanner Graph), the Brain | Structured decision intelligence | “Why was this decided? Under what policy?” |
| Layer 2: Agent Memory (Hindsight), the Personality | Self-improving recall | “What patterns drive success? What have we learned?” |
| Layer 3: Sessions (Vertex AI Agent Engine), short-term memory | Ephemeral conversation state | “What did we discuss? What format does the user want?” |
Layer 1: Context Graph (Spanner Graph) – The Brain
Structured decision intelligence. Custom property graph schema with GQL traversal, bitemporal filtering, GraphRAG, RBAC. Accessed via custom ADK function tools.
Layer 2: Agent Memory (Hindsight) – The Personality
Self-improving recall. 4-strategy retrieval (semantic, BM25, entity graph, temporal), entity resolution, confidence-scored opinions. Wired as ADK’s MemoryService via custom implementation.
Layer 3: Sessions (Vertex AI Agent Engine) – Short-Term Memory
Ephemeral conversation state. Managed by Google. Wired as ADK’s VertexAiSessionService.
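When several retrieval strategies run in parallel, their ranked results have to be merged somehow. The sketch below uses reciprocal rank fusion, a common general-purpose technique. It is an assumption for illustration only: nothing in this article confirms that Hindsight merges its four strategies this way, and the memory IDs are made up.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists; items ranked high in many lists score highest."""
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# One hypothetical ranking per strategy.
semantic = ["m1", "m2", "m3"]
bm25 = ["m2", "m1", "m4"]
entity_graph = ["m2", "m5"]
temporal = ["m3", "m2"]

fused = reciprocal_rank_fusion([semantic, bm25, entity_graph, temporal])
```

Here `m2` comes out on top because all four strategies return it, even though only two of them rank it first.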
Architecture Decision: Spanner Graph vs. Neo4j
Why Spanner Graph Wins for GCP-Native Systems
Unified relational + graph model. Spanner Graph maps existing relational tables to a property graph declaratively — no ETL, no data duplication. Your existing SQL tables are queryable as graph nodes simultaneously.
SQL + GQL in one query. Traverse the Decision → Policy → Evidence chain via GQL, then join with relational tables in SQL — in a single statement. Neo4j can’t do this.
Vertex AI integration. Call ML.PREDICT for embeddings inside graph queries. Run vector similarity + graph traversal in one operation. No external embedding pipeline needed.
LangChain GraphRAG support. Google provides SpannerGraphStore and SpannerGraphVectorContextRetriever for LangChain, plus a reference architecture for GraphRAG with Vertex AI.
What You Lose vs. Neo4j
- Graph Data Science algorithms (PageRank, community detection) — mitigate with BigQuery ML
- APOC procedures — rewrite as bounded GQL path patterns
- Bloom visualization — use Cytoscape.js or Looker
- Cypher ecosystem — translate to GQL (mechanical, not architectural)
Cost Comparison
These figures reflect production-grade configurations, not bare minimums. Actual costs depend on data volume, query patterns, and region.
| Configuration | Spanner Graph | Neo4j AuraDB | Notes |
|---|---|---|---|
| Dev/Prototype | ~$65/mo | ~$65/mo | Spanner 100 PU; Neo4j 1GB Professional |
| Production | ~$1,060/mo | ~$1,340/mo | Spanner 1-node regional; Neo4j 8GB Professional |
| Scale | ~$2,960/mo | ~$4,930/mo | Spanner 3-node regional; Neo4j 16GB+ Professional |
At the development stage, costs are comparable. Spanner’s economics improve at production and scale configurations due to its per-processing-unit pricing model.
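The gap is easier to judge as a percentage. This short calculation uses only the approximate monthly figures from the table above; the variable and function names are ours.

```python
# (Spanner Graph, Neo4j AuraDB) approximate monthly cost in USD, from the table above.
monthly_cost_usd = {
    "Dev/Prototype": (65, 65),
    "Production": (1060, 1340),
    "Scale": (2960, 4930),
}


def saving_pct(spanner, neo4j):
    """Spanner's saving relative to the Neo4j figure, in percent."""
    return round(100 * (neo4j - spanner) / neo4j, 1)


savings = {tier: saving_pct(*costs) for tier, costs in monthly_cost_usd.items()}
```

On these figures the saving is 0% at the dev tier, roughly 21% in production, and roughly 40% at scale.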
Hindsight vs. Mem0: The Agent Memory Decision
Learning vs. Remembering
Hindsight is built to make agents that learn, not just remember. The reflect() operation synthesizes observations and confidence-scored opinions from accumulated facts. After 50 optimization cycles, Hindsight can reflect on “What patterns drive successful outcomes?” and produce grounded insights — not summaries, but beliefs with confidence scores.
Mem0 excels at efficient memory compression (up to 80% token reduction) and has the largest community (50K+ stars). But its architecture is optimized for recall, not compounding learning.
Feature Parity Costs Are Comparable
At comparable graph + temporal + semantic retrieval capabilities, both systems cost ~$345–465/mo. The key difference: Mem0 Starter ($19/mo) provides vector-only search. To match Hindsight’s included 4-strategy retrieval, you need Mem0 Pro at $249/mo.
Neither is wrong. If your agents primarily need to remember user preferences and retrieve past context, Mem0’s ecosystem is mature and well-integrated. If your agents need to form opinions and compound learning across hundreds of interactions, Hindsight’s architecture is purpose-built for it.
ADK Integration Pattern
The Runner Wires Everything
    from google.adk.runners import Runner
    from google.adk.sessions import VertexAiSessionService

    # HindsightMemoryService is the custom MemoryService implementation mentioned
    # above, and orchestrator is the root agent; both are defined elsewhere.

    # Layer 3: Sessions (short-term)
    session_service = VertexAiSessionService(
        project="my-project", location="us-central1"
    )

    # Layer 2: Agent memory (Hindsight)
    memory_service = HindsightMemoryService(
        base_url="http://hindsight-api.default.svc:8888"
    )

    # Layer 1: Context graph accessed via tools (not MemoryService)
    # query_decision_trace, search_precedents, check_policy_compliance,
    # write_decision_node, hybrid_graphrag_search

    runner = Runner(
        agent=orchestrator,
        app_name="decision-intelligence",
        session_service=session_service,  # Layer 3
        memory_service=memory_service,    # Layer 2
        # Layer 1: Spanner Graph via custom tools on each agent
    )
Key Design Decision
Spanner Graph is accessed through custom function tools, not through MemoryService. This is intentional.
MemoryService is designed for flat semantic search — it takes a query string and returns relevant memories. Context graph queries are fundamentally different: typed GQL patterns with multi-hop traversal, temporal filtering, and policy-aware access control. Tools give full control over query shape.
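The difference in interface shape can be sketched in a few lines. Both classes below are hypothetical stand-ins, not ADK or Hindsight classes: `KeywordMemory` uses keyword overlap in place of real semantic similarity, and `DecisionTraceRequest` shows the kind of typed, bounded input a context graph tool can demand.

```python
from dataclasses import dataclass
from datetime import date


class KeywordMemory:
    """Flat recall: one free-text query in, a ranked list of remembered facts out."""

    def __init__(self, facts):
        self.facts = list(facts)

    def search(self, query):
        terms = set(query.lower().split())
        scored = [(len(terms & set(fact.lower().split())), fact) for fact in self.facts]
        # The stable sort keeps insertion order among equally scored facts.
        ranked = sorted(scored, key=lambda pair: -pair[0])
        return [fact for score, fact in ranked if score > 0]


@dataclass(frozen=True)
class DecisionTraceRequest:
    """Typed tool input: the caller states exactly which traversal it wants."""

    decision_id: str
    as_of: date
    max_precedent_hops: int = 2

    def __post_init__(self):
        if not 0 <= self.max_precedent_hops <= 2:
            raise ValueError("max_precedent_hops must be between 0 and 2")


memory = KeywordMemory([
    "User prefers weekly budget summaries",
    "Paid Search budget rose in Q4",
    "User likes tables",
])
recalled = memory.search("budget summaries")

request = DecisionTraceRequest("decision:paid-search-q4-2025", as_of=date(2025, 12, 31))
```

The memory interface can only rank text against text. The typed request carries an entity ID, a point in time, and a hop bound, which is the information a graph query needs and a similarity search has no slot for.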
Example context graph tool:
    # `tool` and `spanner_client` are assumed to be defined elsewhere in the codebase.
    @tool
    def query_decision_trace(decision_id: str) -> dict:
        """Retrieve the full decision trace: decision → policy → evidence → precedent."""
        # The precedent hop uses the GQL quantifier {0,2} rather than Cypher's *0..2.
        query = """
            GRAPH MarketingGraph
            MATCH (d:Decision {id: @id})-[:APPLIED_POLICY]->(p:Policy),
                  (d)-[:SUPPORTED_BY]->(e:Evidence),
                  (d)-[:PRECEDENT_FOR]->{0,2}(prev:Decision)
            WHERE d.valid_from <= CURRENT_TIMESTAMP()
              AND (d.valid_to IS NULL OR d.valid_to > CURRENT_TIMESTAMP())
            RETURN d, p, e, prev
        """
        return spanner_client.execute(query, params={"id": decision_id})
This is the query shape that MemoryService can’t express. The tool wraps it cleanly for ADK agents.
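A multi-pattern match like this one returns one row per combination of policy, evidence, and precedent, so the tool's caller usually wants the rows folded into a single structured answer. The sketch below assumes each row arrives as a dict keyed by the RETURN column names, with node IDs as values. That row shape, the function name, and the IDs are assumptions for illustration, not the behavior of any specific Spanner client.

```python
def build_explanation_packet(rows):
    """Fold combination rows into one de-duplicated explanation packet."""
    packet = {"decision": None, "policies": [], "evidence": [], "precedents": []}
    for row in rows:
        packet["decision"] = row["d"]
        for key, column in (("policies", "p"), ("evidence", "e"), ("precedents", "prev")):
            value = row[column]
            # A zero-hop precedent match returns the decision itself; skip it.
            if column == "prev" and value == row["d"]:
                continue
            if value not in packet[key]:
                packet[key].append(value)
    return packet


# Hypothetical rows: two evidence items crossed with a zero-hop and a one-hop precedent.
rows = [
    {"d": "decision:q4-2025", "p": "policy:budget-v2.1", "e": "evidence:mmm", "prev": "decision:q4-2025"},
    {"d": "decision:q4-2025", "p": "policy:budget-v2.1", "e": "evidence:mmm", "prev": "decision:q4-2024"},
    {"d": "decision:q4-2025", "p": "policy:budget-v2.1", "e": "evidence:report", "prev": "decision:q4-2025"},
    {"d": "decision:q4-2025", "p": "policy:budget-v2.1", "e": "evidence:report", "prev": "decision:q4-2024"},
]
packet = build_explanation_packet(rows)
```

Handing the agent one packet instead of four overlapping rows keeps the prompt small and makes the cited policy, evidence, and precedent unambiguous.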
Limitations and Trade-Offs
No architecture post is complete without honest trade-offs. Here are the ones we’ve encountered:
Spanner Graph is GCP-only. If you’re multi-cloud or AWS-native, Neo4j or Amazon Neptune may be more practical, even at the cost of the unified SQL+GQL model. The architecture pattern (context graph + agent memory + sessions) is portable; the specific implementation is not.
Hindsight is young. The project launched in late 2025. Mem0’s ecosystem is more mature, with broader framework integrations and a larger community. If you need production stability today with minimal integration work, Mem0 is the safer bet. If you’re willing to invest in a newer system for the learning capabilities, Hindsight rewards that.
Context graphs require schema design upfront. Unlike agent memory systems that ingest unstructured facts, context graphs demand you model your decision domain explicitly. You need to define your node types, relationship types, temporal properties, and governance rules before your first query. This is an investment: it pays off in query precision and auditability, but it’s not zero-effort.
Operational complexity. Three layers means three systems to monitor, version, and maintain. For smaller teams or simpler use cases, a two-layer approach (agent memory + sessions) may be sufficient. Add the context graph when your agents start making decisions that need to be auditable and explainable, not before.
Where We’re Applying This
At Lifesight, we use a variant of this architecture to power our marketing intelligence agents. The context graph stores the relationships between channels, campaigns, measurement models, business rules, and budget decisions — the full decision lifecycle.
When one of our agents recommends reallocating the budget from one channel to another, it doesn’t just surface a number. It traces the reasoning back through the causal measurement model, the business policy that governs spending changes, and the historical precedent that informed the recommendation. A marketing team can ask “why?” and get a structured, auditable answer — not a hallucinated explanation.
This is what we think separates agents that suggest from agents that are trusted. The difference isn’t the LLM. It’s the memory architecture behind it.
We’re sharing this architecture because we believe the pattern — context graphs for governed reasoning, agent memory for learning — is where the industry is heading. Not just for marketing, but for any domain where autonomous agents need to earn trust through explainability.
Conclusion
The AI agent memory problem is not one problem — it is three. Short-term conversational state, self-improving experiential learning, and governed structural decision intelligence each require purpose-built solutions.
The architecture presented here pairs Spanner Graph as the context graph brain with Hindsight as the self-improving memory personality, unified through Google’s ADK on Cloud Run. The entire system runs GCP-native.
If your enterprise AI roadmap includes autonomous or semi-autonomous agents, a context graph is the most practical path to scaling trust, auditability, and control — without crippling velocity. The goal is not to slow down AI. It’s to make AI fast and trustworthy, by giving it governed memory.
Resources
- Hindsight: arXiv 2512.12818 — Retain/Recall/Reflect architecture
- Zep Temporal KG: arXiv 2501.13956 — Temporal knowledge graph for agent memory
- Spanner Graph + LangChain — GraphRAG integration
- Google ADK documentation — Agent Development Kit
- W3C PROV-DM — Provenance data model standard
- Adnan Masood: Context Graphs — Practical guide to governed context