Production AI agents need two memory layers, not one: a context graph for structured decision intelligence, answering “why was this decided, under what policy, with what evidence,” and an agent memory system for self-improving conversational recall, answering “what patterns have we learned over time.”

At Lifesight, we ran into this problem while building our marketing intelligence agents (MIA). The agents needed to reason about budget decisions, measurement evidence, and business policies, not just recall past conversations. That experience led us to the three-layer architecture described here, which applies well beyond marketing to any domain where agents need governed, explainable decision-making.

This article breaks down the architecture, integration patterns, and production cost trade-offs across competing technology stacks.

The Problem: AI Agents Have Two Memory Gaps

Every production AI agent faces the same bottleneck: statelessness. LLMs process each request independently, with no durable memory of prior interactions or accumulated institutional knowledge. This manifests as two distinct failures.

The Conversational Memory Gap

When a user asks an AI assistant to “use the same format as last time,” the agent has no idea what “last time” means. Preferences, interaction history, and session context vanish between conversations. This is what agent memory systems like Mem0, Hindsight, and Vertex AI Memory Bank solve — persistent storage and semantic retrieval of facts learned from conversations.

The Decision Intelligence Gap

The deeper problem is structural. When a CFO asks “Why did we increase the Paid Search budget by 28% in Q4 2025?”, the system needs to traverse a chain of structured relationships:

    • The BudgetDecision node linked to the Policy that governed it (BudgetPolicy v2.1, requiring VP approval for changes above 30%)
    • The Evidence that supported it (an MMM model showing 4.2x ROAS)
    • The Person who approved it (VP Marketing)
    • The Precedent from Q4 2024 that informed the decision

No existing agent memory system can answer this. Today's memory systems either store flat facts or form opinions from experience, and neither can traverse typed entity-relationship chains with temporal filtering.

This requires a context graph – a knowledge graph augmented with decision traces, provenance, bitemporal modeling, and causal chain links.
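Here is a minimal in-memory sketch of the chain above. The node IDs, property names, and the `explain` helper are hypothetical. A real context graph would run this as a graph query rather than as Python dictionary lookups. The point is that answering "why" means following typed edges from one decision, not ranking similar text.

```python
from collections import defaultdict

# Hypothetical nodes for the Q4 2025 Paid Search decision (illustrative values only).
nodes = {
    "dec-q4-2025": {"label": "BudgetDecision", "action": "Increase Paid Search budget by 28%"},
    "pol-budget-2.1": {"label": "Policy", "name": "BudgetPolicy v2.1",
                       "rule": "VP approval for changes above 30%"},
    "ev-mmm": {"label": "Evidence", "summary": "MMM model showing 4.2x ROAS"},
    "person-vp": {"label": "Person", "role": "VP Marketing"},
    "dec-q4-2024": {"label": "BudgetDecision", "action": "Q4 2024 Paid Search decision"},
}

# Typed, directed edges: (source, relationship, target).
edges = [
    ("dec-q4-2025", "APPLIED_POLICY", "pol-budget-2.1"),
    ("dec-q4-2025", "SUPPORTED_BY", "ev-mmm"),
    ("dec-q4-2025", "APPROVED_BY", "person-vp"),
    ("dec-q4-2024", "PRECEDENT_FOR", "dec-q4-2025"),
]

outgoing = defaultdict(list)
incoming = defaultdict(list)
for src, rel, dst in edges:
    outgoing[src].append((rel, dst))
    incoming[dst].append((rel, src))


def explain(decision_id: str) -> dict:
    """Assemble a structured 'why' answer by following typed edges from one decision."""
    packet = {"decision": nodes[decision_id], "policy": [], "evidence": [],
              "approver": [], "precedents": []}
    slot_for = {"APPLIED_POLICY": "policy", "SUPPORTED_BY": "evidence",
                "APPROVED_BY": "approver"}
    for rel, dst in outgoing[decision_id]:
        if rel in slot_for:
            packet[slot_for[rel]].append(nodes[dst])
    # A precedent edge points *into* the decision it informed.
    for rel, src in incoming[decision_id]:
        if rel == "PRECEDENT_FOR":
            packet["precedents"].append(nodes[src])
    return packet


packet = explain("dec-q4-2025")
print(packet["policy"][0]["name"])       # BudgetPolicy v2.1
print(packet["evidence"][0]["summary"])  # MMM model showing 4.2x ROAS
```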

The Thesis

Production AI agent systems need two complementary layers: a context graph as the brain (structured decision intelligence) and an agent memory system as the personality (self-improving conversational recall). Conflating these produces architectures that solve neither well.

This isn’t limited to marketing. Supply chain planning, clinical decision support, financial compliance, security operations — any domain where agents must make auditable decisions under governance constraints faces the same architectural split.

The Agent Memory Landscape in 2026

The space has matured quickly. Here’s how the major systems compare:

System | Architecture | Graph Support | Temporal | Differentiator
--- | --- | --- | --- | ---
Mem0 | Vector + optional graph | Pro only ($249/mo) | No | Largest ecosystem (50K+ stars)
Hindsight | 4-strategy parallel retrieval | All tiers (free) | Yes | Self-improving: retain/recall/reflect
Zep / Graphiti | Temporal knowledge graph | Yes (Neo4j) | Core strength | Best-in-class temporal reasoning
Letta | OS-inspired paging | No | No | LLM manages its own memory
Vertex AI Memory Bank | Gemini-powered extraction | No | TTL only | GCP-native, fully managed
Cognee | Knowledge-graph-first RAG | Yes | Partial | Graph + multimodal ingestion

The critical insight: all of these store flat facts or loosely connected entity graphs. None supports custom typed schemas, multi-hop GQL pattern matching, bitemporal filtering, or structured explanation packets. Those require a purpose-built graph database.

Context Graphs: The Missing Layer

A context graph, as documented by Adnan Masood (January 2026), is a knowledge graph augmented with contextual metadata — decision records, policies, temporal validity, and provenance — to capture not just what facts are true, but why, how, when, and under what conditions they became true.

Context Graph vs. Knowledge Graph vs. Agent Memory

Dimension | Knowledge Graph | Agent Memory | Context Graph
--- | --- | --- | ---
Core question | What things are? | What does the agent remember? | Why/how/when things happened?
Data model | Entities + static triples | Flat facts + embeddings | Entities + decisions + policies + evidence
Temporal | Usually static | Timestamps on facts | Bitemporal (valid time + transaction time)
Provenance | Often absent | Source attribution | First-class (W3C PROV, confidence scores)
Query model | SPARQL / Cypher | Semantic similarity | GQL pattern matching + vector hybrid
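The bitemporal row is the one that most often needs unpacking. Valid time records when a fact was true in the world. Transaction time records when the system learned it. The sketch below shows how an "as of" query answers "what did we believe on date X about date Y?" It assumes a simple record layout with `valid_from`/`valid_to` and `recorded_at`/`superseded_at` fields. Those field names, `ExamplePolicy`, and its threshold history are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    name: str
    threshold_pct: int
    valid_from: date               # when the policy took effect in the business (valid time)
    valid_to: Optional[date]       # None = still in effect
    recorded_at: date              # when this row was written (transaction time)
    superseded_at: Optional[date]  # None = current belief; set when a correction arrives


def as_of(rows, valid_on: date, known_on: date):
    """Return rows that were true on `valid_on`, as the system knew them on `known_on`."""
    return [
        r for r in rows
        if r.valid_from <= valid_on and (r.valid_to is None or valid_on < r.valid_to)
        and r.recorded_at <= known_on and (r.superseded_at is None or known_on < r.superseded_at)
    ]


history = [
    # Hypothetical: first recorded with a 25% threshold...
    PolicyVersion("ExamplePolicy", 25, date(2025, 7, 1), None, date(2025, 7, 1), date(2025, 9, 15)),
    # ...then corrected to 30% on Sept 15, effective from the same valid date.
    PolicyVersion("ExamplePolicy", 30, date(2025, 7, 1), None, date(2025, 9, 15), None),
]

print([r.threshold_pct for r in as_of(history, date(2025, 10, 1), date(2025, 8, 1))])   # [25]
print([r.threshold_pct for r in as_of(history, date(2025, 10, 1), date(2025, 12, 1))])  # [30]
```

Keeping both time axes lets an audit reconstruct what the agent could have known when it acted, even after later corrections.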

A Concrete Schema Example

To make this tangible, here’s how a context graph captures a decision lifecycle. While this example uses marketing measurement, the same pattern applies to any governed domain — swap “Campaign” for “Clinical Trial” or “Trade Order” and the structure holds:

  • Entity nodes: Campaign, Channel, Creative, Audience, Product, Market
  • Measurement nodes: MetricSnapshot (with ROAS, CAC, confidence intervals), Model (MMM, MTA, with version and methodology)
  • Decision nodes: BudgetDecision (action, rationale, confidence, approver)
  • Governance nodes: Policy (valid_from/valid_to, escalation thresholds)
  • Evidence nodes: Document, Alert, Report

Connected through typed relationships: RESOLVED_BY, APPLIED_POLICY, SUPPORTED_BY, APPROVED_BY, PRECEDENT_FOR, GENERATED_BY.
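One way to keep relationship types honest is to declare which node labels each one may connect, then reject anything else at write time. The sketch below is a hypothetical, stdlib-only illustration of that idea for a subset of the relationships. The source/target label pairs are our assumptions about each relationship's intended direction. `Person` is taken from the approver in the CFO example.

```python
from dataclasses import dataclass, field

# Hypothetical edge rules: relationship -> (allowed source labels, allowed target labels).
EDGE_RULES = {
    "APPLIED_POLICY": ({"BudgetDecision"}, {"Policy"}),
    "SUPPORTED_BY": ({"BudgetDecision"}, {"MetricSnapshot", "Document", "Alert", "Report"}),
    "APPROVED_BY": ({"BudgetDecision"}, {"Person"}),
    "PRECEDENT_FOR": ({"BudgetDecision"}, {"BudgetDecision"}),
    "GENERATED_BY": ({"MetricSnapshot"}, {"Model"}),
}


@dataclass
class ContextGraph:
    labels: dict = field(default_factory=dict)  # node id -> label
    edges: list = field(default_factory=list)   # (src id, relationship, dst id)

    def add_node(self, node_id: str, label: str) -> None:
        self.labels[node_id] = label

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        if rel not in EDGE_RULES:
            raise ValueError(f"unknown relationship type: {rel}")
        sources, targets = EDGE_RULES[rel]
        if self.labels.get(src) not in sources or self.labels.get(dst) not in targets:
            raise ValueError(
                f"{rel} cannot connect {self.labels.get(src)} -> {self.labels.get(dst)}"
            )
        self.edges.append((src, rel, dst))


g = ContextGraph()
g.add_node("d1", "BudgetDecision")
g.add_node("p1", "Policy")
g.add_edge("d1", "APPLIED_POLICY", "p1")      # accepted
try:
    g.add_edge("p1", "APPLIED_POLICY", "d1")  # wrong direction: rejected
except ValueError as err:
    print(err)  # APPLIED_POLICY cannot connect Policy -> BudgetDecision
```

In a database like Spanner Graph, much of this is enforced by the schema itself. The sketch just makes the "typed" in "typed relationships" concrete.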

The Three-Layer Architecture

  • Layer 1: Context Graph (Spanner Graph), the brain. Structured decision intelligence: “Why was this decided? Under what policy?”
  • Layer 2: Agent Memory (Hindsight), the personality. Self-improving recall: “What patterns drive success? What have we learned?”
  • Layer 3: Sessions (Vertex AI Agent Engine), short-term memory (STM). Ephemeral conversation state: “What did we discuss? What format does the user want?”

Layer 1: Context Graph (Spanner Graph) – The Brain

Structured decision intelligence. Custom property graph schema with GQL traversal, bitemporal filtering, GraphRAG, RBAC. Accessed via custom ADK function tools.

Layer 2: Agent Memory (Hindsight) – The Personality

Self-improving recall. 4-strategy retrieval (semantic, BM25, entity graph, temporal), entity resolution, and confidence-scored opinions. Wired into ADK as a custom MemoryService implementation.
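The text doesn't say how Hindsight merges its four retrieval strategies. A common technique for fusing several ranked lists is reciprocal rank fusion (RRF), which scores each item by summing 1/(k + rank) across lists. The sketch below uses RRF only to illustrate multi-strategy fusion. It does not describe Hindsight's internals, and the memory IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k: int = 60):
    """Fuse ranked lists of ids; items ranked highly by several strategies rise to the top."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from four parallel retrieval strategies.
semantic = ["m3", "m1", "m7"]
bm25 = ["m1", "m3", "m9"]
entity_graph = ["m1", "m4"]
temporal = ["m9", "m1", "m3"]

print(reciprocal_rank_fusion([semantic, bm25, entity_graph, temporal]))
# ['m1', 'm3', 'm9', 'm4', 'm7']
```

RRF needs only ranks, not comparable scores. That is convenient when BM25 scores, cosine similarities, and graph distances live on different scales.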

Layer 3: Sessions (Vertex AI Agent Engine) – Short-Term Memory

Ephemeral conversation state. Managed by Google. Wired as ADK’s VertexAiSessionService.

Architecture Decision: Spanner Graph vs. Neo4j

Why Spanner Graph Wins for GCP-Native Systems

Unified relational + graph model. Spanner Graph maps existing relational tables to a property graph declaratively — no ETL, no data duplication. Your existing SQL tables are queryable as graph nodes simultaneously.

SQL + GQL in one query. Traverse the Decision → Policy → Evidence chain via GQL, then join with relational tables in SQL — in a single statement. Neo4j can’t do this.
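Both points, the declarative mapping and the single mixed statement, are easiest to see in Spanner's own syntax. The sketch keeps the statements as Python strings next to the `google-cloud-spanner` client calls that would run them. The table names (`Decision`, `Policy`, `DecisionPolicy`, `BudgetLine`) and their columns are hypothetical, and the tables are assumed to exist already. Treat the DDL and `GRAPH_TABLE` syntax as a starting point to check against the current Spanner Graph documentation.

```python
# Map existing relational tables onto a property graph. No data is copied:
# the node and edge tables below are ordinary tables already in the database.
CREATE_GRAPH_DDL = """
CREATE PROPERTY GRAPH MarketingGraph
  NODE TABLES (
    Decision,
    Policy
  )
  EDGE TABLES (
    DecisionPolicy
      SOURCE KEY (decision_id) REFERENCES Decision (id)
      DESTINATION KEY (policy_id) REFERENCES Policy (id)
      LABEL APPLIED_POLICY
  )
"""

# A GQL traversal inside GRAPH_TABLE, joined to a relational table in one statement.
DECISIONS_WITH_BUDGET_SQL = """
SELECT gt.decision_id, gt.policy_id, b.amount
FROM GRAPH_TABLE(
  MarketingGraph
  MATCH (d:Decision)-[:APPLIED_POLICY]->(p:Policy)
  RETURN d.id AS decision_id, p.id AS policy_id
) AS gt
JOIN BudgetLine AS b ON b.decision_id = gt.decision_id
"""


def run(project: str, instance_id: str, database_id: str) -> list:
    """Create the graph once, then run the mixed SQL + GQL query (IDs are placeholders)."""
    from google.cloud import spanner  # lazy import: this module loads without the client

    database = spanner.Client(project=project).instance(instance_id).database(database_id)
    database.update_ddl([CREATE_GRAPH_DDL]).result()  # block until the schema change finishes
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(DECISIONS_WITH_BUDGET_SQL))
```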

Vertex AI integration. Call ML.PREDICT for embeddings inside graph queries. Run vector similarity + graph traversal in one operation. No external embedding pipeline needed.
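Conceptually, "vector similarity + graph traversal in one operation" works in two steps. First, follow typed edges to get a candidate set. Then rank those candidates by embedding similarity to the question. The stdlib sketch below shows that shape with tiny hand-made vectors and hypothetical node IDs. In Spanner, the embeddings would come from a model call and the ranking would run inside the query, so treat this only as an illustration of the logic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical graph: evidence nodes each decision links to via SUPPORTED_BY.
supported_by = {"dec-1": ["ev-a", "ev-b", "ev-c"], "dec-2": ["ev-d"]}

# Hypothetical 3-dimensional embeddings (real ones have hundreds of dimensions).
embedding = {
    "ev-a": [0.9, 0.1, 0.0],
    "ev-b": [0.1, 0.9, 0.1],
    "ev-c": [0.7, 0.3, 0.1],
    "ev-d": [1.0, 0.0, 0.0],
}


def hybrid_search(decision_id, query_vec, top_k=2):
    """Graph step restricts the candidates; vector step orders them by similarity."""
    candidates = supported_by.get(decision_id, [])
    ranked = sorted(candidates, key=lambda n: cosine(embedding[n], query_vec), reverse=True)
    return ranked[:top_k]


print(hybrid_search("dec-1", [1.0, 0.0, 0.0]))  # ['ev-a', 'ev-c']
```

Note that `ev-d` is the closest vector overall, but it never appears. It isn't connected to `dec-1`, which is exactly the precision a pure similarity search lacks.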

LangChain GraphRAG support. Google provides SpannerGraphStore and SpannerGraphVectorContextRetriever for LangChain, plus a reference architecture for GraphRAG with Vertex AI.

What You Lose vs. Neo4j

      • Graph Data Science algorithms (PageRank, community detection) — mitigate with BigQuery ML
      • APOC procedures — rewrite as bounded GQL path patterns
      • Bloom visualization — use Cytoscape.js or Looker
      • Cypher ecosystem — translate to GQL (mechanical, not architectural)

Cost Comparison

These figures reflect production-grade configurations, not bare minimums. Actual costs depend on data volume, query patterns, and region.

Configuration | Spanner Graph | Neo4j AuraDB | Notes
--- | --- | --- | ---
Dev/Prototype | ~$65/mo | ~$65/mo | Spanner 100 PU; Neo4j 1GB Professional
Production | ~$1,060/mo | ~$1,340/mo | Spanner 1-node regional; Neo4j 8GB Professional
Scale | ~$2,960/mo | ~$4,930/mo | Spanner 3-node regional; Neo4j 16GB+ Professional

At the development stage, costs are comparable. Spanner’s economics improve at production and scale configurations due to its per-processing-unit pricing model.
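Working the numbers from the table shows the trend: the gap is zero at the dev tier and widens with size. Here is a quick check of the relative savings, using only the approximate monthly figures above:

```python
# Approximate monthly costs (USD) from the table above: (Spanner Graph, Neo4j AuraDB).
tiers = {
    "Dev/Prototype": (65, 65),
    "Production": (1_060, 1_340),
    "Scale": (2_960, 4_930),
}

savings_pct = {}
for tier, (spanner, neo4j) in tiers.items():
    savings = neo4j - spanner
    savings_pct[tier] = 100 * savings / neo4j
    print(f"{tier}: ${savings:,}/mo less on Spanner ({savings_pct[tier]:.1f}%)")
```

That works out to roughly 21% less at the production tier and 40% less at the scale tier, on these estimates.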

Hindsight vs. Mem0: The Agent Memory Decision

Learning vs. Remembering

Hindsight is built to make agents that learn, not just remember. The reflect() operation synthesizes observations and confidence-scored opinions from accumulated facts. After 50 optimization cycles, Hindsight can reflect on “What patterns drive successful outcomes?” and produce grounded insights — not summaries, but beliefs with confidence scores.
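To show what "beliefs with confidence scores" means, as opposed to summaries, here is a toy aggregation over accumulated outcome facts. It is not Hindsight's reflect() algorithm, whose internals the text doesn't describe. It only shows the general idea: turning many observations into a claim plus a confidence that tightens as evidence grows. The pattern names and outcomes are hypothetical. Using the Wilson score lower bound as the confidence measure is also our assumption.

```python
import math
from collections import defaultdict


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval: conservative when evidence is thin."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def form_opinions(facts):
    """Turn (pattern, succeeded) facts into beliefs ranked by confidence."""
    tally = defaultdict(lambda: [0, 0])  # pattern -> [successes, trials]
    for pattern, succeeded in facts:
        tally[pattern][0] += int(succeeded)
        tally[pattern][1] += 1
    opinions = [
        {
            "belief": f"{pattern} tends to improve outcomes",
            "observed_rate": s / n,
            "confidence": round(wilson_lower_bound(s, n), 2),
            "evidence": n,
        }
        for pattern, (s, n) in tally.items()
    ]
    return sorted(opinions, key=lambda o: o["confidence"], reverse=True)


# Hypothetical outcome facts from past optimization cycles.
facts = [("Reallocating after an MMM refresh", i < 30) for i in range(40)]
facts += [("Weekly creative rotation", True)] * 3

for opinion in form_opinions(facts):
    print(opinion)
```

Three straight successes rank below 30 successes out of 40. The bound penalizes thin evidence, which is the difference between an opinion and a lucky streak.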

Mem0 excels at efficient memory compression (up to 80% token reduction) and has the largest community (50K+ stars). But its architecture is optimized for recall, not compounding learning.

Feature Parity Costs Are Comparable

At comparable graph + temporal + semantic retrieval capabilities, both systems cost ~$345–465/mo. The key difference: Mem0 Starter ($19/mo) provides vector-only search. To match Hindsight’s included 4-strategy retrieval, you need Mem0 Pro at $249/mo.

Neither is wrong. If your agents primarily need to remember user preferences and retrieve past context, Mem0’s ecosystem is mature and well-integrated. If your agents need to form opinions and compound learning across hundreds of interactions, Hindsight’s architecture is purpose-built for it.


ADK Integration Pattern

The Runner Wires Everything

```python
from google.adk.runners import Runner
from google.adk.sessions import VertexAiSessionService

# Assumed to exist in your own codebase (module paths are hypothetical):
# HindsightMemoryService is a custom BaseMemoryService implementation, and
# orchestrator is the root agent with the Layer 1 graph tools attached.
from my_agents.memory import HindsightMemoryService
from my_agents.orchestrator import orchestrator

# Layer 3: Sessions (short-term)
session_service = VertexAiSessionService(
    project="my-project", location="us-central1"
)

# Layer 2: Agent memory (Hindsight)
memory_service = HindsightMemoryService(
    base_url="http://hindsight-api.default.svc:8888"
)

# Layer 1: Context graph accessed via tools (not MemoryService):
# query_decision_trace, search_precedents, check_policy_compliance,
# write_decision_node, hybrid_graphrag_search

runner = Runner(
    agent=orchestrator,
    app_name="decision-intelligence",
    session_service=session_service,  # Layer 3
    memory_service=memory_service,  # Layer 2
    # Layer 1: Spanner Graph via custom tools on each agent
)
```

Key Design Decision

Spanner Graph is accessed through custom function tools, not through MemoryService. This is intentional.

MemoryService is designed for flat semantic search — it takes a query string and returns relevant memories. Context graph queries are fundamentally different: typed GQL patterns with multi-hop traversal, temporal filtering, and policy-aware access control. Tools give full control over query shape.
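The difference in query shape shows up in the method signatures alone. The toy class below mirrors the keyword-only `search_memory(app_name=..., user_id=..., query=...)` shape of ADK's `BaseMemoryService`, with some deliberate departures:

- It does not subclass `BaseMemoryService`.
- It returns plain strings instead of ADK's response types.
- It uses a hypothetical `add_fact` method in place of session ingestion.
- It matches by naive keyword overlap.

It is a stdlib-only illustration, not ADK code. Everything a caller can express is one free-text string.

```python
import asyncio
from collections import defaultdict


class KeywordMemoryService:
    """Toy memory store: free-text query in, flat list of remembered facts out."""

    def __init__(self):
        self._facts = defaultdict(list)  # (app_name, user_id) -> list of fact strings

    async def add_fact(self, *, app_name: str, user_id: str, fact: str) -> None:
        self._facts[(app_name, user_id)].append(fact)

    async def search_memory(self, *, app_name: str, user_id: str, query: str) -> list:
        words = set(query.lower().split())
        return [
            fact for fact in self._facts[(app_name, user_id)]
            if words & set(fact.lower().split())
        ]


async def demo():
    memory = KeywordMemoryService()
    app, user = "decision-intelligence", "u1"
    await memory.add_fact(app_name=app, user_id=user, fact="Prefers weekly reports as tables")
    await memory.add_fact(app_name=app, user_id=user, fact="Focuses on Paid Search")
    # Nothing in this interface can say "follow APPLIED_POLICY, then filter by valid time".
    return await memory.search_memory(app_name=app, user_id=user, query="weekly tables")


print(asyncio.run(demo()))  # ['Prefers weekly reports as tables']
```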

Example context graph tool:

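```python
from google.cloud import spanner

# Placeholder IDs; in practice, create this client once at startup.
database = (
    spanner.Client(project="my-project")
    .instance("my-instance")
    .database("my-database")
)

# Precedents point into the decision they informed, hence the reversed pattern.
# OPTIONAL MATCH keeps the row when a decision has no precedent.
DECISION_TRACE_GQL = """
GRAPH MarketingGraph
MATCH (d:Decision {id: @id})-[:APPLIED_POLICY]->(p:Policy),
      (d)-[:SUPPORTED_BY]->(e:Evidence)
WHERE d.valid_from <= CURRENT_TIMESTAMP()
  AND (d.valid_to IS NULL OR d.valid_to > CURRENT_TIMESTAMP())
OPTIONAL MATCH (prev:Decision)-[:PRECEDENT_FOR]->{1,2}(d)
RETURN TO_JSON(d) AS decision, TO_JSON(p) AS policy,
       TO_JSON(e) AS evidence, TO_JSON(prev) AS precedent
"""


# No decorator needed: ADK wraps plain functions listed in an agent's
# tools=[...] as FunctionTools, using the signature and docstring as the schema.
def query_decision_trace(decision_id: str) -> dict:
    """Retrieve the full decision trace: decision → policy → evidence → precedent."""
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            DECISION_TRACE_GQL,
            params={"id": decision_id},
            param_types={"id": spanner.param_types.STRING},
        )
        return {"decision_id": decision_id, "rows": [list(row) for row in rows]}
```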

This is the query shape that MemoryService can’t express. The tool wraps it cleanly for ADK agents.

Limitations and Trade-Offs

No architecture post is complete without honest trade-offs. Here are the ones we’ve encountered:

Spanner Graph is GCP-only. If you’re multi-cloud or AWS-native, Neo4j or Amazon Neptune may be more practical, even at the cost of the unified SQL+GQL model. The architecture pattern (context graph + agent memory + sessions) is portable; the specific implementation is not.

Hindsight is young. The project launched in late 2025. Mem0’s ecosystem is more mature, with broader framework integrations and a larger community. If you need production stability today with minimal integration work, Mem0 is the safer bet. If you’re willing to invest in a newer system for the learning capabilities, Hindsight rewards that.

Context graphs require schema design upfront. Unlike agent memory systems that ingest unstructured facts, context graphs demand you model your decision domain explicitly. You need to define your node types, relationship types, temporal properties, and governance rules before your first query. This is an investment: it pays off in query precision and auditability, but it’s not zero-effort.

Operational complexity. Three layers means three systems to monitor, version, and maintain. For smaller teams or simpler use cases, a two-layer approach (agent memory + sessions) may be sufficient. Add the context graph when your agents start making decisions that need to be auditable and explainable, not before.

Where We’re Applying This

At Lifesight, we use a variant of this architecture to power our marketing intelligence agents. The context graph stores the relationships between channels, campaigns, measurement models, business rules, and budget decisions — the full decision lifecycle.

When one of our agents recommends reallocating the budget from one channel to another, it doesn’t just surface a number. It traces the reasoning back through the causal measurement model, the business policy that governs spending changes, and the historical precedent that informed the recommendation. A marketing team can ask “why?” and get a structured, auditable answer — not a hallucinated explanation.

This is what we think separates agents that suggest from agents that are trusted. The difference isn’t the LLM. It’s the memory architecture behind it.

We’re sharing this architecture because we believe the pattern — context graphs for governed reasoning, agent memory for learning — is where the industry is heading. Not just for marketing, but for any domain where autonomous agents need to earn trust through explainability.

Conclusion

The AI agent memory problem is not one problem — it is three. Short-term conversational state, self-improving experiential learning, and governed structural decision intelligence each require purpose-built solutions.

The architecture presented here pairs Spanner Graph as the context graph brain with Hindsight as the self-improving memory personality, unified through Google’s ADK on Cloud Run. The entire system runs GCP-native.

If your enterprise AI roadmap includes autonomous or semi-autonomous agents, a context graph is the most practical path to scaling trust, auditability, and control — without crippling velocity. The goal is not to slow down AI. It’s to make AI fast and trustworthy, by giving it governed memory.
