Building a persistent RAG memory system for Claude Code with remote access.

The Problem

Claude Code sessions are ephemeral. Every time you start a new conversation, context from previous sessions is lost. File-based memory (markdown files in ~/.claude/) helps, but it's limited — flat text with no semantic search, no cross-machine access, and no way to query by relevance. When you're working across multiple machines (a development server and a local PC), the gap becomes even wider.

We wanted three things:

Semantic memory — store facts, decisions, and project context with vector embeddings so they can be retrieved by meaning, not just keywords
Persistence — memories survive across sessions, reboots, and machine changes
Remote access — the same memory system accessible from any machine running Claude Code

Architecture Overview

Claude Code (any machine)
    ↓ MCP (stdio local / Streamable HTTP remote)
claude-memory server (Node.js + TypeScript)
    ↓ SQL + vector search
Supabase (PostgreSQL + pgvector)
    ↓ embeddings
Voyage AI (voyage-3 model)

claude-memory is an MCP (Model Context Protocol) server that exposes 8 tools to Claude Code:

store_memory — save a memory with type, importance, tags, and auto-generated embeddings
search_memory — semantic vector search with optional filters (type, tags, importance, recency bias)
list_memories — browse by recency, importance, or access frequency
forget_memory — soft-delete outdated memories
summarize_session — persist a session summary
save_session_link — link a session UUID to a label for later retrieval
find_session — look up a past session by label
memory_stats — dashboard of memory system health
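The article doesn't say how search_memory combines its optional filters with recency bias. One plausible approach, sketched here under that assumption, is to filter candidates by type, tags, and minimum importance, then blend cosine similarity with an exponential recency decay. The field names, the rule that all requested tags must match, and the 30-day half-life default are all hypothetical.

```typescript
// Hypothetical shape of a row returned by the vector search (field names are assumptions).
interface Candidate {
  id: string;
  memoryType: string;
  tags: string[];
  importance: number; // 1-10
  createdAt: Date;
  similarity: number; // cosine similarity, higher = closer
}

// Hypothetical search_memory options mirroring the filters listed above.
interface SearchOptions {
  memoryType?: string;
  tags?: string[]; // assumption: a memory must carry ALL of these tags
  minImportance?: number;
  recencyBias?: number; // 0 = similarity only, 1 = recency only
  halfLifeDays?: number; // age at which the recency term drops to 0.5
}

const MS_PER_DAY = 86_400_000;

// Exponential decay: 1.0 for a memory created now, 0.5 after one half-life.
function recencyScore(createdAt: Date, now: Date, halfLifeDays: number): number {
  const ageDays = Math.max(0, (now.getTime() - createdAt.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function rankCandidates(
  candidates: Candidate[],
  opts: SearchOptions = {},
  now: Date = new Date(),
): Array<Candidate & { score: number }> {
  const bias = Math.min(1, Math.max(0, opts.recencyBias ?? 0));
  const halfLife = opts.halfLifeDays ?? 30;
  const requiredTags = opts.tags ?? [];
  return candidates
    .filter((c) => opts.memoryType === undefined || c.memoryType === opts.memoryType)
    .filter((c) => c.importance >= (opts.minImportance ?? 1))
    .filter((c) => requiredTags.every((t) => c.tags.includes(t)))
    // Blend the two signals; with bias 0 the score is exactly the similarity.
    .map((c) => ({
      ...c,
      score: (1 - bias) * c.similarity + bias * recencyScore(c.createdAt, now, halfLife),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Leaving recencyBias at 0 by default keeps the ranking purely semantic, so recency only matters when a caller explicitly asks for it.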

The Database Layer

We used the existing Supabase instance (already running for AgentCRM) and added a claude_memory table with pgvector support:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS claude_memory (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    memory_type     TEXT NOT NULL,
    content         TEXT NOT NULL,
    title           TEXT,
    tags            TEXT[] DEFAULT '{}',
    source          TEXT,
    session_id      TEXT,
    project         TEXT,
    importance      INTEGER DEFAULT 5,
    embedding       vector(1024),
    -- ... timestamps, access tracking, expiry
);

The embedding column stores 1024-dimensional vectors generated by Voyage AI's voyage-3 model. When a memory is stored, the content is sent to Voyage AI to generate an embedding. When a search is performed, the query is embedded the same way and compared using cosine similarity against all stored memories.
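The similarity measure itself is small. This sketch computes cosine similarity in TypeScript and scores every stored memory against a query, which is what an unindexed comparison against all stored memories amounts to. In the real system the comparison presumably runs inside Postgres, where pgvector's <=> operator returns cosine distance (1 minus cosine similarity). Voyage documents its embeddings as normalized to unit length, in which case cosine similarity reduces to a plain dot product. The StoredMemory shape and the topK helper are illustrative.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Range [-1, 1]; higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; treat as "no match" rather than returning NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical minimal row: just an id and its 1024-dimensional embedding.
interface StoredMemory {
  id: string;
  embedding: number[];
}

// Brute-force search: score every memory against the query and keep the k best.
function topK(query: number[], memories: StoredMemory[], k: number) {
  return memories
    .map((m) => ({ id: m.id, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A full scan is fine for a personal memory store with a modest number of rows; pgvector's approximate indexes only become worthwhile once the table grows large.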

Memory types include: fact, decision, code_pattern, project_context, user_preference, session_summary, debug_insight, architecture, and tool_usage. Each memory has an importance score (1–10) that can be used for filtering and ranking.
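Because memory_type is a free TEXT column, the allowed values and the 1–10 importance range have to be enforced in application code. The server uses Zod for config validation; the dependency-free sketch below shows equivalent checks for store_memory input. validateMemoryInput and its argument shape are hypothetical.

```typescript
// The nine memory types listed above.
const MEMORY_TYPES = [
  "fact", "decision", "code_pattern", "project_context", "user_preference",
  "session_summary", "debug_insight", "architecture", "tool_usage",
] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

// Hypothetical raw arguments, as they might arrive from a store_memory call.
interface RawMemoryInput {
  memoryType: string;
  content: string;
  importance?: number;
  tags?: string[];
}

type ValidationResult =
  | { ok: true; value: { memoryType: MemoryType; content: string; importance: number; tags: string[] } }
  | { ok: false; errors: string[] };

function isMemoryType(value: string): value is MemoryType {
  return (MEMORY_TYPES as readonly string[]).includes(value);
}

// Collects every problem instead of stopping at the first, so the caller can fix them all at once.
function validateMemoryInput(input: RawMemoryInput): ValidationResult {
  const errors: string[] = [];
  const { memoryType, content } = input;
  const importance = input.importance ?? 5; // same default as the importance column
  if (content.trim() === "") errors.push("content must not be empty");
  if (!Number.isInteger(importance) || importance < 1 || importance > 10) {
    errors.push(`importance must be an integer from 1 to 10, got ${importance}`);
  }
  if (!isMemoryType(memoryType)) errors.push(`unknown memory_type "${memoryType}"`);
  if (errors.length > 0 || !isMemoryType(memoryType)) return { ok: false, errors };
  return { ok: true, value: { memoryType, content, importance, tags: input.tags ?? [] } };
}
```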

The MCP Server

The server (~/claude-memory/src/) is built with:

@modelcontextprotocol/sdk — MCP server framework
@supabase/supabase-js — database client
Express — HTTP server for remote access
Zod — config validation
Voyage AI — embedding generation
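Embedding generation is a single HTTP call. The sketch below targets Voyage AI's REST embeddings endpoint with the voyage-3 model, whose 1024-dimensional output matches the vector(1024) column. It assumes the request shape (input, model, input_type) and response shape (data[].embedding with an index) from Voyage's API documentation; verify both against the current reference. Passing input_type "document" when storing and "query" when searching follows Voyage's recommendation and is not something the article confirms.

```typescript
const VOYAGE_URL = "https://api.voyageai.com/v1/embeddings";

type InputType = "document" | "query";

// Assumed response shape: one entry per input text, tagged with its position.
interface VoyageResponse {
  data: Array<{ embedding: number[]; index: number }>;
}

function buildEmbeddingRequest(texts: string[], inputType: InputType) {
  return {
    input: texts,
    model: "voyage-3",
    input_type: inputType, // "document" for store_memory, "query" for search_memory
  };
}

// Requires a runtime with global fetch (Node 18+).
async function embed(texts: string[], inputType: InputType, apiKey: string): Promise<number[][]> {
  const res = await fetch(VOYAGE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildEmbeddingRequest(texts, inputType)),
  });
  if (!res.ok) throw new Error(`Voyage API error ${res.status}: ${await res.text()}`);
  const json = (await res.json()) as VoyageResponse;
  // Sort by index so the output order always matches the input order.
  return json.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}
```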

Dual Transport

The key design decision was supporting two transports:

Stdio — for local Claude Code on the same machine. The server runs as a child process, communicating over stdin/stdout. Zero network overhead, no auth needed.

{
  "claude-memory": {
    "command": "node",
    "args": ["/home/rdpuser/claude-memory/dist/index.js", "--stdio"]
  }
}

Streamable HTTP — for remote Claude Code instances. The server listens on port 3002 with a single /mcp endpoint that handles all MCP traffic and is protected by bearer token authentication.

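import { randomUUID } from 'node:crypto';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';

// sessionId -> { server, transport }. app, bearerAuth and createServer are defined elsewhere.
// No express.json() on this route: the transport reads the raw request body itself.
const sessions = new Map();

// Single endpoint handles POST (new/existing sessions), GET (SSE notifications), DELETE (cleanup)
app.all('/mcp', bearerAuth, async (req, res) => {
  const sessionId = req.headers['mcp-session-id'];
  const existing = typeof sessionId === 'string' ? sessions.get(sessionId) : undefined;

  if (existing) {
    // Existing session: its transport handles POST, GET (SSE stream) and DELETE
    await existing.transport.handleRequest(req, res);
    if (req.method === 'DELETE') sessions.delete(sessionId);
  } else if (req.method === 'POST') {
    // New session: one MCP server + transport pair per client
    const server = createServer();
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      // Store the session as soon as its ID is assigned during initialization
      onsessioninitialized: (id) => sessions.set(id, { server, transport }),
    });
    await server.connect(transport);
    await transport.handleRequest(req, res);
  } else {
    res.status(400).send('Missing or unknown mcp-session-id');
  }
});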

We initially used SSE transport (separate /sse and /messages endpoints), but Cloudflare's proxy killed the long-lived SSE connection before POST messages arrived, causing 400 errors. Streamable HTTP solved this by using standard request/response cycles on a single endpoint.

The transport is selected by a command-line flag: --stdio for local, omit for HTTP.
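The flag check can be tiny. A minimal sketch, assuming the flag is read from process.argv; selectTransport and logDiagnostic are hypothetical names. The one real pitfall in stdio mode is that stdout carries the MCP protocol messages, so any logging has to go to stderr.

```typescript
type TransportMode = "stdio" | "http";

// --stdio selects the child-process transport; anything else means Streamable HTTP.
function selectTransport(argv: readonly string[]): TransportMode {
  return argv.includes("--stdio") ? "stdio" : "http";
}

// In stdio mode stdout carries MCP messages, so a stray console.log would corrupt
// the protocol stream. Writing diagnostics to stderr is safe in both modes.
function logDiagnostic(message: string): void {
  process.stderr.write(`[claude-memory] ${message}\n`);
}
```

At startup, selectTransport(process.argv.slice(2)) would decide whether to connect a stdio transport or start the Express app on port 3002.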

Exposing to the Internet

The Challenge

The server's public IP is actually a hosting provider proxy that only forwards ports 80 and 443. The real machine IP isn't directly reachable. Opening port 3002 in ufw was necessary but insufficient — the provider's proxy doesn't forward non-standard ports.

This also meant Let's Encrypt ACME challenges failed — both TLS-ALPN-01 and HTTP-01 challenges were intercepted by the proxy, returning errors instead of reaching Caddy.

The Solution

We used the same pattern already working for AgentCRM:

1. Registered gitdiot.com and added it to Cloudflare
2. Created an A record for fragrag.gitdiot.com pointing to the server IP, with Cloudflare proxy (orange cloud) enabled
3. Configured Caddy with tls internal: Cloudflare handles public TLS, while Caddy serves a certificate from its own internal CA for the Cloudflare-to-origin connection

fragrag.gitdiot.com {
    tls internal
    reverse_proxy localhost:3002
    log {
        output file /var/log/caddy/fragrag.log
    }
}

4. Set Cloudflare SSL mode to "Full", which encrypts the Cloudflare-to-origin connection without validating the origin certificate, so Caddy's internal cert is accepted

The traffic flow: Client → Cloudflare (TLS) → Caddy:443 (internal TLS) → Node:3002

Persistence with systemd

To survive reboots, we created a systemd service:

[Unit]
Description=Claude Memory RAG MCP Server
After=network.target

[Service]
Type=simple
User=rdpuser
WorkingDirectory=/home/rdpuser/claude-memory
ExecStart=/home/rdpuser/.nvm/versions/node/v22.22.1/bin/node dist/index.js
Restart=on-failure
RestartSec=5
EnvironmentFile=/home/rdpuser/claude-memory/.env

[Install]
WantedBy=multi-user.target

The EnvironmentFile directive loads Supabase credentials, the Voyage AI key, and the bearer token from .env without hardcoding them in the service file.
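Because systemd injects the variables from .env, the server only needs to read process.env at startup. The article's config.ts uses Zod for this; the sketch below is a dependency-free stand-in that fails fast with a list of everything missing. All environment variable names here are assumptions, since the article doesn't list them.

```typescript
// Hypothetical variable names; the article only says .env holds the Supabase
// credentials, the Voyage AI key, and the bearer token.
interface Config {
  supabaseUrl: string;
  supabaseKey: string;
  voyageApiKey: string;
  bearerToken: string;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  const missing: string[] = [];
  // Trimming guards against stray whitespace or a trailing newline in .env.
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) {
      missing.push(name);
      return "";
    }
    return value;
  };
  const config: Config = {
    supabaseUrl: required("SUPABASE_URL"),
    supabaseKey: required("SUPABASE_SERVICE_ROLE_KEY"),
    voyageApiKey: required("VOYAGE_API_KEY"),
    bearerToken: required("MCP_BEARER_TOKEN"),
    port: Number(env.PORT ?? 3002),
  };
  // Throwing here exits the process, so the error lands in the journal at startup.
  if (missing.length > 0) throw new Error(`missing required env vars: ${missing.join(", ")}`);
  if (!Number.isInteger(config.port)) throw new Error(`invalid PORT: ${env.PORT}`);
  return config;
}
```

Failing at startup means a broken .env shows up immediately in journalctl, rather than on the first tool call from a remote client.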

Remote Client Configuration

MCP servers in Claude Code are registered via the CLI, not through settings.json. We learned this the hard way — adding mcpServers to the settings file had no effect. The correct approach:

claude mcp add claude-memory https://fragrag.gitdiot.com/mcp -t http -s user \
  -H "Authorization:Bearer cm-..."

Key flags:

-t http — Streamable HTTP transport (not sse)
-s user — stores in ~/.claude.json, persists across projects
-H — bearer token for auth

One gotcha: the CLI sometimes inserts a newline in the Authorization header value. If claude mcp list shows "Failed to connect", check ~/.claude.json and ensure the header is a single line.
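Spotting a stray newline by eye in a large JSON file is tedious. A helper like the one below (hypothetical, not part of the Claude Code CLI) walks the parsed ~/.claude.json without assuming its exact layout and reports the path of any header value that contains a carriage return or line feed.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Recursively walks any JSON value. Whenever it finds an object under a "headers"
// key, it reports header values containing CR or LF as dotted paths.
function findMultilineHeaders(node: unknown, path: string[] = []): string[] {
  if (node === null || typeof node !== "object") return [];
  const hits: string[] = [];
  for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
    const here = [...path, key];
    if (key === "headers" && value !== null && typeof value === "object") {
      for (const [name, headerValue] of Object.entries(value as Record<string, unknown>)) {
        if (typeof headerValue === "string" && /[\r\n]/.test(headerValue)) {
          hits.push([...here, name].join("."));
        }
      }
    }
    hits.push(...findMultilineHeaders(value, here));
  }
  return hits;
}

// Reads and checks the user-scope config file that `claude mcp add -s user` writes to.
function checkClaudeConfig(file: string = join(homedir(), ".claude.json")): string[] {
  return findMultilineHeaders(JSON.parse(readFileSync(file, "utf8")));
}
```

An empty result from checkClaudeConfig() means no header value spans multiple lines; otherwise each reported path points at the value to collapse onto one line.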

The portable config repo (gitDiot-Org/claude-cli-config) holds settings, skills, and memories, and is deployed on each machine with git pull && ./install.sh.

Project Structure

claude-memory/
├─ .env                    # secrets (gitignored)
├─ .gitignore
├─ package.json
├─ tsconfig.json
├─ sql/
│   └─ 001_claude_memory.sql   # Supabase migration
├─ scripts/
│   └─ migrate.ts              # Migration runner
└─ src/
    ├─ index.ts                # Dual-transport MCP server
    ├─ config.ts               # Zod-validated env config
    ├─ auth.ts                 # Bearer token middleware
    ├─ embeddings.ts           # Voyage AI integration
    └─ tools/
        ├─ memory.ts           # store, search, list, forget, stats
        └─ sessions.ts         # summarize, save_link, find

How It Works in Practice

At the start of a session, Claude Code can call search_memory with the current task context to load relevant prior knowledge:

search_memory("CRM bot architecture")
→ AgentCRM Architecture (score: 0.710) — full architecture details

During work, important discoveries get stored:

store_memory("SSH key works with gitDiot-Org", type="fact", importance=6)

The vector search returns results ranked by semantic similarity, not keyword matching. Searching for "how to authenticate with GitHub" returns the SSH key fact (score: 0.598) even though the memory never mentions "authenticate."

Stale memories can be soft-deleted and replaced with updated versions, keeping the knowledge base current.
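One way to model soft-delete-and-replace is sketched below on an in-memory array: forgetting stamps a timestamp instead of removing the row, and an update is a soft delete plus a fresh insert. The deletedAt field is a hypothetical stand-in for whichever column the elided part of the schema uses; against Supabase, the same steps would presumably be an UPDATE followed by an INSERT.

```typescript
// In-memory stand-in for claude_memory rows; deletedAt is a hypothetical column.
interface MemoryRow {
  id: string;
  content: string;
  deletedAt: Date | null;
}

// Soft delete: mark the row instead of removing it, so history stays queryable.
function forgetMemory(rows: MemoryRow[], id: string, now: Date = new Date()): boolean {
  const row = rows.find((r) => r.id === id && r.deletedAt === null);
  if (!row) return false; // unknown id, or already forgotten
  row.deletedAt = now;
  return true;
}

// Replace = soft-delete the stale row, then store the corrected version as a new row.
function replaceMemory(rows: MemoryRow[], staleId: string, fresh: Omit<MemoryRow, "deletedAt">): boolean {
  if (!forgetMemory(rows, staleId)) return false;
  rows.push({ ...fresh, deletedAt: null });
  return true;
}

// Searches and listings should only ever see live rows.
function liveMemories(rows: MemoryRow[]): MemoryRow[] {
  return rows.filter((r) => r.deletedAt === null);
}
```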

Security Considerations

Bearer token auth on the /mcp endpoint
Health check is the only unauthenticated endpoint
Secrets (Supabase keys, Voyage AI key, bearer token) stored in .env, excluded from git
The bearer token is stored in the portable config repo, which is private — acceptable tradeoff for a personal tool
Cloudflare proxy hides the origin server IP and provides DDoS protection
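TLS on every network hop: client to Cloudflare, and Cloudflare to Caddy with the internal cert (Cloudflare terminates and re-encrypts in between, and the final Caddy-to-Node hop is plain HTTP on localhost)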
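For the bearer check in auth.ts, a constant-time comparison avoids leaking how much of a guessed token matched. The sketch below uses Node's crypto.timingSafeEqual on SHA-256 digests. makeBearerAuth and the minimal request/response interfaces are hypothetical stand-ins for the Express types, so the example has no dependencies.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Minimal structural stand-ins for Express's req/res objects.
interface AuthRequest {
  headers: Record<string, string | string[] | undefined>;
}
interface AuthResponse {
  status(code: number): { json(body: unknown): void };
}

// Hash both sides first: timingSafeEqual throws on buffers of different length,
// and equal-length digests also avoid leaking the token's length.
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Attached per route (as on /mcp), so a separate health route stays unauthenticated.
function makeBearerAuth(expectedToken: string) {
  return (req: AuthRequest, res: AuthResponse, next: () => void): void => {
    const header = req.headers["authorization"]; // Node lower-cases header names
    const match = typeof header === "string" ? /^Bearer\s+(\S+)$/i.exec(header.trim()) : null;
    if (match && tokensMatch(match[1], expectedToken)) {
      next();
      return;
    }
    res.status(401).json({ error: "unauthorized" });
  };
}
```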

Bonus: Two Claudes, One Git Repo

The most unexpected outcome was using the git repos as a communication channel between two Claude Code instances — one on the remote server, one on the local PC.

Both instances poll the repos every 60 seconds using cron jobs. When one pushes a commit, the other pulls it, reads the changes, and acts on them. This created a feedback loop:

Desktop Claude diagnosed the SSE transport failure and identified that express.json() middleware was breaking StreamableHTTPServerTransport
Desktop Claude pushed the fix to gitDiot-Org/fragRag
Server Claude's polling loop detected the new commit within a minute
Server Claude pulled, rebuilt (npx tsc), restarted the systemd service, verified the health check, and pushed a status commit back
Desktop Claude saw the confirmation and tested the connection

Two AI agents collaborating asynchronously through version control, each with access to different parts of the infrastructure. The server Claude could restart services and check logs; the desktop Claude could test the client connection and iterate on fixes. Git provided the audit trail.

Desktop Claude                    Server Claude
     │                                  │
     ├── push fix ──────────────────────>│
     │                                  ├── pull
     │                                  ├── rebuild
     │                                  ├── restart service
     │                                  ├── verify health
     │<─────────────────── push status ──┤
     ├── test connection                │
     ├── confirm working                │
     │                                  │
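The polling half of this loop is easy to sketch. The version below is hypothetical, since the article only says cron polls every 60 seconds: it fetches, compares HEAD with its upstream, and fast-forwards only when the local branch is strictly behind. Git is passed in as a function so the logic can run without a real repository. The follow-up steps (npx tsc, restarting the systemd service, checking the health endpoint) are left to the caller.

```typescript
import { execFileSync } from "node:child_process";

// Runs one git command and returns its trimmed stdout.
type GitRunner = (args: string[]) => string;

// Real runner bound to a checkout directory.
const gitIn = (repoDir: string): GitRunner => (args) =>
  execFileSync("git", args, { cwd: repoDir, encoding: "utf8" }).trim();

// One poll cycle. Returns the new HEAD hash if a fast-forward happened, else null.
function pollOnce(git: GitRunner): string | null {
  git(["fetch", "--quiet"]);
  const local = git(["rev-parse", "HEAD"]);
  const upstream = git(["rev-parse", "@{u}"]);
  if (local === upstream) return null; // nothing new
  const base = git(["merge-base", "HEAD", "@{u}"]);
  if (base !== local) return null; // local is ahead or has diverged: no fast-forward possible
  git(["merge", "--ff-only", "@{u}"]);
  return upstream;
}
```

A cron-driven script could call pollOnce(gitIn("/home/rdpuser/claude-memory")) (path assumed) and trigger the rebuild only when it returns a hash. Refusing to merge diverged branches keeps either agent from silently overwriting the other's unpushed work.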

What We Built

A personal knowledge base for an AI coding assistant. Memories persist across sessions and machines. The semantic search means you don't need to remember exact keywords: describe what you're looking for and the vector similarity finds it. The dual-transport architecture means zero overhead when working locally, with full remote access when needed.

Total infrastructure: one Supabase table, one Node.js process, one Caddy route, one Cloudflare DNS record, one systemd service, one cron poll. No Kubernetes, no Lambda, no orchestration layer. Simple enough to debug with curl and journalctl — and apparently, simple enough for two AI agents to debug collaboratively through git commits.

Related: fragRag: Building Persistent Memory for Claude Code Across Machines — the journey narrative of what failed and what we learned. And Debugging Across Machines — how two Claude instances collaborated through Git to fix this server.