# RAG & Semantic Search
agentcy-rag is Agentcy's retrieval layer. It embeds every graph node into a vector, stores the vectors in Neo4j 5's HNSW indexes, and exposes semantic search. Embeddings are generated locally via fastembed-rs — no external API calls.
This page is the operator/integrator view. For the agent-facing side, see Agent Loop.
## Query the graph semantically
```bash
curl -X POST http://localhost:8080/api/v1/rag/search \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query": "why is the payment service flaky",
    "top_k": 10,
    "realms": ["infrastructure","development"]
  }'
```

Response:
```json
{
  "hits": [
    {
      "node_id": "n_01HABC",
      "labels": ["Service"],
      "score": 0.874,
      "properties": { "name": "payments", "runtime": "go-1.22" },
      "source": { "connector": "kubernetes", "external_id": "…" },
      "realm": "infrastructure"
    },
    {
      "node_id": "n_01HAB9",
      "labels": ["PullRequest"],
      "score": 0.811,
      "properties": { "number": 412, "title": "revert payments DB_POOL_SIZE" },
      "realm": "development"
    }
    …
  ],
  "took_ms": 38
}
```

## Re-rank with filters
Add filters to narrow before scoring:
```json
{
  "query": "flaky payment",
  "filters": {
    "labels": ["Service","PullRequest","Incident"],
    "after": "2026-04-20T00:00:00Z",
    "exclude": { "archived": true }
  },
  "top_k": 20
}
```

Re-rank order is: vector similarity → BM25 over name+title+description → recency boost.
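As a sketch, the filter clauses could be applied as a predicate over candidates before any scoring; the `updated_at` field and the exact matching rules here are assumptions, not documented behavior:

```python
def passes_filters(node: dict, filters: dict) -> bool:
    """Return True if a node survives the pre-scoring filters.

    `node` mirrors the hit shape from /rag/search; `updated_at` is an
    assumed ISO-8601 timestamp property backing the "after" clause.
    """
    # labels: keep the node only if it carries at least one requested label
    if filters.get("labels") and not set(node["labels"]) & set(filters["labels"]):
        return False
    # after: ISO-8601 strings in the same zone compare correctly as strings
    after = filters.get("after")
    if after and node.get("updated_at", "") <= after:
        return False
    # exclude: drop nodes whose properties match any excluded key/value pair
    for key, value in filters.get("exclude", {}).items():
        if node.get("properties", {}).get(key) == value:
            return False
    return True
```

Only survivors then flow into the similarity → BM25 → recency stages.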
## Hybrid search (vector + full-text)
```bash
curl -X POST http://…/rag/hybrid \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "query": "checkout 500 error",
    "text_weight": 0.4,
    "vector_weight": 0.6,
    "top_k": 20
  }'
```

Useful when the query has strong keywords (error codes, IDs) that pure vectors might miss.
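Conceptually the weights produce a linear blend of the two scores; this sketch assumes both are already normalized to [0, 1], which the docs don't spell out:

```python
def hybrid_score(vector_score: float, text_score: float,
                 vector_weight: float = 0.6, text_weight: float = 0.4) -> float:
    """Blend a vector-similarity score with a full-text (BM25) score.

    Inputs are assumed normalized to [0, 1]; the default weights mirror
    the request body above.
    """
    return vector_weight * vector_score + text_weight * text_score
```

With these weights, a hit with a strong keyword match (0.5 vector, 0.95 text → 0.68) outranks a middling vector-only hit (0.7 vector, 0.1 text → 0.46), which is exactly the error-code case described above.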
## What gets embedded
agentcy-rag embeds the canonical text representation of a node:
```
<labels> | <name> | <description or title> | <key props as k=v pairs>
```

Re-embedding happens on every change to one of the text fields (name, title, description) and on label changes. Other property changes skip re-embedding to save compute.
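A sketch of how that canonical text might be assembled, and of the re-embed decision; the property names and separator handling are assumptions based on the template above:

```python
TEXT_FIELDS = {"name", "title", "description"}

def embed_text(node: dict) -> str:
    """Assemble `<labels> | <name> | <description or title> | <k=v pairs>`."""
    props = node.get("properties", {})
    labels = ":".join(node["labels"])
    name = props.get("name", "")
    desc = props.get("description") or props.get("title") or ""
    rest = " ".join(f"{k}={v}" for k, v in props.items() if k not in TEXT_FIELDS)
    return " | ".join(part for part in (labels, name, desc, rest) if part)

def needs_reembed(changed_props: set[str], labels_changed: bool = False) -> bool:
    """Re-embed only on label changes or edits to a text field."""
    return labels_changed or bool(TEXT_FIELDS & changed_props)
```

So an edit to `runtime` alone would leave the vector untouched, while renaming the service would trigger a re-embed.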
Manually trigger re-embedding of a label or realm (admin-only):
```bash
curl -X POST http://…/rag/reindex \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{"labels":["Service"],"realm":"infrastructure"}'
```

## Model choice
The default embedding model is bge-small-en (384 dims). Alternatives:
```env
RAG_EMBED_MODEL=bge-base-en       # 768 dims, slower, better recall
RAG_EMBED_MODEL=nomic-embed-text  # 768 dims, tuned for retrieval
```

Changing the model requires a full reindex (you can't mix dimensions). The API rejects mixed-dim writes with a clear error.
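The mixed-dim guard amounts to a width check at write time; the error class and message below are illustrative, not the API's actual payload:

```python
class DimensionMismatch(ValueError):
    """Illustrative stand-in for the API's mixed-dim write error."""

def check_dims(index_dims: int, vector: list[float]) -> None:
    """Reject a vector whose width doesn't match the HNSW index."""
    if len(vector) != index_dims:
        raise DimensionMismatch(
            f"vector has {len(vector)} dims but index expects {index_dims}; "
            "changing RAG_EMBED_MODEL requires a full reindex"
        )
```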
## Performance
Benchmarks on a single API process (M2 Pro, 1M nodes):
| Op | p50 | p99 |
|---|---|---|
| embed 1 query | 18 ms | 45 ms |
| HNSW search, top_k=10 | 6 ms | 20 ms |
| hybrid search, top_k=10 | 22 ms | 70 ms |
| full reindex 1M nodes | — | ~4 min |
Batch embedding for pipeline writes uses fastembed batching (default 64).
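Batching here is plain fixed-size chunking; a minimal sketch with the default batch of 64:

```python
from typing import Iterator

def batches(texts: list[str], batch_size: int = 64) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `batch_size` texts for the embedder."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
```

Each chunk would then go through the embedder in one call rather than one call per node.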
## Agent integration
The agent loop uses `/rag/search` implicitly when the LLM calls `search_connector_tools` — it's the same vector path under the hood, just restricted to ToolSpec rows. Agents can also call `/rag/search` directly via the `rag.search` tool when `features.rag` is enabled and the appropriate connector exposes it (via MCP or a custom skill).
A common pattern: before calling a tool, the agent runs `rag.search` for related incidents or PRs and injects the results into its own plan.
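That pattern might look like the following, with `search` standing in for whatever `rag.search` binding the agent has; all names here are illustrative:

```python
def plan_with_context(query: str, search, top_k: int = 5) -> str:
    """Prepend related hits to the task before the agent plans tool calls.

    `search(query, top_k)` is any callable returning hits in the
    /rag/search response shape.
    """
    lines = []
    for hit in search(query, top_k):
        props = hit.get("properties", {})
        title = props.get("name") or props.get("title") or hit["node_id"]
        lines.append(f"- {hit['labels'][0]}: {title}")
    context = "\n".join(lines) or "- (no related nodes found)"
    return f"Related context:\n{context}\n\nTask: {query}"
```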
## Gotchas
- Embedding model changes = reindex. There's no live migration between dims.
- Very short nodes (label + id) don't embed well. Enrich them by adding a `description` via a pipeline step.
- HNSW is approximate. For high-stakes queries, bump `search_ef` (env `RAG_HNSW_SEARCH_EF`, default 64) or use hybrid search.
- Embeddings stay in Neo4j. Back up Neo4j, not a separate vector store; there is no external Pinecone/Weaviate.
## Next
- Concept: Knowledge Graph & Realms — where hits come from.
- Concept: Memory System — the other retrieval surface.
- Reference: REST API — `/rag` routes.