Skip to content

Pipelines: How to choose a source

A pipeline in Agentcy is a long-lived configuration that streams records into the Context Engine. Every pipeline has a source (where records come from) and a target table (context_nodes, context_edges, or context_events). Records are written through the same engine the rest of the platform reads from, so they show up in the Explore page, in agent tool calls, and in structured queries the moment they land.

This page is the decision guide. The deep dives are linked at the bottom of each section.

Pick a source

You want to …Pick thisLatencySetup
POST events from your service or a log shipper (Vector, Fluent Bit, Datadog Agent's HTTP output, Loki Promtail, your own backend)HTTP Webhooksecondsissue a token, copy the URL
Drop newline-delimited JSON files into a bucket and have them ingested automaticallyFile Dropminutes (poll interval)configure bucket + prefix
Already have a connector configured (GitHub, AWS, Slack, …) and want its scheduled sync to flow through the pipelines runner so its runs are visible in ExploreConnector sourceper connector schedulereference the connector ID
Send OpenTelemetry signals (logs / metrics / traces)Telemetry — coming soonsecondsOTLP endpoint per pipeline
Stream from Kafka topicsTelemetry — coming soonsecondstopic → pipeline binding

If your data source isn't listed, HTTP Webhook is almost always the right answer. It accepts NDJSON over plain POST and works with every log shipper, application framework, and bash script ever written.

Anatomy of a pipeline

┌─────────────────┐     POST/sync     ┌──────────────┐    ingest_records    ┌────────────────┐
│   Your source   │ ─────────────────►│   Pipeline   │ ────────────────────►│ Context Engine │
│ (app / file /   │                   │   (runner)   │                      │  (Basic /      │
│  connector)     │                   │              │                      │   Advanced)    │
└─────────────────┘                   └──────┬───────┘                      └────────────────┘


                                       run history,
                                       row counts,
                                       errors

Every record pipelines write inherits the pipeline's realm (data namespace) and target_table (which physical table it lands in). Pipeline runs are recorded with rows_in, rows_out, status, and error so you can see end-to-end ingestion health on the Pipelines tab in Explore.

What lives where

  • Frontend — Explore → Pipelines tab (/explore?tab=pipelines)
  • REST API/api/v1/context/pipelines/*
  • Public ingestPOST /api/v1/context/ingest/webhook/{token} (no JWT — token is the auth)
  • Crateagentcy-pipelines (config + repo + runner + filedrop watcher)
  • Migration052_context_pipelines.sql

Two engines, same surface

Pipelines work on both the Basic and Advanced Context Engines. The runner calls ContextEngine::ingest_records() which the active provider implements:

  • Basic — translates each NDJSON record to a graph node/edge/event and writes through the embedded graph store.
  • Advanced — POSTs the batch to kyma's /v1/ingest (NDJSON, group-commit + idempotency ledger). Inherits kyma's exact-once semantics for free.

The pipeline's wire shape and UX are identical on both. Where realm and dedupe are enforced differs, but you don't have to think about it.

Roadmap

SourceStatus
HTTP Webhook✅ Shipped
Object-store filedrop✅ Shipped
Connector wrap✅ Shipped
OTLP gRPC (logs)✅ Native — every Cloud / Advanced instance exposes a TCP-proxied OTLP endpoint. See the telemetry guide.
OTLP gRPC (metrics, traces)🟡 On the kyma roadmap — receiver work-in-progress upstream.
Kafka consumer groups✅ Native (env-configured topic mapping per instance).
Per-pipeline OTLP routing🟡 Workarounds today; native x-pipeline-token header on the kyma roadmap.
Server-Sent Events / WebSocket push🔴 Considered

Next

Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.