Object-store file drops
A filedrop pipeline watches a bucket prefix and ingests every .ndjson, .jsonl, or .json file that appears under it. Each file's SHA256 is the dedupe key, so re-uploading the same content is a no-op. It's the default path for batch ingestion: nightly exports, audit-log archives, vendor data dumps, AWS CloudTrail, and anything else that lands as files.
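Because the key is computed from the file's bytes, the object name plays no part in deduplication. A minimal sketch of computing such a key locally, assuming GNU coreutils' sha256sum; the helper name dedupe_key is hypothetical and this is not the runner's own code:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SHA256 of a file's contents, which is what
# this page describes as the dedupe key. Assumes GNU coreutils (sha256sum).
dedupe_key() {
  sha256sum -- "$1" | cut -d' ' -f1
}

# Demo: the same bytes under two different names produce the same key.
tmp="$(mktemp -d)"
printf '{"label":"Event","name":"deploy_started"}\n' > "$tmp/a.ndjson"
cp "$tmp/a.ndjson" "$tmp/renamed.ndjson"
if [ "$(dedupe_key "$tmp/a.ndjson")" = "$(dedupe_key "$tmp/renamed.ndjson")" ]; then
  echo "duplicate content: the second upload would be a no-op"
fi
rm -rf "$tmp"
```

Renaming a file before re-uploading it therefore does not get it ingested twice; changing a single byte does.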
When to use
- Vendor exports that ship daily/hourly NDJSON
- AWS CloudTrail and GuardDuty finding exports (JSON in S3, delivered gzip-compressed; see the gzip note under Supported file formats)
- Backfills you want to be replayable: re-uploading identical content is safe
- Anything that's already in object storage and that you don't want to ETL out
Object-store requirements
Any S3-compatible store works:
- AWS S3
- MinIO (the Advanced engine runs against MinIO already)
- Cloudflare R2, DigitalOcean Spaces, Backblaze B2
- Self-hosted Ceph RGW, Garage, SeaweedFS, …
The runner only needs list and get permissions on the configured prefix (on AWS, s3:ListBucket and s3:GetObject). No notification setup (S3 event notifications, SNS, EventBridge) is required: the runner polls.
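Polling keeps the moving parts small: list, fetch, hash, skip what has been seen. The sketch below imitates one poll pass over the top level of a local directory standing in for the bucket prefix. The helper name poll_once, the seen-hashes file, and the local-directory stand-in are all assumptions for illustration; against S3 the list and get steps would be ListObjectsV2 and GetObject calls.

```bash
#!/usr/bin/env bash
# Hypothetical single poll pass over a local directory that stands in for a
# bucket prefix. Not the Agentcy runner; assumes GNU coreutils (sha256sum).
# Usage: poll_once <dir> <seen-hashes-file>
poll_once() {
  local dir="$1" seen="$2" f key
  touch -- "$seen"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                 # ignore subdirectories / empty glob
    case "$f" in
      *.ndjson|*.jsonl|*.json) ;;           # documented suffixes
      *) echo "skip (unsupported): $f"; continue ;;
    esac
    key="$(sha256sum -- "$f" | cut -d' ' -f1)"
    if grep -qx -- "$key" "$seen"; then
      echo "dedupe hit: $f"                 # same content seen before
    else
      echo "ingest: $f"
      echo "$key" >> "$seen"                # remember this content hash
    fi
  done
}
```

Wrapping it as while :; do poll_once ./drop ./seen.txt; sleep 30; done approximates a pipeline with poll_interval_secs set to 30.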
1. Create the pipeline
In Explore → Pipelines, click Add pipeline and pick Object-store drop:
| Field | Example | Notes |
|---|---|---|
| Bucket | agentcy-ingest | Bucket name only, without the s3:// prefix |
| Prefix | cloudtrail/ | A trailing / is conventional |
| Poll interval | 30 | Seconds; allowed range 5–3600 |
| Target table | context_events | Same choice as for a webhook pipeline |
| Realm | infrastructure | Records inherit this |
Or via API:
```bash
curl -X POST https://your.agentcy.dev/api/v1/context/pipelines \
  -H "authorization: Bearer $JWT" -H 'content-type: application/json' \
  -d '{
    "name": "cloudtrail-archive",
    "kind": "filedrop",
    "bucket": "agentcy-ingest",
    "prefix": "cloudtrail/",
    "poll_interval_secs": 60,
    "target_table": "context_events",
    "realm": "infrastructure"
  }'
```
2. Configure object-store credentials
The runner uses the worker's existing AWS / S3 credentials. Set them once in the deployment environment (the same place the KYMA_S3_* variables already live on Advanced installs):
```bash
# Common
S3_ENDPOINT=https://s3.amazonaws.com   # or http://minio:9000
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIA…
S3_SECRET_ACCESS_KEY=…
# Optional: force path-style addressing for MinIO
S3_PATH_STYLE=true
```
3. Drop a file
No pre-provisioning needed
The watcher creates the target table when it sees the first file, using the default schema (at, label, body, props). Every new top-level key in your files becomes a Utf8 column on the next batch. Pre-existing tables keep their schema. The webhook's X-Auto-Create and X-Schema-Evolve headers have filedrop equivalents: the env vars KYMA_FILEDROP_AUTO_CREATE and KYMA_FILEDROP_SCHEMA_EVOLVE.
Multi-prefix on a single watcher
A single kyma instance can watch multiple prefixes via KYMA_FILEDROP_PREFIXES=prefix1,prefix2,prefix3. Each Agentcy pipeline maps to one prefix, laid out as {prefix}/{database}/{table}/, where {prefix} is one entry of KYMA_FILEDROP_PREFIXES. The provisioner sets this up automatically on Cloud and on-prem Enterprise.
Upload an NDJSON file under the configured prefix:
```bash
cat <<'EOF' > events.ndjson
{"label":"Event","name":"deploy_started","service":"checkout","at":"2026-04-29T11:00:00Z"}
{"label":"Event","name":"deploy_finished","service":"checkout","at":"2026-04-29T11:03:14Z"}
EOF
aws s3 cp events.ndjson s3://agentcy-ingest/cloudtrail/$(date +%s).ndjson
```
Within poll_interval_secs, the pipeline picks it up. Show runs in Explore reveals one new run per file, with source_ref set to s3://agentcy-ingest/cloudtrail/<filename> and the SHA256 prefix.
Re-upload the same content under a different filename: the run completes with rows_out = 0, and its source_ref records the dedupe hit.
4. Supported file formats
| Suffix | Parsed as |
|---|---|
| .ndjson / .jsonl | One JSON object per line |
| .json | A single JSON value. If it's an array, each element becomes a record; otherwise it's one record. |
| anything else | Skipped, with a warning in the run history |
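The format is chosen from the file-name suffix alone. A sketch of the same mapping, with a hypothetical helper name and output labels; note that a compressed name such as events.ndjson.gz falls through to the last row:

```bash
#!/usr/bin/env bash
# Hypothetical mapping from object key to parse mode, mirroring the table.
parse_mode() {
  case "$1" in
    *.ndjson|*.jsonl) echo "lines" ;;   # one JSON object per line
    *.json)           echo "value" ;;   # one JSON value; an array fans out
    *)                echo "skip"  ;;   # skipped, with a warning in run history
  esac
}

# Demo over a few example keys.
for key in a.ndjson b.jsonl c.json d.ndjson.gz e.csv; do
  printf '%s -> %s\n' "$key" "$(parse_mode "$key")"
done
```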
Gzip-compressed files (.ndjson.gz) are not yet decompressed by the Agentcy runner. Either rely on the engine's native filedrop on Advanced (kyma's kyma-ingest-filedrop accepts gzip directly) or decompress files before uploading them.
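For the pre-decompress route, the standard gzip tools are enough. A minimal sketch, with a hypothetical helper name, that writes the decompressed copy next to the original by dropping the .gz suffix and prints its path:

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress <name>.ndjson.gz (or .jsonl.gz / .json.gz)
# to <name>.ndjson before uploading, since the runner skips .gz objects.
predecompress() {
  local src="$1" dst
  case "$src" in
    *.gz) dst="${src%.gz}" ;;
    *) echo "not a .gz file: $src" >&2; return 1 ;;
  esac
  gunzip -c "$src" > "$dst" && echo "$dst"
}

# Example upload (export.ndjson.gz is a hypothetical local file):
# f="$(predecompress export.ndjson.gz)" && aws s3 cp "$f" s3://agentcy-ingest/cloudtrail/
```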
5. Common shapes
AWS CloudTrail
CloudTrail writes one JSON file per delivery, and each entry in its Records array becomes one record (one per AWS API call):
```json
{ "Records": [ {"eventTime":"…", "eventName":"…", …}, … ] }
```
Set the prefix to your trail's bucket prefix (AWSLogs/<account-id>/CloudTrail/<region>/), set target_table = "context_events", and let the records flow. CloudTrail delivers these files gzip-compressed (.json.gz), so the gzip note under Supported file formats applies.
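The prefix follows CloudTrail's standard key layout, which can also start with an optional key prefix configured on the trail itself. A small sketch for building it; the helper name is hypothetical and the account ID below is a placeholder:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the filedrop prefix for one account/region of a
# CloudTrail trail. The optional third argument is the trail's own S3 key
# prefix, if the trail was configured with one.
cloudtrail_prefix() {
  local account="$1" region="$2" trail_prefix="${3:-}"
  if [ -n "$trail_prefix" ]; then
    trail_prefix="${trail_prefix%/}/"   # normalize to exactly one trailing /
  fi
  printf '%sAWSLogs/%s/CloudTrail/%s/\n' "$trail_prefix" "$account" "$region"
}
```

For example, cloudtrail_prefix 123456789012 us-east-1 prints AWSLogs/123456789012/CloudTrail/us-east-1/, which is the value for the pipeline's prefix field.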
Heroku log drains
Heroku ships logs to S3 as gzipped JSONL. Pipe them through a small gunzip-and-copy Lambda, or run on Advanced, where kyma's filedrop reads gzip directly.
Custom batch exports
Easiest path: have your batch job write <job-id>.ndjson to the prefix at the end of each run. Because dedupe is keyed on the SHA256 of the content, a restarted job that re-uploads byte-identical output is a no-op.
6. Observability
Each file becomes one run with:
- source_ref: s3://bucket/key#sha256:<first-12>
- rows_in: lines parsed
- rows_out: rows accepted by the engine (after schema coercion)
- status: succeeded / failed
- error: parse, network, or engine errors, verbatim
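To match a local file to its run in Show runs, the expected source_ref can be computed ahead of time from the format above. A minimal sketch assuming GNU coreutils' sha256sum; expected_source_ref is a hypothetical helper, and the bucket and key arguments are whatever you passed to aws s3 cp:

```bash
#!/usr/bin/env bash
# Hypothetical helper: predict the source_ref a run should show for a local
# file uploaded as s3://<bucket>/<key>, following the documented format
# s3://bucket/key#sha256:<first-12>. Assumes GNU coreutils (sha256sum).
expected_source_ref() {
  local bucket="$1" key="$2" file="$3" digest
  digest="$(sha256sum -- "$file" | cut -d' ' -f1)"
  printf 's3://%s/%s#sha256:%s\n' "$bucket" "$key" "${digest:0:12}"
}
```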
Limits
- File size: no hard cap, but a very large file occupies the worker until it finishes, so split large files when possible
- Concurrency: one file at a time per pipeline (for fairness across pipelines)
- Latency: a new file waits up to poll_interval_secs before pickup, plus any time queued behind other files in the same pipeline; for time-sensitive feeds, use a webhook instead
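NDJSON splits cleanly on line boundaries, so each part is a valid file on its own. A sketch assuming GNU split (for -d and --additional-suffix); the helper name and part naming are hypothetical. Splitting the same input with the same part size yields byte-identical parts, so re-uploading them after a restart stays a dedupe no-op:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a large NDJSON file on line boundaries into
# numbered parts (part-00.ndjson, part-01.ndjson, ...) in <outdir>.
# Assumes GNU split (for -d and --additional-suffix).
split_ndjson() {
  local src="$1" lines_per_part="$2" outdir="$3"
  mkdir -p -- "$outdir"
  split -l "$lines_per_part" -d --additional-suffix=.ndjson -- "$src" "$outdir/part-"
}
```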
Next
- HTTP Webhook for sub-second latency
- Telemetry pipelines for OTel data
- Concept: Context Engine & Realms for what realm + target_table buy you