Object-store file drops

A filedrop pipeline watches a bucket prefix and ingests every *.ndjson / *.jsonl / *.json file it sees. Each file's SHA256 is the dedupe key — re-uploading the same content is a no-op. It's the default path for batch ingestion: nightly exports, audit-log archives, vendor data dumps, AWS CloudTrail, anything that lands as files.

When to use

Vendor exports that ship daily/hourly NDJSON
AWS CloudTrail / VPC flow logs / GuardDuty (all delivered to S3 as files)
Backfills you want to be replayable — re-uploading is safe
Anything that's already in object storage and you don't want to ETL out

Object-store requirements

Any S3-compatible store works:

AWS S3
MinIO (the Advanced engine runs against MinIO already)
Cloudflare R2, DigitalOcean Spaces, Backblaze B2
Self-hosted Ceph RGW, Garage, SeaweedFS, …

The runner only needs list and get on the configured prefix. No notification setup (S3 events / SNS / EventBridge) is required — it polls.
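A minimal sketch of what a list-and-get polling pass can look like. The `poll_once` helper, its injected `list_keys` / `get_object` callables, and the in-memory `bucket` are hypothetical stand-ins for illustration, not the runner's actual code. Tracking already-seen keys here is also an assumption; the real pipeline dedupes on content hash.

```python
from typing import Callable, Iterable, List, Set, Tuple


def poll_once(
    list_keys: Callable[[str], Iterable[str]],
    get_object: Callable[[str], bytes],
    prefix: str,
    seen: Set[str],
) -> List[Tuple[str, bytes]]:
    """Return (key, body) for every key under `prefix` not fetched before.

    Only two operations are used: list (list_keys) and get (get_object).
    """
    fresh = []
    for key in sorted(list_keys(prefix)):
        if key in seen:
            continue
        fresh.append((key, get_object(key)))
        seen.add(key)
    return fresh


# In-memory stand-in for a bucket (illustration only).
bucket = {
    "cloudtrail/a.ndjson": b'{"n":1}\n',
    "other/b.ndjson": b'{"n":2}\n',
}


def list_keys(prefix: str) -> List[str]:
    return [k for k in bucket if k.startswith(prefix)]


seen: Set[str] = set()
first = poll_once(list_keys, bucket.__getitem__, "cloudtrail/", seen)
second = poll_once(list_keys, bucket.__getitem__, "cloudtrail/", seen)
```

The second pass returns nothing because no new key appeared under the prefix, which is the steady state between uploads.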

1. Create the pipeline

In Explore → Pipelines click Add pipeline, pick Object-store drop:

Field	Example	Notes
Bucket	`agentcy-ingest`	Without `s3://` prefix
Prefix	`cloudtrail/`	Trailing `/` is conventional
Poll interval	`30` (seconds)	5–3600
Target table	`context_events`	Same choice as webhook
Realm	`infrastructure`	Records inherit this

Or via API:

bash

curl -X POST https://your.agentcy.dev/api/v1/context/pipelines \
  -H "authorization: Bearer $JWT" -H 'content-type: application/json' \
  -d '{
    "name": "cloudtrail-archive",
    "kind": "filedrop",
    "bucket": "agentcy-ingest",
    "prefix": "cloudtrail/",
    "poll_interval_secs": 60,
    "target_table": "context_events",
    "realm": "infrastructure"
  }'
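Before sending the request, it can help to check the body against the constraints in the field table. The sketch below is a hypothetical client-side check (`validate_filedrop` is not part of any Agentcy SDK); it only encodes the rules stated above: no `s3://` prefix on the bucket and a poll interval between 5 and 3600 seconds.

```python
from typing import List


def validate_filedrop(cfg: dict) -> List[str]:
    """Return a list of problems found in a filedrop pipeline request body."""
    problems = []
    if cfg.get("kind") != "filedrop":
        problems.append('kind must be "filedrop"')
    bucket = cfg.get("bucket", "")
    if not bucket:
        problems.append("bucket is required")
    elif bucket.startswith("s3://"):
        problems.append("bucket must not include the s3:// prefix")
    interval = cfg.get("poll_interval_secs")
    is_int = isinstance(interval, int) and not isinstance(interval, bool)
    if not is_int or not 5 <= interval <= 3600:
        problems.append("poll_interval_secs must be an integer from 5 to 3600")
    return problems


good = {
    "name": "cloudtrail-archive",
    "kind": "filedrop",
    "bucket": "agentcy-ingest",
    "prefix": "cloudtrail/",
    "poll_interval_secs": 60,
    "target_table": "context_events",
    "realm": "infrastructure",
}
bad = dict(good, bucket="s3://agentcy-ingest", poll_interval_secs=2)
```

The server remains the source of truth for validation; this only catches the two mistakes the table calls out.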

2. Configure object-store credentials

The runner uses the worker's existing AWS / S3 credentials. Set them once in the deployment env (the same place the KYMA_S3_* variables already live on Advanced installs):

bash

# Common
S3_ENDPOINT=https://s3.amazonaws.com   # or http://minio:9000
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIA…
S3_SECRET_ACCESS_KEY=…

# Optional: force path-style addressing for MinIO
S3_PATH_STYLE=true
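A quick pre-flight check can confirm the variables are present before the worker starts. This is a sketch under assumptions: the `load_s3_settings` helper is hypothetical, treating all four common variables as required is a guess, and so is reading `S3_PATH_STYLE` as a case-insensitive `true`.

```python
from typing import Mapping


def load_s3_settings(env: Mapping[str, str]) -> dict:
    """Collect the S3_* settings from an environment mapping.

    Assumption: the four "Common" variables are all required.
    """
    required = [
        "S3_ENDPOINT",
        "S3_REGION",
        "S3_ACCESS_KEY_ID",
        "S3_SECRET_ACCESS_KEY",
    ]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return {
        "endpoint": env["S3_ENDPOINT"],
        "region": env["S3_REGION"],
        "access_key_id": env["S3_ACCESS_KEY_ID"],
        "secret_access_key": env["S3_SECRET_ACCESS_KEY"],
        # Assumption: only the literal "true" (any case) enables path style.
        "path_style": env.get("S3_PATH_STYLE", "").strip().lower() == "true",
    }


minio_env = {
    "S3_ENDPOINT": "http://minio:9000",
    "S3_REGION": "us-east-1",
    "S3_ACCESS_KEY_ID": "example-key",
    "S3_SECRET_ACCESS_KEY": "example-secret",
    "S3_PATH_STYLE": "true",
}
settings = load_s3_settings(minio_env)
```

In a real deployment you would pass `os.environ` instead of the example dictionary.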

3. Drop a file

No pre-provisioning needed

The watcher creates the target table on the first file, using the default schema (at, label, body, props). Every new top-level key in your files becomes a Utf8 column on the next batch. Pre-existing tables keep their schema. The webhook's X-Auto-Create and X-Schema-Evolve headers have filedrop equivalents in the env vars KYMA_FILEDROP_AUTO_CREATE and KYMA_FILEDROP_SCHEMA_EVOLVE.
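To make the schema-evolution rule concrete, here is a rough sketch of how new columns could be derived from a batch. The `new_utf8_columns` function is hypothetical and only models the rule stated above (each unseen top-level key becomes a Utf8 column); it says nothing about how the engine stores or coerces values.

```python
from typing import Iterable, List

# Default schema named in the text above.
DEFAULT_COLUMNS = ["at", "label", "body", "props"]


def new_utf8_columns(existing: Iterable[str], batch: Iterable[dict]) -> List[str]:
    """Return top-level keys in `batch` that are not columns yet, in first-seen order."""
    known = set(existing)
    added = []
    for record in batch:
        for key in record:
            if key not in known:
                known.add(key)
                added.append(key)
    return added


batch = [
    {"label": "Event", "name": "deploy_started", "service": "checkout",
     "at": "2026-04-29T11:00:00Z"},
    {"label": "Event", "name": "deploy_finished", "service": "checkout",
     "at": "2026-04-29T11:03:14Z"},
]
added = new_utf8_columns(DEFAULT_COLUMNS, batch)
```

For the sample events used later on this page, `name` and `service` are the two keys that fall outside the default schema.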

Multi-prefix on a single watcher

A single kyma instance watches multiple prefixes via KYMA_FILEDROP_PREFIXES=prefix1,prefix2,prefix3. One Agentcy pipeline maps to one prefix ({KYMA_FILEDROP_PREFIXES}/{database}/{table}/). The provisioner sets this up automatically on Cloud / on-prem Enterprise.

Upload an NDJSON file under the configured prefix:

bash

cat <<'EOF' > events.ndjson
{"label":"Event","name":"deploy_started","service":"checkout","at":"2026-04-29T11:00:00Z"}
{"label":"Event","name":"deploy_finished","service":"checkout","at":"2026-04-29T11:03:14Z"}
EOF

aws s3 cp events.ndjson s3://agentcy-ingest/cloudtrail/$(date +%s).ndjson

Within poll_interval_secs the pipeline picks it up. In Explore, Show runs lists one new run per file, with source_ref set to s3://agentcy-ingest/cloudtrail/<filename> followed by the SHA256 prefix.

If you re-upload the same content under a different filename, the run completes with rows_out = 0 and source_ref records the dedupe hit.
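The dedupe behaviour follows from keying on content rather than on filename. A minimal sketch, assuming an in-memory set of seen hashes (the `ingest` function and `seen_hashes` set are illustrative, not the pipeline's real state store):

```python
import hashlib
from typing import Set, Tuple


def ingest(body: bytes, seen_hashes: Set[str]) -> Tuple[str, int]:
    """Return (sha256 hex digest, rows_out) for one uploaded file.

    A file whose content hash was already seen yields rows_out = 0,
    regardless of the key it was uploaded under.
    """
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_hashes:
        return digest, 0
    seen_hashes.add(digest)
    rows = [line for line in body.splitlines() if line.strip()]
    return digest, len(rows)


seen_hashes: Set[str] = set()
content = b'{"n":1}\n{"n":2}\n'
first_digest, first_rows = ingest(content, seen_hashes)    # e.g. 1714388400.ndjson
second_digest, second_rows = ingest(content, seen_hashes)  # same bytes, new name
```

The filename never enters the hash, so renaming a file does not defeat dedupe, while changing a single byte does.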

4. Supported file formats

Suffix	Parsed as
`.ndjson` / `.jsonl`	One JSON object per line
`.json`	A single JSON value. If it's an array, each element becomes a record; otherwise it's one record.
anything else	Skipped, with a warning in the run history
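The table above can be read as a suffix dispatch. The following sketch mirrors those rules; `parse_file` is a hypothetical name, and skipping blank lines in NDJSON is an assumption not stated in the table.

```python
import json
from typing import List, Optional


def parse_file(key: str, body: bytes) -> Optional[List[object]]:
    """Parse one object by suffix; return None when the file would be skipped."""
    if key.endswith((".ndjson", ".jsonl")):
        text = body.decode("utf-8")
        # One JSON object per line (blank lines ignored: an assumption).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if key.endswith(".json"):
        value = json.loads(body.decode("utf-8"))
        # An array yields one record per element; anything else is one record.
        return value if isinstance(value, list) else [value]
    return None


lines = parse_file("a.ndjson", b'{"a":1}\n{"a":2}\n')
array = parse_file("a.json", b'[{"a":1},{"a":2}]')
single = parse_file("a.json", b'{"a":1}')
skipped = parse_file("a.ndjson.gz", b"\x1f\x8b")
```

Note that a `.ndjson.gz` key falls through to the skipped case because its final suffix is `.gz`.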

Gzip-compressed files (.ndjson.gz) are not yet decompressed by the Agentcy runner. Either use the engine's native filedrop on Advanced (kyma's kyma-ingest-filedrop accepts gzip directly) or decompress before upload.
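Pre-decompressing can be a few lines in whatever job moves files into the watched prefix. A sketch using only the standard library; `decompress_for_upload` is a hypothetical helper, and the upload step itself is left out.

```python
import gzip
from typing import Tuple


def decompress_for_upload(key: str, body: bytes) -> Tuple[str, bytes]:
    """Strip a trailing .gz from the key and gunzip the body; pass others through."""
    if key.endswith(".gz"):
        return key[: -len(".gz")], gzip.decompress(body)
    return key, body


compressed = gzip.compress(b'{"a":1}\n')
name, data = decompress_for_upload("logs/x.ndjson.gz", compressed)
```

Uploading `data` under `name` gives the runner a plain `.ndjson` object it can parse.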

5. Common shapes

AWS CloudTrail

CloudTrail writes one JSON file per delivery — the Records array becomes one record per AWS API call:

json

{ "Records": [ {"eventTime":"…", "eventName":"…", …}, … ] }

Set the prefix to your CloudTrail trail's bucket prefix (AWSLogs/<account-id>/CloudTrail/<region>/) and target_table = "context_events", then let the records flow. CloudTrail delivers its log files gzip-compressed (.json.gz), so the gzip note in section 4 applies.
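If you prefer to hand the pipeline one record per line, the Records wrapper can be unwrapped ahead of upload. This is a sketch of that conversion, not a description of the runner's internals; `cloudtrail_to_ndjson` is a hypothetical helper and expects an already-decompressed delivery file.

```python
import json


def cloudtrail_to_ndjson(body: bytes) -> bytes:
    """Turn {"Records": [...]} into NDJSON, one API-call record per line."""
    document = json.loads(body)
    records = document["Records"]
    lines = [
        json.dumps(record, separators=(",", ":")).encode("utf-8") + b"\n"
        for record in records
    ]
    return b"".join(lines)


delivery = b'{"Records":[{"eventName":"A"},{"eventName":"B"}]}'
ndjson = cloudtrail_to_ndjson(delivery)
```

Each output line is a standalone JSON object, so the result can be uploaded with an `.ndjson` suffix.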

Heroku log drains

If your Heroku log drain archives to S3 as gzipped JSONL, run the files through a small Lambda that gunzips and copies them into the watched prefix, or run on Advanced, where kyma's filedrop reads gzip.

Custom batch exports

Easiest path: have your batch job write <job-id>.ndjson to the prefix at the end of each run. Because dedupe is keyed on the file's SHA256, a restarted job that re-uploads byte-identical output is a no-op.
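Content-hash dedupe only helps when a rerun produces the same bytes. One way to get there, sketched below, is to serialise deterministically (sorted keys, fixed separators). The `render_ndjson` and `export_key` helpers are hypothetical; the upload call is omitted.

```python
import hashlib
import json
from typing import Iterable


def render_ndjson(records: Iterable[dict]) -> bytes:
    """Serialise records to NDJSON with a stable byte layout."""
    lines = (
        json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
        for record in records
    )
    return "".join(lines).encode("utf-8")


def export_key(prefix: str, job_id: str) -> str:
    """Build the object key <prefix><job-id>.ndjson."""
    return f"{prefix}{job_id}.ndjson"


run_1 = render_ndjson([{"b": 1, "a": 2}])
run_2 = render_ndjson([{"a": 2, "b": 1}])  # same data, different key order
same_hash = hashlib.sha256(run_1).hexdigest() == hashlib.sha256(run_2).hexdigest()
key = export_key("exports/", "job-42")
```

If a rerun emits records in a different order or with fresh timestamps, the hash changes and the file is ingested as new content.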

6. Observability

Each file becomes one run with:

source_ref: s3://bucket/key#sha256:<first-12>
rows_in: lines parsed
rows_out: rows accepted by the engine (after schema coercion)
status: succeeded / failed
error: parse / network / engine errors verbatim
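When correlating runs with objects in scripts, the source_ref layout shown above is easy to build and split. A sketch, assuming the `s3://bucket/key#sha256:<first-12>` shape holds for every run; both helper names are hypothetical.

```python
from typing import Tuple


def make_source_ref(bucket: str, key: str, sha256_hex: str) -> str:
    """Format s3://bucket/key#sha256:<first 12 hex chars>."""
    return f"s3://{bucket}/{key}#sha256:{sha256_hex[:12]}"


def parse_source_ref(ref: str) -> Tuple[str, str, str]:
    """Split a source_ref into (bucket, key, sha256 prefix)."""
    # rpartition keeps any '#' inside the object key intact.
    location, marker, digest_prefix = ref.rpartition("#sha256:")
    if not marker or not location.startswith("s3://"):
        raise ValueError(f"unrecognised source_ref: {ref!r}")
    bucket, _, key = location[len("s3://"):].partition("/")
    return bucket, key, digest_prefix


ref = make_source_ref("agentcy-ingest", "cloudtrail/x.ndjson", "abcdef0123456789")
parts = parse_source_ref(ref)
```

Twelve hex characters are enough to eyeball a match against `sha256sum` output, though they are a prefix and not the full digest.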

Limits

File size: no hard cap, but huge files block the worker — split when possible
Concurrency: one file at a time per pipeline (fairness across pipelines)
Latency: a new file can wait up to poll_interval_secs before it is picked up, plus any time spent queued behind earlier files; budget accordingly for time-sensitive feeds (use a webhook instead)
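For budgeting, a rough model is enough: poll_interval_secs bounds how long a file waits to be noticed, and because files are processed one at a time, anything queued ahead of it adds to that. The function below is a back-of-the-envelope sketch under that assumption, not a guarantee from the runner.

```python
from typing import Sequence


def worst_case_pickup_secs(
    poll_interval_secs: int,
    queued_processing_secs: Sequence[float] = (),
) -> float:
    """Upper bound on the wait before a new file starts processing.

    Assumes the file arrives just after a poll and must wait behind
    every file already queued in the same pipeline.
    """
    return float(poll_interval_secs) + float(sum(queued_processing_secs))


idle = worst_case_pickup_secs(60)
busy = worst_case_pickup_secs(60, [12.0, 3.0])
```

With a 60-second interval, an idle pipeline starts within about a minute, while two queued files taking 12 and 3 seconds push the bound to 75 seconds.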

Next

HTTP Webhook for sub-second latency
Telemetry pipelines for OTel data
Concept: Context Engine & Realms for what realm + target_table buy you
