Object-store file drops

A filedrop pipeline watches a bucket prefix and ingests every *.ndjson / *.jsonl / *.json file it sees. Each file's SHA256 is the dedupe key — re-uploading the same content is a no-op. It's the default path for batch ingestion: nightly exports, audit-log archives, vendor data dumps, AWS CloudTrail, anything that lands as files.
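As an illustration of content-based dedupe, here is a minimal Python sketch. The names `dedupe_key`, `should_ingest`, and the in-memory `seen` set are hypothetical; how the runner actually stores its hashes is not described here.

```python
import hashlib

# Hypothetical in-memory record of hashes already ingested.
seen: set[str] = set()


def dedupe_key(content: bytes) -> str:
    """Hex SHA-256 of the raw file bytes; the filename plays no part."""
    return hashlib.sha256(content).hexdigest()


def should_ingest(content: bytes) -> bool:
    """Return True only the first time a given byte sequence is seen."""
    key = dedupe_key(content)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key depends only on the bytes, the same content uploaded under two different object names produces the same key.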

When to use

  • Vendor exports that ship daily/hourly NDJSON
  • AWS CloudTrail / VPC flow logs / GuardDuty (all delivered to S3 as files)
  • Backfills you want to be replayable — re-uploading is safe
  • Anything that's already in object storage and you don't want to ETL out

Object-store requirements

Any S3-compatible store works:

  • AWS S3
  • MinIO (the Advanced engine runs against MinIO already)
  • Cloudflare R2, DigitalOcean Spaces, Backblaze B2
  • Self-hosted Ceph RGW, Garage, SeaweedFS, …

The runner only needs list and get on the configured prefix. No notification setup (S3 events / SNS / EventBridge) is required — it polls.
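A list-then-get polling pass can be sketched roughly as follows. This is an assumption-laden illustration, not the runner's code: `poll_once`, `list_keys`, and tracking by object key are all hypothetical, and the sketch silently filters unsupported suffixes instead of logging a warning.

```python
from typing import Callable, Iterable

SUFFIXES = (".ndjson", ".jsonl", ".json")


def poll_once(
    list_keys: Callable[[str], Iterable[str]],
    prefix: str,
    seen: set,
) -> list:
    """One polling pass: list the prefix and return unseen keys with a supported suffix."""
    fresh = [
        key
        for key in sorted(list_keys(prefix))
        if key not in seen and key.endswith(SUFFIXES)
    ]
    seen.update(fresh)  # remember them so the next pass skips them
    return fresh
```

In a real client, `list_keys` would wrap the store's list operation and each returned key would then be fetched with a get; calling `poll_once` every `poll_interval_secs` gives the polling behavior without any event notifications.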

1. Create the pipeline

In Explore → Pipelines click Add pipeline, pick Object-store drop:

| Field | Example | Notes |
| --- | --- | --- |
| Bucket | agentcy-ingest | Without s3:// prefix |
| Prefix | cloudtrail/ | Trailing / is conventional |
| Poll interval | 30 (seconds) | 5–3600 |
| Target table | context_events | Same choice as webhook |
| Realm | infrastructure | Records inherit this |

Or via API:

bash
curl -X POST https://your.agentcy.dev/api/v1/context/pipelines \
  -H "authorization: Bearer $JWT" -H 'content-type: application/json' \
  -d '{
    "name": "cloudtrail-archive",
    "kind": "filedrop",
    "bucket": "agentcy-ingest",
    "prefix": "cloudtrail/",
    "poll_interval_secs": 60,
    "target_table": "context_events",
    "realm": "infrastructure"
  }'
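The same request can be built from Python with only the standard library. This is a sketch: `build_create_request` is a hypothetical helper, the range check mirrors the 5–3600 poll interval noted in the field table, and the response format is not covered here.

```python
import json
import urllib.request


def build_create_request(base_url: str, jwt: str, pipeline: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a filedrop pipeline."""
    interval = pipeline["poll_interval_secs"]
    if not 5 <= interval <= 3600:
        raise ValueError("poll_interval_secs must be between 5 and 3600")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/context/pipelines",
        data=json.dumps(pipeline).encode("utf-8"),
        headers={
            "authorization": f"Bearer {jwt}",
            "content-type": "application/json",
        },
        method="POST",
    )


request = build_create_request(
    "https://your.agentcy.dev",
    "JWT-PLACEHOLDER",  # substitute a real token
    {
        "name": "cloudtrail-archive",
        "kind": "filedrop",
        "bucket": "agentcy-ingest",
        "prefix": "cloudtrail/",
        "poll_interval_secs": 60,
        "target_table": "context_events",
        "realm": "infrastructure",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it.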

2. Configure object-store credentials

The runner uses the worker's existing AWS / S3 credentials. Set them once in the deployment env (the same place KYMA_S3_* already lives on Advanced installs):

bash
# Common
S3_ENDPOINT=https://s3.amazonaws.com   # or http://minio:9000
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=AKIA…
S3_SECRET_ACCESS_KEY=

# Optional: force path-style addressing for MinIO
S3_PATH_STYLE=true
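If you want your own upload or test scripts to read the same variables, a small helper along these lines may be enough. It is a sketch: `s3_settings` is a hypothetical name, and treating only the literal value `true` as enabling path-style addressing is an assumption.

```python
import os
from typing import Mapping


def s3_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the S3_* variables into one dict; unset values come back as None."""
    return {
        "endpoint_url": env.get("S3_ENDPOINT"),
        "region_name": env.get("S3_REGION"),
        "aws_access_key_id": env.get("S3_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("S3_SECRET_ACCESS_KEY"),
        # Assumption: only "true" (any case) turns path-style addressing on.
        "path_style": env.get("S3_PATH_STYLE", "").strip().lower() == "true",
    }
```

The first four keys are named after the keyword arguments of `boto3.client("s3", ...)`, so they can be passed through directly; `path_style` would map to botocore's `Config(s3={"addressing_style": "path"})`.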

3. Drop a file

No pre-provisioning needed

The watcher creates the target table on the first file, using the default schema (at, label, body, props). Every new top-level key in your files becomes a Utf8 column on the next batch. Pre-existing tables keep their schema. The behaviors that the webhook exposes through the X-Auto-Create and X-Schema-Evolve headers are controlled here by the env vars KYMA_FILEDROP_AUTO_CREATE and KYMA_FILEDROP_SCHEMA_EVOLVE.
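To preview which columns a batch would add, the rule "every new top-level key becomes a Utf8 column" can be sketched as below. `new_columns` is a hypothetical helper that only reports the names the engine would add; it does not model the engine itself.

```python
DEFAULT_COLUMNS = ["at", "label", "body", "props"]


def new_columns(existing: list, records: list) -> list:
    """Return top-level keys not yet in the schema, in first-seen order."""
    known = set(existing)
    added = []
    for record in records:
        for key in record:
            if key not in known:
                known.add(key)
                added.append(key)  # each of these would become a Utf8 column
    return added


batch = [
    {"label": "Event", "name": "deploy_started", "service": "checkout", "at": "2026-04-29T11:00:00Z"},
    {"label": "Event", "name": "deploy_finished", "service": "checkout", "at": "2026-04-29T11:03:14Z"},
]
print(new_columns(DEFAULT_COLUMNS, batch))  # ['name', 'service']
```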

Multi-prefix on a single watcher

A single kyma instance watches multiple prefixes via KYMA_FILEDROP_PREFIXES=prefix1,prefix2,prefix3. One Agentcy pipeline maps to one prefix ({KYMA_FILEDROP_PREFIXES}/{database}/{table}/). The provisioner sets this up automatically on Cloud / on-prem Enterprise.

Upload an NDJSON file under the configured prefix:

bash
cat <<'EOF' > events.ndjson
{"label":"Event","name":"deploy_started","service":"checkout","at":"2026-04-29T11:00:00Z"}
{"label":"Event","name":"deploy_finished","service":"checkout","at":"2026-04-29T11:03:14Z"}
EOF

aws s3 cp events.ndjson s3://agentcy-ingest/cloudtrail/$(date +%s).ndjson

The pipeline picks the file up within poll_interval_secs. Show runs in Explore then lists one new run per file, with source_ref set to s3://agentcy-ingest/cloudtrail/<filename> followed by the SHA256 prefix.

Re-upload the same content under a different filename → the run completes with rows_out = 0 and source_ref recording the dedupe hit.

4. Supported file formats

| Suffix | Parsed as |
| --- | --- |
| .ndjson / .jsonl | One JSON object per line |
| .json | A single JSON value. If it's an array, each element becomes a record; otherwise it's one record. |
| anything else | Skipped, with a warning in the run history |
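The suffix rules above can be expressed as a short parser. This is a sketch under assumptions: `parse_drop` is a hypothetical name, input is taken to be UTF-8, blank lines are skipped, and an unsupported suffix returns `None` where the runner would log a warning.

```python
import json
from typing import Optional


def parse_drop(key: str, content: bytes) -> Optional[list]:
    """Turn one dropped file into a list of records, chosen by filename suffix."""
    text = content.decode("utf-8")
    if key.endswith((".ndjson", ".jsonl")):
        # One JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if key.endswith(".json"):
        # A single JSON value: arrays fan out, anything else is one record.
        value = json.loads(text)
        return value if isinstance(value, list) else [value]
    return None  # unsupported suffix: skipped
```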

GZIP (.ndjson.gz) is not yet decompressed by the Agentcy runner; rely on the engine's native filedrop on Advanced (kyma's kyma-ingest-filedrop accepts gzip directly) or pre-decompress.
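If you take the pre-decompress route, a minimal Python sketch looks like this. `gunzip_file` is a hypothetical helper that works on local files; uploading the result to the bucket is left to your usual tooling.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src: Path) -> Path:
    """Decompress e.g. events.ndjson.gz to events.ndjson beside it; return the new path."""
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_suffix("")  # strips only the trailing .gz
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)  # streams, so large files are fine
    return dst
```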

5. Common shapes

AWS CloudTrail

CloudTrail writes one JSON file per delivery — the Records array becomes one record per AWS API call:

json
{ "Records": [ {"eventTime":"…", "eventName":"…", …}, … ] }

Configure the prefix to your CloudTrail trail's bucket prefix (AWSLogs/<account-id>/CloudTrail/<region>/). Set target_table = "context_events" and let the records flow.
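To see what one record per API call looks like, or to flatten a delivery yourself before uploading, the unwrapping can be sketched as follows. `cloudtrail_to_ndjson` is a hypothetical helper and assumes the delivery has already been decompressed.

```python
import json


def cloudtrail_to_ndjson(delivery: bytes) -> str:
    """Unwrap the top-level Records array into one compact JSON object per line."""
    records = json.loads(delivery)["Records"]
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


sample = b'{"Records": [{"eventName": "A"}, {"eventName": "B"}]}'
print(cloudtrail_to_ndjson(sample), end="")
# {"eventName":"A"}
# {"eventName":"B"}
```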

Heroku log drains

Heroku ships logs to S3 as gzipped JSONL. Pipe them through a small Lambda that gunzips and copies each file, or run on Advanced, where kyma's filedrop reads gzip.

Custom batch exports

Easiest path: have your batch job write <job-id>.ndjson to the prefix at the end of each run. SHA256 dedupe means partial restarts are safe.
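Because the dedupe key is the SHA256 of the file bytes, a rerun only hits dedupe if it writes byte-identical output. One way to get there is to serialize deterministically, as in this sketch (`to_ndjson` is a hypothetical helper; sorted keys and compact separators are a design choice, not a requirement of the pipeline).

```python
import json
from typing import Iterable


def to_ndjson(records: Iterable[dict]) -> bytes:
    """Serialize records so that the same data always yields the same bytes."""
    return "".join(
        json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
        for record in records
    ).encode("utf-8")


# Key order in the source dicts does not change the output.
first = to_ndjson([{"b": 1, "a": 2}])
second = to_ndjson([{"a": 2, "b": 1}])
assert first == second == b'{"a":2,"b":1}\n'
```

The job would write these bytes to `<job-id>.ndjson` under the prefix once the run has finished; the records themselves must also come out in a stable order.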

6. Observability

Each file becomes one run with:

  • source_ref: s3://bucket/key#sha256:<first-12>
  • rows_in: lines parsed
  • rows_out: rows accepted by the engine (after schema coercion)
  • status: succeeded / failed
  • error: parse / network / engine errors verbatim
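For matching a run back to a local file, the source_ref format can be reproduced with a few lines of Python. This sketch assumes `<first-12>` means the first 12 hexadecimal characters of the digest; `source_ref` as a function name is hypothetical.

```python
import hashlib


def source_ref(bucket: str, key: str, content: bytes) -> str:
    """Build s3://bucket/key#sha256:<first-12> for the given file bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return f"s3://{bucket}/{key}#sha256:{digest[:12]}"  # assumed: 12 hex chars
```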

Limits

  • File size: no hard cap, but huge files block the worker — split when possible
  • Concurrency: one file at a time per pipeline (fairness across pipelines)
  • Latency: poll_interval_secs is the worst case; budget accordingly for time-sensitive feeds (use a webhook instead)
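Splitting a large NDJSON export before upload is straightforward because every line is an independent record. A minimal sketch follows, with `split_ndjson` and the chunk size as hypothetical choices.

```python
from typing import Iterable, Iterator


def split_ndjson(lines: Iterable[str], max_lines: int) -> Iterator[list]:
    """Yield consecutive chunks of at most max_lines lines each."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == max_lines:
            yield chunk
            chunk = []
    if chunk:  # trailing partial chunk
        yield chunk
```

Each chunk can then be written to its own object under the prefix; since the pipeline processes one file at a time, smaller files keep any single run short.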

Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.