Automated Code Review

Flow diagram: GitHub (PR opened) → Agentcy → CIAB (Claude Code sandbox, inline review) → Output (PR + Slack). The Context Graph supplies tribal knowledge: codebase conventions, prior reviews, CODEOWNERS.

At a glance

The agent's job: when a GitHub PR is opened or updated, spawn a CIAB sandbox running Claude Code, give it the diff and the codebase, ask for an architecture-aware review, and post the summary back to the PR plus a heads-up to Slack. Human approval gates any destructive tool.

Stack

  • GitHub — for the PR webhook + the diff + posting comments.
  • Agentcy CIAB (Claude Code) — the sandboxed coding-agent runtime that does the actual review.
  • Slack — heads-up channel for "PR #X reviewed".
  • Webhooks & Triggers — GitHub PR webhook fires the task.
  • Approval Flows — protects writes (PR comments) until you trust the review quality.
  • Agent Loop — the LLM uses CIAB and GitHub tools.

Cursor / Codex deferred

The CIAB runtime accepts agent_provider: "cursor" or "codex" as a config flag, but we don't yet ship a packaged sandbox image with those CLIs installed. This recipe uses Claude Code only until those runtimes are packaged.

What you'll build

The agent gets up to N (default 20) tool-call iterations to investigate. Tools are policy-gated; writes hit an approval gate if your policy requires it.

  1. GitHub PR webhook → Agentcy task fires.
  2. Agent calls github.get_pull_request and github.get_diff.
  3. Agent calls ciab.start_session with agent_provider="claude-code".
  4. Inside the sandbox: shell commands like rg, cargo check, npm test.
  5. Agent composes a markdown review.
  6. Agent calls github.create_pr_review (gated by approval until trusted).
  7. Agent posts a 3-line summary to Slack #code-reviews.
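The iteration budget mentioned above can be pictured as a bounded loop around tool calls. The following is only a minimal sketch under stated assumptions, not Agentcy's agent loop: `run_bounded_loop`, `scripted_planner`, the stub tools, and their argument names are all hypothetical, and a real agent would ask the LLM for each next step instead of replaying a script.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical shape: a step is (tool_name, arguments), or None when the agent is done.
Step = Optional[Tuple[str, Dict[str, Any]]]


def run_bounded_loop(
    plan_next: Callable[[List[Tuple[str, Any]]], Step],
    tools: Dict[str, Callable[..., Any]],
    max_iterations: int = 20,
) -> List[Tuple[str, Any]]:
    """Call tools until the planner stops or the iteration budget runs out."""
    history: List[Tuple[str, Any]] = []
    for _ in range(max_iterations):
        step = plan_next(history)
        if step is None:  # planner decided the review is complete
            break
        name, args = step
        if name not in tools:
            raise KeyError(f"tool not available to this task: {name}")
        history.append((name, tools[name](**args)))
    return history


# Stub tools standing in for the real connectors (return values are made up).
stub_tools = {
    "github.get_pull_request": lambda number: {"number": number, "head": "feature-x"},
    "github.get_diff": lambda number: "+added line\n-removed line\n",
    "ciab.start_session": lambda agent_provider: {"session": "s1", "provider": agent_provider},
}

script = [
    ("github.get_pull_request", {"number": 418}),
    ("github.get_diff", {"number": 418}),
    ("ciab.start_session", {"agent_provider": "claude-code"}),
]


def scripted_planner(history):
    # Replays the fixed script; a real agent would consult the LLM here.
    return script[len(history)] if len(history) < len(script) else None


calls = [name for name, _ in run_bounded_loop(scripted_planner, stub_tools)]
```

Lowering `max_iterations` cuts the run short even if the planner still has steps left, which is the behavior the budget is meant to guarantee.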

Prerequisites

  • GitHub configured as a connector with at least repo read and PR write scopes (PAT) or as a GitHub App with pull_requests: write and contents: read. App is preferred for orgs.
  • CIAB enabled — AGENTCY_FEATURES_CIAB=true. See CIAB overview.
    • For dev: local runtime (CIAB_RUNTIME=local) is fine.
    • For prod multi-tenant: EC2 runtime (CIAB_RUNTIME=ec2). See EC2 setup.
  • Slack configured (see Slack channel how-to).
  • A frontier-class LLM. Code review benefits a lot from Claude Sonnet 4.6 or GPT-5.
  • The CIAB sandbox image must have Claude Code installed. The default image AgentcyLabs ships includes it.

Step-by-step

1. Configure GitHub with PR-write permission

text
1. Open /connectors → click "+ Add Connector".
2. Pick GitHub → choose "GitHub App" if your org has one (preferred),
   else "Personal Access Token".
3. For PAT: paste a token with repo scope.
   For App: install on your org and select repos.
4. Realm: development. Click Save.
5. Click "Test" — green pill = ready.
bash
# PAT — fastest for personal use
curl -X POST http://localhost:8080/api/v1/sources \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "name":"github-primary",
    "connector":"github",
    "realm":"development",
    "config":{"auth":{"kind":"pat","token":"ghp_..."},"orgs":["acme"]}
  }'

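For an organization, a GitHub App is much safer than a PAT: its permissions are scoped per installation and its installation tokens are short-lived. See GitHub connector.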

2. Enable CIAB and pick a runtime

Screenshot: Coding Agents view with active sessions.

text
1. Open /coding-agents.
2. Click "Settings" → toggle "Enable CIAB".
3. Pick runtime: Local (dev) or EC2 (prod).
4. For EC2: paste AMI id, instance type, IAM role.
5. Save. Click "Spawn test session" → run `claude-code --version` to
   verify the CLI is present in the sandbox image.
bash
# In your Agentcy backend env
AGENTCY_FEATURES_CIAB=true
CIAB_RUNTIME=local           # or ec2 for production
CIAB_MAX_SESSIONS_PER_ORG=10
CIAB_SESSION_IDLE_TIMEOUT=600

If claude-code isn't found in the sandbox, your sandbox image is missing the CLI; see local runtime troubleshooting.

3. Configure Slack

See Slack channel how-to. At minimum, the bot needs the chat:write scope.

4. Create the code-review task

text
1. Open /tasks → click "+ New Task".
2. Trigger Type: Webhook.
3. Name: code-review-agent.
4. Task Prompt: paste the instruction from the API tab.
5. Connectors: tick github-primary, ciab, slack.
6. Approval defaults: write = approve.
7. Cost cap (USD/day): 25.00.
8. Max concurrent runs: 5.
9. Save. Copy the Webhook URL and Secret shown after save.
bash
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{
    "name":"code-review-agent",
    "agent":"default",
    "realm":"development",
    "trigger":{"kind":"webhook"},
    "input_template":{
      "instruction":"A GitHub PR event arrived: {{trigger.body}}. If the action is not opened, reopened, or synchronize, do nothing. Otherwise: open a CIAB session for the head branch using agent_provider=claude-code. In the session, examine the diff for: (1) correctness bugs, (2) missing tests, (3) regressions in established patterns (search the codebase for similar code), (4) security concerns. Run targeted commands like `rg`, `cargo check`, or `npm test` if useful. Produce a review summary in markdown: a one-paragraph TL;DR, then a bulleted list of findings grouped by severity. Post the full review as a GitHub PR comment, and post a short version (3 lines, link to PR) to Slack #code-reviews. If the diff is over 1000 lines, do not attempt the review — instead post a comment asking the author to break it up.",
      "max_diff_lines": 1000
    },
    "approval_defaults":{"write":"approve"},
    "cost_cap_usd_per_day": 25.00,
    "max_concurrent_runs": 5
  }'
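The instruction above leaves two gating decisions to the LLM: ignore events whose action is not opened, reopened, or synchronize, and refuse diffs over 1000 lines. The same rules can be written as a deterministic check, which is handy for testing expectations against sample payloads. This is a sketch, not part of Agentcy: `triage_pr_event` is a hypothetical helper, and it assumes "diff lines" means the `additions` plus `deletions` counts carried on the `pull_request` object of the webhook body.

```python
from typing import Any, Dict

REVIEWABLE_ACTIONS = {"opened", "reopened", "synchronize"}


def triage_pr_event(payload: Dict[str, Any], max_diff_lines: int = 1000) -> str:
    """Return "ignore", "ask_to_split", or "review" for a pull_request webhook body."""
    if payload.get("action") not in REVIEWABLE_ACTIONS:
        return "ignore"
    pr = payload.get("pull_request", {})
    # Assumption: diff size = additions + deletions as reported in the payload.
    changed = int(pr.get("additions", 0)) + int(pr.get("deletions", 0))
    if changed > max_diff_lines:
        return "ask_to_split"
    return "review"
```

A PR with exactly 1000 changed lines is still reviewed, matching the "over 1000 lines" wording in the instruction.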

5. Configure the GitHub webhook

text
1. In the repo (or at org level for org-wide coverage):
   Settings → Webhooks → Add webhook.
2. Payload URL: the URL from step 4.
3. Content type: application/json.
4. Secret: the secret from step 4.
5. Events: tick only "Pull requests".
6. Save. GitHub sends a test ping immediately — check Agentcy
   /tasks/$TASK_ID for the run.
text
For org-level (single webhook covers every repo):
1. Org → Settings → Webhooks → Add.
2. Same Payload URL, Secret, content type.
3. Active = on. Events: Pull requests.

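GitHub signs each delivery with an HMAC-SHA256 of the raw request body, keyed by the webhook secret, and sends the result in the X-Hub-Signature-256 header. Agentcy's receiver auto-detects GitHub's signing scheme for tasks named connector: github or webhook samples named github_*.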
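To make the signing scheme concrete, here is a minimal sketch of how a receiver can check X-Hub-Signature-256. It follows GitHub's documented scheme (HMAC-SHA256 over the raw body, hex-encoded, prefixed with `sha256=`), but the function names are hypothetical and this is not Agentcy's receiver code.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Compute the header value GitHub would send for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    expected = sign_body(secret, body).encode("utf-8")
    received = (header_value or "").encode("utf-8")
    return hmac.compare_digest(expected, received)
```

The check must run over the raw bytes of the request body. Re-serializing parsed JSON can change whitespace or key order and produce a mismatch even with the right secret.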

6. Test with a sample payload

text
1. Open /tasks → click code-review-agent.
2. Click "Test with sample payload".
3. Pick "github-pull-request-opened".
4. Click Run. Watch the run pane for tool calls.
5. When github.create_pr_review fires, an "Approval required" card
   appears. Click Approve.
6. Run finalizes; in History, click the run for the full transcript.
bash
curl -X POST http://localhost:8080/api/v1/webhook-samples/github-pull-request-opened/run \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{"task_id":"task_..."}'

# Stream the run
curl -N "http://localhost:8080/api/v1/tasks/$TASK_ID/runs/$RUN_ID/events" \
  -H "authorization: Bearer $TOKEN"

# Approve from another terminal when prompted
curl -X POST "http://localhost:8080/api/v1/chat/conversations/$CONV/approvals/$APPROVAL_ID" \
  -H "authorization: Bearer $TOKEN" -H 'content-type: application/json' \
  -d '{"approved":true}'

Once you trust the agent's behavior on real PRs, swap to the policy-driven approval below (no manual clicks per PR).

Worked example

Production-ready policy that allows Slack posts and PR comments by this task without approval, but still gates everything else:

rego
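package agentcy.code_review

# Needed for the `in` operator used in the deny rule (OPA versions before 1.0).
import future.keywords.in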

# Allow PR comments and Slack posts ONLY from this task. Other writes still need approval.
allow {
  input.subject.task == "code-review-agent"
  input.resource.connector == "github"
  input.resource.tool == "github.create_pr_review"
}

allow {
  input.subject.task == "code-review-agent"
  input.resource.connector == "slack"
  input.resource.tool == "slack.post_message"
}

# Block CIAB write ops by default — agents shouldn't be pushing branches
deny[msg] {
  input.subject.task == "code-review-agent"
  input.resource.connector == "ciab"
  input.resource.tool in {"ciab.write_file", "ciab.git_push"}
  msg := "code-review agent is read-only inside the sandbox"
}

Set the task's approval_defaults.write = "policy" so it consults the policy instead of asking humans for everything.
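For reasoning about what the policy above should return before deploying it, the three rules can be mirrored in a few lines of Python. This is a sketch under assumptions, not how Agentcy evaluates Rego: the `decide` helper is hypothetical, it assumes a matching deny rule takes precedence over allow, and it assumes anything that matches no rule falls back to human approval.

```python
from typing import Any, Dict

TASK = "code-review-agent"
ALLOWED_WRITES = {
    ("github", "github.create_pr_review"),
    ("slack", "slack.post_message"),
}
DENIED_CIAB_TOOLS = {"ciab.write_file", "ciab.git_push"}


def decide(request: Dict[str, Any]) -> str:
    """Return "deny", "allow", or "needs_approval" for one tool call."""
    task = request["subject"]["task"]
    connector = request["resource"]["connector"]
    tool = request["resource"]["tool"]
    if task != TASK:
        return "needs_approval"  # the policy only speaks for this task
    # Assumption: deny wins over allow when both could match.
    if connector == "ciab" and tool in DENIED_CIAB_TOOLS:
        return "deny"
    if (connector, tool) in ALLOWED_WRITES:
        return "allow"
    return "needs_approval"
```

The input shape (`subject.task`, `resource.connector`, `resource.tool`) is taken from the fields the Rego rules read.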

What good looks like

For a real PR, the agent posts a markdown review on the PR — a TL;DR plus findings grouped by severity. Here's the visual shape:

Screenshot: example PR review comment posted by Agentcy with TL;DR + findings.
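In text form, the shape described above (a TL;DR followed by findings grouped by severity) could be rendered roughly as follows. This is an illustrative sketch: `render_review` and the severity labels are hypothetical, and the agent's actual markdown may differ.

```python
from typing import Dict, List, Tuple

SEVERITY_ORDER = ["critical", "major", "minor"]  # hypothetical labels


def render_review(tldr: str, findings: List[Tuple[str, str]]) -> str:
    """Render a TL;DR line plus findings as markdown, grouped by severity."""
    grouped: Dict[str, List[str]] = {}
    for severity, text in findings:
        grouped.setdefault(severity, []).append(text)
    lines = ["**TL;DR** " + tldr.strip(), ""]
    # Known severities first, in order; unknown ones afterwards, alphabetically.
    extras = sorted(s for s in grouped if s not in SEVERITY_ORDER)
    for severity in SEVERITY_ORDER + extras:
        if severity not in grouped:
            continue
        lines.append(f"### {severity.capitalize()}")
        lines.extend(f"- {item}" for item in grouped[severity])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example = render_review(
    "Looks good.",
    [("minor", "Rename x"), ("critical", "Null deref"), ("minor", "Typo")],
)
```

Severities with no findings are omitted, so a clean PR produces only the TL;DR line.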

And a short heads-up in Slack:

📝 Reviewed acme/monolith#418 — looks good with 1 correctness fix needed before merge. Author: @bob

The full transcript (all tool calls, the CIAB shell output, the LLM's reasoning) is at GET /api/v1/tasks/$TASK_ID/runs/$RUN_ID.

Variations

  • Self-host with no Slack. Drop the slack.post_message from the instruction; the PR comment is the canonical output.
  • Multiple repos with different rules. Use a match_rule on the trigger or fork the task per repo (code-review-monolith, code-review-frontend, …). Don't try to make one task handle every repo.
  • Block PRs that fail the review. Add a step "if you found a correctness issue, request changes via github.create_pr_review with event: REQUEST_CHANGES". Combined with branch protection, this turns the agent into a soft gate.
  • Review only sensitive paths. In the instruction, "only review changes under crates/agentcy-auth/, crates/agentcy-policy/, or migrations/". Faster, cheaper, focuses attention.
  • Local-only mode for a single dev. CIAB_RUNTIME=local, no Slack, post comments via PAT. Makes a great CI for personal projects.
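Two of the variations above, reviewing only sensitive paths and requesting changes on correctness issues, reduce to small deterministic rules. The sketch below is hypothetical helper code for checking those rules offline; in the recipe itself they live in the task instruction. The "correctness" category label is an assumption, while `REQUEST_CHANGES` and `COMMENT` are review event values from the GitHub pull request review API.

```python
from typing import Iterable, List, Tuple

SENSITIVE_PREFIXES = ("crates/agentcy-auth/", "crates/agentcy-policy/", "migrations/")


def sensitive_files(
    changed: Iterable[str],
    prefixes: Tuple[str, ...] = SENSITIVE_PREFIXES,
) -> List[str]:
    """Keep only changed paths that fall under one of the sensitive prefixes."""
    return [path for path in changed if path.startswith(prefixes)]


def review_event(finding_categories: Iterable[str]) -> str:
    """Pick the GitHub review event for a set of finding categories."""
    # Hypothetical rule: any "correctness" finding blocks, anything else only comments.
    return "REQUEST_CHANGES" if "correctness" in set(finding_categories) else "COMMENT"
```

If `sensitive_files` returns an empty list for a PR, the review can be skipped entirely, which is where the cost saving comes from.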

Troubleshooting

The CIAB session boots but claude-code is not found. Your sandbox image doesn't have the CLI. For the local runtime, install the CLI on the host. For EC2, rebuild your AMI with the CLI included (see EC2 setup).

The agent's review is generic / non-specific. This almost always means the LLM isn't seeing the diff. Check the run transcript for github.get_diff: if it returned empty, the PAT lacks the pull_requests: read permission. If it returned the diff but the review is still vague, you're likely on a smaller model; frontier models pay off here.

GitHub rejects the comment with 403. The token can read but not write. A classic PAT needs the repo scope; a fine-grained PAT needs pull_requests: write. For Apps, ensure pull_requests: write is on the installation.

Webhook fires but task doesn't run. The usual cause is a signature mismatch. GitHub sends X-Hub-Signature-256: sha256=<hex>, so make sure the secret stored in GitHub matches the one Agentcy returned at task creation. On a mismatch the receiver returns 401, which is visible at GET /api/v1/hooks/deliveries.

The agent burns through the daily cost cap on a giant PR. The max_diff_lines: 1000 knob in the example exists for this reason — the agent should refuse to review huge diffs. Tighten the instruction, or make the cap project-specific.

Two PRs hit the task at once and one stalls. max_concurrent_runs: 5 lets up to 5 reviews run in parallel. If you set it to 1, additional PRs queue. If you have lots of activity, raise this value and turn on Workers (AGENTCY_FEATURES_WORKERS=true) so the API process doesn't host the agent loops.
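The queueing behavior described above can be pictured with a semaphore: up to `max_concurrent_runs` reviews proceed, and the rest wait their turn. This is only an illustration of the concept with hypothetical names, not how Agentcy schedules runs. The sleep stands in for the agent loop.

```python
import asyncio


async def run_reviews(pr_numbers, max_concurrent_runs=5):
    """Run one fake review per PR, never more than max_concurrent_runs at once."""
    gate = asyncio.Semaphore(max_concurrent_runs)
    active = 0
    peak = 0

    async def review(number):
        nonlocal active, peak
        async with gate:  # extra PRs wait here, which is the queueing
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the agent loop
            active -= 1
        return number

    done = await asyncio.gather(*(review(n) for n in pr_numbers))
    return list(done), peak


results, peak = asyncio.run(run_reviews(range(12), max_concurrent_runs=5))
```

With the limit set to 1 the reviews run strictly one after another, so a burst of PRs finishes later but never overlaps.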

Built by AgentcyLabs. For in-house deployment or Agentcy Cloud (PaaS) access, visit agentcylabs.com.