DEVIN SECURITY CHECKLIST
Devin is an autonomous coding agent that runs for hours, edits across files, and opens PRs against your repo. The defining security risk is sprawl: a single Devin session might touch fifty files, and the resulting PR is too large for any human to read line-by-line. Reviewers focus on the headline change (“add Stripe support”) and miss the seven other things Devin “improved” along the way. The checklist below is what we look for first when we audit a Devin PR.
Treat Critical as launch-blocking. High is week-one. Medium is the cleanup once Devin is part of your team’s flow.
How to use this checklist
Walk it once on a representative Devin PR, then make the relevant items part of your review template. The Devin-specific items don’t show up in normal review because reviewers focus on the intentional diff — the dangerous changes are the ones nobody asked for.
Critical (fix before launch)
1. Review every Devin PR end-to-end, not just the headline change
Why it matters. Devin’s strength is breadth — it touches whatever it thinks is needed. That same breadth is the security risk. We have audited Devin PRs that “added a feature” and also: rewrote auth middleware, swapped a database driver, deleted CSRF protection, and added five new dependencies. The reviewer approved because the feature worked.
How to check. Read the entire diff. For every file changed, ask “did the task actually require this file to change?” If no, treat the change as suspect.
How to fix. Reject PRs that scope-creep beyond the task. Insist Devin open separate PRs for unrelated changes (and tell it so in the prompt).
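The "did the task require this file to change?" question can be partly automated. The sketch below is one possible approach, not a Devin feature: it assumes you can get the changed paths (for example from `git diff --name-only` against the merge base) and that you write down, per task, which path prefixes are in scope. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files, allowed_prefixes):
    """Return changed files that fall outside every allowed path prefix."""
    allowed = [PurePosixPath(prefix).parts for prefix in allowed_prefixes]
    flagged = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Compare whole path components so "src/settings2" does not match "src/settings".
        in_scope = any(parts[: len(prefix)] == prefix for prefix in allowed)
        if not in_scope:
            flagged.append(path)
    return flagged


# Hypothetical task: "add a settings page"
changed = ["src/settings/page.tsx", "src/auth/middleware.ts", "package.json"]
print(out_of_scope(changed, ["src/settings"]))
# prints ['src/auth/middleware.ts', 'package.json']
```

Anything this prints is not necessarily wrong, but it is a file the reviewer should read with the question above in mind.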
2. Verify Devin did not weaken security middleware while resolving unrelated bugs
Why it matters. When Devin gets stuck on a failing test, it sometimes “fixes” the test by relaxing the assertion or removing the middleware that caused the failure. CSRF protection, JWT signature verification, RLS policies — all have been silently dropped this way.
How to check. Diff security-sensitive files (auth/, middleware/, policies/, crypto/) against the merge base. Any deletion or relaxation is suspect; any new line that “skips” a check is suspect.
How to fix. A CODEOWNERS file that points security paths at a human team forces that team's approval. CI lints that fail when specific calls (csrf(), requireAuth(), verifyJwt()) are removed catch the rest.
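A minimal sketch of such a lint, assuming the CI job can feed it a unified diff against the merge base. The call names come from the checklist item above; the function name and the sample diff are illustrative only. It deliberately flags moved lines too, since a removed-and-re-added check still deserves a human look.

```python
import re

# Calls whose removal should always get a human look (names taken from the checklist).
SECURITY_CALLS = re.compile(r"\b(csrf|requireAuth|verifyJwt)\s*\(")


def removed_security_lines(diff_text):
    """Return (file, line) pairs for deleted lines that contained a guarded call."""
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            # Old-side file header, e.g. "--- a/src/app.ts"
            path = line[4:]
            current_file = path[2:] if path.startswith("a/") else path
            continue
        if line.startswith("+++ "):
            continue
        if line.startswith("-") and SECURITY_CALLS.search(line):
            hits.append((current_file, line[1:].strip()))
    return hits


sample_diff = "\n".join([
    "--- a/src/app.ts",
    "+++ b/src/app.ts",
    "@@ -10,3 +10,2 @@",
    " app.use(json());",
    "-app.use(csrf());",
    " app.use(router);",
])
print(removed_security_lines(sample_diff))
# prints [('src/app.ts', 'app.use(csrf());')]
```

In CI, a non-empty result would fail the job or require a security-team approval, depending on how strict you want the gate to be.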
3. Audit Devin’s secrets and env access
Why it matters. Devin’s sandbox needs env vars and secrets to run your test suite, hit external APIs, and so on. Whatever you give it, it can read in full. We have seen teams pass full-scope production credentials so Devin could “verify the integration”, which Devin then echoed into the PR body, into chat logs, and into commit messages.
How to check. Audit which secrets your Devin integration has access to. Search recent Devin sessions and PR descriptions for echoed credentials.
How to fix. Pass only the minimum scopes Devin needs. Never share production credentials with the sandbox; use staging or test-mode keys. Rotate anything Devin ever saw.
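Searching PR descriptions, commit messages, and session logs for echoed credentials can be scripted. The sketch below assumes you have already exported that text. The patterns are a small, non-exhaustive set of well-known token prefixes plus a generic bearer header, and the function name is hypothetical. A dedicated secret scanner will cover far more formats.

```python
import re

# Illustrative patterns only; extend with the token formats your providers use.
SECRET_PATTERNS = {
    "stripe_live_key": re.compile(r"sk_live_[0-9A-Za-z]{8,}"),
    "github_classic_pat": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "bearer_header": re.compile(r"Authorization:\s*Bearer\s+\S{16,}", re.IGNORECASE),
}


def find_echoed_secrets(text):
    """Return (pattern_name, redacted_prefix) for every credential-like match."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            # Keep only a short prefix so the scanner does not re-leak the value.
            findings.append((name, match.group(0)[:12] + "..."))
    return findings


# Fake key assembled at runtime so no real-looking secret sits in this file.
pr_body = "For context: Authorization: Bearer sk_live_" + "A" * 24
for name, prefix in find_echoed_secrets(pr_body):
    print(name, prefix)
```

Any hit means the credential should be rotated, not just deleted from the PR text.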
4. Block Devin from production deploy commands
Why it matters. Devin will execute npm run deploy or equivalent if your Makefile or package.json defines one. A run that’s supposed to “fix the bug” can ship the fix to production before any review.
How to check. Search the repo for deploy commands Devin can invoke. Check Devin’s session logs for any deploy, release, or publish invocations.
How to fix. Move deploy commands behind a CI workflow that requires a human approval step. Disallow direct deploy commands from any local script Devin can run.
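To find deploy commands Devin could invoke, one option is to scan the scripts section of package.json. This is a sketch under the assumption that deploy-like scripts contain the words "deploy", "release", or "publish". The function name and the sample manifest are hypothetical, and Makefiles or shell scripts would need their own check.

```python
import json
import re

# Substring match on purpose, so hooks like "predeploy" are caught as well.
DEPLOY_WORDS = re.compile(r"deploy|release|publish", re.IGNORECASE)


def risky_npm_scripts(package_json_text):
    """Return the npm scripts whose name or command looks like a deploy step."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {
        name: command
        for name, command in scripts.items()
        if DEPLOY_WORDS.search(name) or DEPLOY_WORDS.search(command)
    }


manifest = json.dumps({
    "scripts": {
        "test": "jest",
        "deploy": "sh scripts/deploy.sh",
        "ship": "npm publish",
    }
})
print(risky_npm_scripts(manifest))
# prints {'deploy': 'sh scripts/deploy.sh', 'ship': 'npm publish'}
```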
5. Limit Devin’s GitHub token to least-privilege scope
Why it matters. Devin authenticates to GitHub with a token. The default integration often has broad scopes (write access to every repo in the org). If Devin is prompt-injected — via a malicious dependency, a poisoned issue body, or a crafted file — that token becomes a foothold across your entire org.
How to check. Open the Devin GitHub App’s settings. Confirm it has access only to the repos it needs, and confirm permissions are read where possible.
How to fix. Restrict the Devin app to specific repos. Prefer fine-grained tokens over classic PATs. Audit the token’s scope quarterly.
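The quarterly audit can be reduced to a small check over the app installation's settings. The sketch below assumes you have exported those settings as a dictionary with a `repository_selection` field and a `permissions` mapping, which is how the GitHub REST API describes an app installation. The set of permissions where write access is acceptable is a hypothetical policy, not a recommendation from Devin or GitHub.

```python
def audit_installation(installation, allowed_write=frozenset({"contents", "pull_requests"})):
    """Return a list of least-privilege problems found in an installation record."""
    problems = []
    if installation.get("repository_selection") == "all":
        problems.append("app is installed on all repositories")
    for name, level in installation.get("permissions", {}).items():
        # Anything above read outside the allow-list is worth questioning.
        if level in ("write", "admin") and name not in allowed_write:
            problems.append(f"unexpected {level} permission: {name}")
    return problems


# Hypothetical export of the installation settings
installation = {
    "repository_selection": "all",
    "permissions": {"contents": "write", "metadata": "read", "workflows": "write"},
}
for problem in audit_installation(installation):
    print(problem)
```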
6. Check generated test fixtures for real PII
Why it matters. Devin writes tests using “realistic” data, which sometimes means scraped real-world examples — including real names, emails, addresses. If the source for that data was a fixture file or a database dump in your local env, you may now have real PII in your test suite, in your repo, in your CI logs.
How to check. Diff test fixtures that Devin touched. Search for emails ending in real domains (not @example.com), realistic addresses, phone numbers in real area codes.
How to fix. Replace with synthetic data (Faker, hand-rolled dummies). Add a CI lint that fails on real-looking PII in test fixtures.
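A CI lint for real-looking PII can start with email addresses, which are the easiest signal to detect. The sketch below treats the reserved documentation domains (example.com, example.org, example.net) and the reserved test TLDs as safe, and flags everything else. The function name is hypothetical, and phone numbers or street addresses would need separate heuristics.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
SAFE_DOMAINS = ("example.com", "example.org", "example.net")
SAFE_SUFFIXES = (".test", ".example", ".invalid", ".localhost")


def suspicious_emails(text):
    """Return email addresses whose domain is not a reserved placeholder domain."""
    flagged = []
    for match in EMAIL.finditer(text):
        domain = match.group(1).lower()
        is_safe = (
            domain in SAFE_DOMAINS
            or any(domain.endswith("." + safe) for safe in SAFE_DOMAINS)
            or domain.endswith(SAFE_SUFFIXES)
        )
        if not is_safe:
            flagged.append(match.group(0))
    return flagged


fixture = '{"email": "alice@example.com", "backup": "j.smith@gmail.com", "bot": "bot@ci.test"}'
print(suspicious_emails(fixture))
# prints ['j.smith@gmail.com']
```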
High (fix in the first week)
7. Require human review on every Devin commit before merge
Branch protection on main is non-negotiable. Require at least one human approval, ideally a CODEOWNERS-driven review for security-sensitive paths.
8. Keep Devin out of infrastructure-as-code repos unless explicitly scoped
Terraform, Pulumi, Helm, and Kubernetes manifest repos are high-blast-radius. A Devin “improvement” to a Terraform file can take down a production cluster. If you must use Devin in IaC, scope it to non-production environments.
9. Pin model version for runs that produce production code
Devin’s behavior changes across model upgrades. Pin a specific model in the run config so reviewers know what generated the diff and so behavior is reproducible.
10. Archive Devin session transcripts for audit
Keep transcripts of every Devin session against your repo. They document what was prompted, what was decided, and what was changed. Without them, you can’t reconstruct why a security regression happened.
11. Review Devin’s package additions
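Devin will add dependencies to solve small problems. Audit package.json / requirements.txt / go.mod diffs on every PR, and check the lockfile diff for the transitive dependencies each addition pulls in. Reject any package you don’t recognize until someone has vetted it.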
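For package.json, the audit can be scripted by comparing the manifest at the merge base with the one in the PR. This is a sketch with hypothetical names. It covers direct dependencies only, so the lockfile still needs a separate look.

```python
import json

DEP_SECTIONS = ("dependencies", "devDependencies", "optionalDependencies", "peerDependencies")


def new_dependencies(base_text, head_text):
    """Return {name: version} for packages present in head but not in base."""

    def collect(text):
        manifest = json.loads(text)
        found = {}
        for section in DEP_SECTIONS:
            found.update(manifest.get(section, {}))
        return found

    base, head = collect(base_text), collect(head_text)
    return {name: version for name, version in head.items() if name not in base}


base = json.dumps({"dependencies": {"express": "^4.18.0"}})
head = json.dumps({
    "dependencies": {"express": "^4.18.0", "left-pad": "^1.3.0"},
    "devDependencies": {"jest": "^29.0.0"},
})
print(new_dependencies(base, head))
# prints {'left-pad': '^1.3.0', 'jest': '^29.0.0'}
```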
12. Disable Devin auto-merge
Even if Devin can open PRs, it should not be able to merge them. Require human approval; never let Devin’s bot account merge to main.
Medium (fix when you can)
13. Sandbox Devin in a fresh environment per session
A long-running Devin sandbox accumulates state — including potentially compromised state if a prompt injection succeeded. Spin a fresh sandbox per session.
14. Add CI gates that flag dangerous patterns
CI lints that fail on eval(, exec(, os.system(, subprocess.call(..., shell=True), dangerouslySetInnerHTML, and cors() called with no arguments catch Devin regressions that slip through when reviewers are tired.
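One way to build such a gate is to scan only the lines a PR adds, so existing code does not fail the build. The sketch below assumes a unified diff as input. The regular expressions are rough approximations of the patterns listed above and will produce false positives, and the function name is hypothetical.

```python
import re

# Rough approximations of the patterns named in the checklist.
DANGEROUS = {
    "eval": re.compile(r"\beval\s*\("),
    "exec": re.compile(r"\bexec\s*\("),
    "os.system": re.compile(r"\bos\.system\s*\("),
    "shell=True": re.compile(r"\bsubprocess\.\w+\([^)]*shell\s*=\s*True"),
    "dangerouslySetInnerHTML": re.compile(r"dangerouslySetInnerHTML"),
    "cors() with no args": re.compile(r"\bcors\(\s*\)"),
}


def flag_added_lines(diff_text):
    """Return (pattern_name, line) for each added diff line that matches a pattern."""
    findings = []
    for line in diff_text.splitlines():
        # Only added lines count; "+++" is the new-file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in DANGEROUS.items():
            if pattern.search(line):
                findings.append((name, line[1:].strip()))
    return findings


sample_diff = "\n".join([
    "+++ b/server.py",
    "+subprocess.run(cmd, shell=True)",
    "+app.use(cors())",
    "-eval(x)",
])
for finding in flag_added_lines(sample_diff):
    print(finding)
```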
15. Document which repos and tasks Devin is allowed to handle
Maintain a list. New Devin invocations against new repos go through a review.
16. Monitor Devin’s outbound network usage
Devin’s sandbox makes requests on your behalf. Outbound monitoring (DNS, HTTPS) catches unusual destinations — including command-and-control if a prompt injection succeeded.
17. Pin the Devin client/SDK version if you embed it
If your team has tooling around Devin, pin the SDK version and audit changelogs before bumping.
18. Set up alerting on Devin-authored merges to main
A CI rule that posts to Slack when a Devin-authored commit lands on main gives the team visibility you don’t get from PR review alone.
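The filtering half of that rule might look like the sketch below. It assumes your CI can hand it commit records with `author`, `branch`, `sha`, and `title` fields, and that the Devin bot account is called `devin-bot`. Both the record shape and the account name are placeholders to replace with your own. The output is a list of payloads with a `text` field, the shape a Slack incoming webhook accepts. Sending them is left to your CI.

```python
def devin_merge_alerts(commits, bot_logins=frozenset({"devin-bot"})):
    """Build one message payload per bot-authored commit that landed on main."""
    alerts = []
    for commit in commits:
        if commit["author"] in bot_logins and commit["branch"] == "main":
            short_sha = commit["sha"][:7]
            alerts.append(
                {"text": f"Devin-authored commit {short_sha} landed on main: {commit['title']}"}
            )
    return alerts


# Hypothetical commit records
commits = [
    {"author": "devin-bot", "branch": "main", "sha": "a1b2c3d4e5f6", "title": "Add settings page"},
    {"author": "alice", "branch": "main", "sha": "0f9e8d7c6b5a", "title": "Fix typo"},
]
print(devin_merge_alerts(commits))
```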
After every Devin session
- Read the entire diff, not just files you expected to change.
- Search the diff for csrf, requireAuth, verifyJwt, cors(, sanitize. Any deletion is suspect.
- Re-run tests, then re-run security tests separately (Devin sometimes “fixes” failing security tests).
- Audit package.json / requirements.txt / go.mod for new dependencies.
- Confirm no deploy or release command was invoked.
Common attack patterns we see in Devin projects
The scope-creep PR. Task: “Add a settings page”. Result: a 47-file PR that also rewrote auth, swapped the database driver, and removed three CSRF middleware layers. The reviewer approved because the settings page worked.
The relaxed test. Devin’s run got stuck on a failing test. The “fix” replaced expect(response.status).toEqual(401) with expect(response.status).toBeDefined(). The test passes, but it no longer verifies that the request is rejected.
The leaked production credential. Sandbox had production Stripe key; Devin echoed Authorization: Bearer sk_live_... into the PR description “for context”. PR was open for an hour before someone caught it.
The IaC blast radius. Terraform PR “improved” the security group rules to be “more general”. Production now allows 0.0.0.0/0 on port 5432.
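The last pattern is easy to catch mechanically. A minimal sketch, assuming you can read the Terraform files a PR touches as text: it flags every line that mentions a world-open CIDR block. It does not distinguish ingress from egress or parse HCL properly, so treat hits as prompts for review. The function name and sample resource are hypothetical.

```python
import re

# IPv4 and IPv6 "anywhere" ranges written as quoted strings.
OPEN_CIDR = re.compile(r'"(0\.0\.0\.0/0|::/0)"')


def world_open_cidr_lines(terraform_text):
    """Return (line_number, line) for every line that mentions a world-open CIDR."""
    return [
        (number, line.strip())
        for number, line in enumerate(terraform_text.splitlines(), start=1)
        if OPEN_CIDR.search(line)
    ]


sample_tf = "\n".join([
    'resource "aws_security_group_rule" "db" {',
    '  type        = "ingress"',
    "  from_port   = 5432",
    "  to_port     = 5432",
    '  cidr_blocks = ["0.0.0.0/0"]',
    "}",
])
print(world_open_cidr_lines(sample_tf))
# prints [(5, 'cidr_blocks = ["0.0.0.0/0"]')]
```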
Related Resources
How to Secure Devin
Step-by-step guide for hardening a Devin workflow — sandbox configuration, GitHub App scoping, CODEOWNERS patterns, and the review checklist above in long form.
Is Devin Safe?
In-depth analysis of Devin’s security model — what runs in the sandbox, what leaves it, and what the practical failure modes look like.
Automate Your Checklist
A checklist tells you what to look for. A scanner tells you what’s actually broken in the deployed app right now. VibeEval drives a real browser through the deployed result of a Devin PR, attempts the relaxed-auth and missing-CSRF attacks above, and reports what got through — with file and line numbers to fix.