AI PENTESTING: WHAT IT IS AND HOW IT WORKS
AI pentesting uses autonomous agents to probe live web applications for vulnerabilities. Same goal as a human pentester — find the holes before an attacker does — but at a cadence humans can't match.
What is AI pentesting?
AI pentesting is penetration testing driven by autonomous software agents rather than a human pentester. An agent is given a target URL and a scope, loads the app the way a browser would, maps the API surface from captured traffic, and then uses a large language model plus structured tool use to decide what to probe next. Each decision — which endpoint, which payload, which authentication bypass to try — is made dynamically based on what the agent has observed.
The result is a pentest report: findings ranked by severity, each with evidence, each with remediation. The output shape is the same as a human engagement. The input cost and cadence are different by orders of magnitude.
How AI pentesting works under the hood
A modern AI pentesting agent is built from three layers:
- A reasoning model — usually a large language model (Claude, GPT, or specialized fine-tunes) that decides what to test next based on observations.
- A tool layer — deterministic code for making HTTP requests, capturing responses, parsing headers, generating payloads, and managing authentication state.
- A scope and safety layer — enforces the target URL boundary, rate limits, and blocks destructive actions even if the reasoning model would suggest them.
The agent runs a loop: observe the target, decide on an action, execute it through a tool, observe the result, update its model of the target, and repeat. Classic tool-use architecture. The reasoning model’s job is to plan like an attacker; the tool layer’s job is to execute without breaking anything; the scope layer’s job is to keep the agent honest.
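The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular product's implementation: `decide` stands in for the reasoning model, `execute` for the tool layer, and `in_scope` for the scope layer. Those names, along with `Action` and `TargetModel`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    tool: str  # name of the tool to call, e.g. "http"
    url: str   # target of the call


@dataclass
class TargetModel:
    # What the agent has learned so far: one (action, result) pair per step.
    observations: list = field(default_factory=list)


def run_agent(
    decide: Callable[[TargetModel], Optional[Action]],  # stand-in for the LLM
    execute: Callable[[Action], str],                   # stand-in for the tool layer
    in_scope: Callable[[Action], bool],                 # stand-in for the scope layer
    max_steps: int = 10,
) -> TargetModel:
    model = TargetModel()
    for _ in range(max_steps):
        action = decide(model)      # decide on an action from what was observed
        if action is None:          # the planner has nothing left to try
            break
        if not in_scope(action):    # the scope layer can veto the planner
            model.observations.append((action, "blocked: out of scope"))
            continue
        result = execute(action)    # execute through a tool
        model.observations.append((action, result))  # update the target model
    return model
```

The scope check sits between the planner and the tool call, so an out-of-scope suggestion is recorded but never executed.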
Inside the planner
The planner is the part that turns observations into next actions. In the apps we audit it usually looks like a structured loop with four checkpoints:
- Surface state — what endpoints exist, what auth states have been captured, what findings already exist.
- Coverage gaps — which endpoints have not been tested yet against which vulnerability classes.
- Hypothesis selection — which gap is highest expected value to probe next.
- Tool invocation — what payload, what HTTP verb, what session, what assertion to make about the response.
This is just goal-directed search with the LLM as the heuristic. The interesting part is that the heuristic is good enough to chain findings: if the agent has just discovered an exposed user list endpoint, the next probe is automatically “try BOLA on each row” rather than continuing the linear coverage walk.
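One way to picture the chaining behavior is a priority queue in which a finding pushes follow-up probes ahead of the routine coverage walk. The sketch below is an assumption about how such a planner could be structured; the `Planner` class, the `exposed-listing` label, and the numeric scores are all hypothetical, and in a real agent the scores would come from the model rather than constants.

```python
import heapq
from itertools import count


class Planner:
    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker so equal scores pop in insertion order

    def add(self, score: float, endpoint: str, vuln_class: str) -> None:
        # heapq is a min-heap, so negate the score to pop the highest value first.
        heapq.heappush(self._heap, (-score, next(self._order), endpoint, vuln_class))

    def next_probe(self):
        if not self._heap:
            return None
        _, _, endpoint, vuln_class = heapq.heappop(self._heap)
        return endpoint, vuln_class

    def on_finding(self, endpoint: str, vuln_class: str, discovered_ids: list) -> None:
        # Chaining: an exposed listing makes per-row BOLA probes jump the queue.
        if vuln_class == "exposed-listing":
            for row_id in discovered_ids:
                self.add(0.9, f"{endpoint}/{row_id}", "bola")
```

With two routine probes queued at scores 0.5 and 0.2, a call to `on_finding("/api/users", "exposed-listing", [1, 2])` causes the next two probes to be the BOLA checks on `/api/users/1` and `/api/users/2`.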
Tool layer in practice
The tool layer is the boring part, and that is the point. Each tool is a deterministic function the model can call:
- `http(method, url, headers, body)` — issue a request, return status + headers + body.
- `decode_jwt(token)` — pull header, payload, signature for inspection.
- `fuzz(input_field, wordlist)` — replay a request with each payload from a wordlist.
- `login(username, password)` — execute the captured login flow and store the session.
- `crawl(url, depth)` — render with a headless browser and extract every link, request, and form.
- `probe_rls(table)` — for Supabase targets, run anonymous + cross-user CRUD against a named table.
The tools are deterministic so that findings replay exactly. The LLM picks which tool to call and with what arguments, but never executes the network operation itself. This separation is what keeps the agent within scope — the safety layer wraps every tool with the URL allowlist and rate-limit budget.
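A hedged sketch of what wrapping a tool with the safety layer might look like. `ScopeGuard` and `ScopeError` are hypothetical names, the budget is a simple request count rather than a true time-window rate limit, and blocking only `DELETE` is a simplification of "destructive actions".

```python
from urllib.parse import urlsplit


class ScopeError(Exception):
    """Raised when a tool call would leave the allowed scope or budget."""


class ScopeGuard:
    def __init__(self, allowed_hosts: set, request_budget: int):
        self.allowed_hosts = allowed_hosts
        self.remaining = request_budget

    def wrap(self, tool):
        # Return a version of `tool` that is checked before every call.
        def guarded(method: str, url: str, **kwargs):
            host = urlsplit(url).hostname
            if host not in self.allowed_hosts:
                raise ScopeError(f"host not in allowlist: {host}")
            if method.upper() == "DELETE":
                raise ScopeError("destructive method blocked")
            if self.remaining <= 0:
                raise ScopeError("request budget exhausted")
            self.remaining -= 1
            return tool(method, url, **kwargs)

        return guarded
```

Because the model only ever sees the guarded function, a planner that proposes an out-of-scope URL gets an error back instead of a network request.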
The validation loop
A finding is not emitted until the agent has validated it. Validation runs the same probe a second time, on a clean session, with a fresh DNS resolution, and asserts the same response. False positives almost always come from stale state — a login that lingered, a cookie that was still hot, a CDN cache miss that flipped behavior. The replay catches them.
This is why a properly-built AI pentest report is high signal: every finding ships with a request, a response, and a replayable proof of concept.
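The replay step can be illustrated with a small function. This is a sketch under assumptions: `probe` and `new_session` are hypothetical callables, and a response is reduced to a `(status, body)` pair, whereas a real agent would likely compare a richer set of signals.

```python
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, body)


def validate_finding(
    probe: Callable[[object], Response],  # sends the probe using a given session
    new_session: Callable[[], object],    # builds a session with no prior state
    first_response: Response,
) -> bool:
    """Replay the probe on a clean session and require the same outcome."""
    replayed = probe(new_session())
    return replayed == first_response
```

A probe that only succeeded because of a lingering cookie returns something different on the clean session, so the candidate finding is dropped rather than reported.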
What AI pentesting covers well
- Reconnaissance — mapping subdomains, endpoints, technologies, exposed services
- Authentication testing — login flows, session management, MFA bypass, password policy
- Authorization testing — IDOR, BOLA, role escalation, ownership checks on every endpoint
- Injection — SQL, NoSQL, command, XSS, SSRF, and LLM prompt injection
- Configuration — missing security headers, permissive CORS, open cloud storage, exposed admin panels
- Credential exposure — API keys, tokens, secrets in frontend bundles and source maps
- Known-vulnerability classes — matching observed behavior against OWASP Top 10 and CVE patterns
Where human pentesters still win
- Business logic — can an attacker buy a product for $1 by chaining a coupon with a currency bug? Humans spot these.
- Creative social engineering — phishing the CEO’s assistant to reset an admin password.
- Physical and assumed-trust scenarios — anything involving humans in the loop.
- Novel attack classes — the first person to exploit a new vulnerability is usually a human.
The AI pentest taxonomy
Every probe an AI pentest agent runs falls into one of seven branches. Knowing the branches makes the report easier to read and the gaps easier to spot.
Reconnaissance
Subdomain enumeration, endpoint discovery, framework fingerprinting, technology detection, captured network traffic, exposed source maps, public bucket listings. Recon does not find vulnerabilities directly — it builds the map the rest of the scan walks.
Authentication
Login flow integrity, session token entropy, JWT signing, password policy strength, password reset abuse, MFA bypass, OAuth callback validation, account enumeration on signup and reset.
Authorization
IDOR (insecure direct object reference) on numeric IDs, BOLA on UUIDs, missing ownership checks on update endpoints, role escalation through profile-update mass assignment, missing tenant isolation on multi-tenant endpoints, missing RLS on Supabase tables, and missing or overly permissive access rules on Firestore and DynamoDB.
Injection
SQL injection, NoSQL injection, command injection, server-side template injection, server-side request forgery (SSRF), reflected and stored XSS, DOM-based XSS, prompt injection (direct and indirect), tool poisoning in agentic systems.
Business logic
Pricing manipulation, coupon stacking, race conditions on inventory or balance, replay of one-time tokens, workflow bypass (skipping a verification step), state attacks on multi-step flows. The agent only catches a subset; humans catch more.
Data exposure
Secrets in JS bundles, source maps committed to production, verbose error responses, debug routes left in production, directory listing on storage, exposed admin panels, leaked tokens in URL parameters, GraphQL introspection enabled in prod.
Infrastructure
Missing security headers, permissive CORS, weak TLS configuration, outdated dependencies with known CVEs, open management ports, default credentials, exposed metrics or healthcheck endpoints with sensitive payload.
The matrix view: every endpoint × every branch × every payload class. The agent walks the matrix and reports on every cell that fires.
Inside the agent’s prompt
The system prompt for an AI pentest agent is its constitution. It sets four things:
- Persona and goal — “You are an autonomous penetration tester. Your job is to find vulnerabilities in the target and report them with evidence.”
- Scope and safety — “You may only issue requests against URLs matching this allowlist. You may not perform destructive operations (DELETE, DROP, DESTROY). You may not flood, lock, or DoS.”
- Tool catalog — every tool the agent can call, with input/output contract and example invocations.
- Output contract — the shape of a finding (severity, endpoint, evidence, fix prompt) and the rule that nothing is reported without a validated replay.
Variants of this prompt drive specialized agents — a Lovable-aware agent gets a tool to enumerate Supabase tables; a GraphQL-aware agent gets a tool to introspect schemas; a vibe-pentest agent gets the 14-pattern checklist embedded directly in scope. The shape is the same; the tool catalog and persona are what change.
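The output contract is the easiest of the four to express in code. The sketch below assumes a `Finding` shape with the fields named above plus a `validated_replay` field. The class, the `emit` function, and the severity labels are hypothetical, chosen only to show the "no validated replay, no report" rule being enforced mechanically.

```python
from dataclasses import asdict, dataclass
from typing import Optional

SEVERITIES = ("critical", "high", "medium", "low")  # labels are an assumption


@dataclass(frozen=True)
class Finding:
    severity: str
    endpoint: str
    evidence: str      # the captured request/response pair
    fix_prompt: str
    validated_replay: Optional[str] = None  # evidence from the second run


def emit(finding: Finding) -> dict:
    """Serialize a finding, refusing anything that breaks the contract."""
    if finding.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {finding.severity}")
    if not finding.validated_replay:
        raise ValueError("finding has no validated replay; not reportable")
    return asdict(finding)
```

Enforcing the rule in code as well as in the prompt means a model that ignores the instruction still cannot get an unvalidated finding into the report.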
AI pentest methodology
- Define scope — target URL, allowed endpoints, authentication credentials if any.
- Reconnaissance — map the attack surface: subdomains, endpoints, tech stack, exposed services.
- Authentication probing — test login flows, session management, password policy, MFA bypass vectors.
- Authorization probing — systematically test every endpoint for IDOR, BOLA, role escalation.
- Input testing — fuzz every input surface for injection, XSS, SSRF, prompt injection.
- Configuration review — security headers, CORS, CSP, cookie flags, storage ACLs.
- Report generation — findings ranked by severity with evidence and remediation.
- Rescan — verify fixes after remediation ships.
AI pentesting vs traditional pentesting
| Aspect | AI pentesting | Traditional pentesting |
|---|---|---|
| Driven by | Autonomous agent | Human pentester |
| Duration | Minutes | Days to weeks |
| Cost | $20–$500/month | $5,000–$50,000 per engagement |
| Cadence | Continuous (every deploy) | Annual or ad-hoc |
| Coverage | Exhaustive on common classes | Creative + business logic |
| Best for | Continuous CI/CD coverage | Compliance audits, novel attacks |
Neither replaces the other. The pragmatic pattern is AI pentesting continuously, human pentesting annually.
AI pentesting vs DAST scanners
DAST (dynamic application security testing) tools — OWASP ZAP, Burp Suite scanner, Acunetix, Detectify — are the closest neighbor to AI pentesting. They share the runtime, black-box, no-source-needed model. They differ on planning.
| Aspect | AI pentesting | Traditional DAST |
|---|---|---|
| Planner | LLM-driven, adaptive | Rule-based, fixed signatures |
| Chains findings | Yes (kill-chain) | No (per-finding only) |
| Adapts payloads to app | Yes | No |
| Authenticated coverage | First-class (multi-user) | Bolt-on, often single user |
| Business logic | Limited but real | Effectively none |
| Reproduction proof | Validated replay | Often not validated |
| Cost model | Per scan / subscription | Per seat / appliance |
DAST is a lower bound — every team should have one running. AI pentesting is the upper bound on what is achievable without a human in the loop. See AI Pentest vs OWASP ZAP and AI Pentest vs Burp Suite for category detail.
AI pentesting vs bug bounty
| Aspect | AI pentesting | Bug bounty |
|---|---|---|
| Coverage | Predictable, continuous | Reactive, depends on researcher attention |
| Cost shape | Subscription, fixed | Per-finding, variable |
| Time to first finding | Minutes | Weeks to months |
| Severity ceiling | High | Very high (creative humans) |
| Compliance acceptance | Yes for most frameworks | Usually requires a paired pentest |
Bug bounty is excellent for catching the long tail of creative attacks. It is poor for catching the same RLS bug you re-introduce on every deploy. The two complement each other: AI pentesting catches the boring 80% before the bounty researchers waste their time on it.
When AI pentesting is the right choice
- Pre-launch checks on vibe-coded and AI-generated apps — see Vibe Pentesting.
- Post-deploy verification on every release of a web application.
- Continuous coverage on fast-moving codebases where weekly releases make annual pentests stale by month two.
- Startups and small teams with no dedicated security budget.
- Between human pentests — the 11 months of the year when the last human pentest is already out of date.
When to bring in a human pentester
- Compliance audits (SOC 2, PCI, HIPAA) — auditors and assessors generally still expect a report signed off by a human tester.
- Business-logic bugs — creative multi-step attacks that require understanding intent.
- High-value targets — financial, healthcare, critical infrastructure.
- Novel application architectures — where no AI agent has been trained on similar targets.
When you should NOT use AI pentesting
Honest limitations. AI pentesting is not the right tool when:
- The app is on a private network with no public surface. AI pentest agents test from outside. If you need internal recon, you need a human or an internal-network deployment.
- You are testing a hardware or firmware target. Web-app methodology does not transfer.
- The bug class is purely social. Phishing the CEO’s assistant. Pretexting a support agent into resetting MFA. AI agents do not do this work.
- The legal scope is unclear. Do not run an AI pentest against a system you do not own or have explicit written permission to test. The agent will succeed; the lawsuit will too.
- You have not threat-modeled the product. AI pentesting catches generic failure modes. It does not know that the most sensitive table in your schema is `bank_accounts` rather than `users`. Pair it with a human threat model on the first run.
- You need a single deep finding rather than broad coverage. Targeted research on a specific bug is still a human craft.
Cadence variants
Different teams need different scan rhythms. The agent and the methodology are the same; the trigger and the depth budget change.
Pre-deploy scan
Triggered on every PR or every staging deploy. Budget: under three minutes. Goal: catch the regression before merge. Mode: unauthenticated + one logged-in test account. Output: PR comment with severity-Critical findings only; everything else goes to the dashboard.
Per-release scan
Triggered on every production deploy. Budget: 5–15 minutes. Goal: comprehensive coverage of the surface that just shipped. Mode: full authenticated scan with two test accounts for cross-user checks. Output: full report, severity-Critical and -High block the rollout via a webhook.
Continuous scan
Runs on a schedule — every six hours, daily, or weekly. Budget: 10–30 minutes. Goal: catch drift in third-party dependencies, configuration changes, infrastructure mutations. Mode: full scan, includes external recon (subdomain enumeration, certificate transparency monitoring). Output: alert-on-diff (only emit when a finding changes).
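Alert-on-diff reduces to comparing two scans. A minimal sketch, assuming each scan is a dictionary that maps a stable finding ID to its severity; the ID format and function names are hypothetical.

```python
def diff_findings(previous: dict, current: dict) -> dict:
    """Compare two scans keyed by a stable finding ID mapped to severity."""
    prev_ids, cur_ids = set(previous), set(current)
    return {
        "new": sorted(cur_ids - prev_ids),
        "resolved": sorted(prev_ids - cur_ids),
        "changed": sorted(k for k in prev_ids & cur_ids if previous[k] != current[k]),
    }


def should_alert(diff: dict) -> bool:
    # Stay silent when nothing moved between scans.
    return any(diff.values())
```

The hard part in practice is the stable ID: it has to survive redeploys and reordered endpoints, otherwise every scan looks like a full set of new findings.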
Compliance scan
Triggered before an audit. Budget: hours. Goal: produce evidence in the format the auditor expects (SOC 2, HIPAA, PCI-DSS). Mode: human-in-the-loop, full report including methodology disclosure and pre-test attestation. Output: PDF + machine-readable JSON.
Anonymized findings — what AI pentests catch
A representative cross-section of findings from apps we audit. Each is generalized enough to ship publicly. Each is the kind of issue an AI pentest reliably surfaces.
Finding A — JWT signed with HS256 and a leaked secret
- Endpoint: `/api/admin/users`
- Evidence: The login response returned a JWT with `alg: HS256`. The frontend bundle contained the literal secret string used to sign tokens, exposed via a Vite environment variable that was supposed to be server-side.
- Impact: Forge an admin token with the leaked secret, walk into every admin route. Full account takeover for any user.
- Fix: Rotate the secret. Move signing to RS256 with a private key that never ships to the client. Add a verifier rule that rejects tokens older than the rotation timestamp.
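To make the evidence in Finding A concrete: confirming that a string found in a bundle is the signing secret only requires recomputing the HMAC over the token's first two segments. The sketch below uses only the standard library; the function names are hypothetical, and `sign_hs256` exists here just to produce a sample token to check against.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(header: dict, payload: dict, secret: str) -> str:
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def secret_matches(token: str, candidate: str) -> bool:
    """True if `candidate` reproduces the token's HS256 signature."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(candidate.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected).encode(), signature.encode())
```

This is why a shared HS256 secret in client code is fatal: anyone who can verify a token with it can also mint one. With RS256 the verification key can be public without granting signing ability.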
Finding B — IDOR on a numeric receipt ID
- Endpoint: `GET /api/receipts/:id`
- Evidence: Authenticated request to `/api/receipts/41217` returned the current user’s receipt. Authenticated request to `/api/receipts/41218` returned another user’s receipt with full PII (name, email, billing address, line items).
- Impact: Walk the ID space, exfiltrate every receipt in the system.
- Fix: Add an ownership check on the handler: `WHERE receipt.user_id = auth.user_id`. For Supabase apps, an RLS policy with `auth.uid() = user_id` covers it.
Finding C — Stored XSS in a public profile bio
- Endpoint: `POST /api/profile`
- Evidence: Submitting a `bio` field with `<img src=x onerror=alert(1)>` was accepted and rendered with `dangerouslySetInnerHTML` on the public profile route.
- Impact: Any visitor to the profile page runs attacker JavaScript in their own session. The script can act as the victim and can read the auth cookie unless it is marked HttpOnly.
- Fix: Sanitize HTML on render with DOMPurify. Better: do not accept HTML at all and render bios as plain text with markdown support.
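The "plain text" option in Finding C amounts to escaping on output. The affected app renders in React, so the following Python is only a server-side illustration of the same idea, using the standard library's `html.escape`; `render_bio` is a hypothetical helper.

```python
import html


def render_bio(bio: str) -> str:
    """Escape user text so the browser treats it as text, not markup."""
    return f'<p class="bio">{html.escape(bio)}</p>'


payload = "<img src=x onerror=alert(1)>"
print(render_bio(payload))
# <p class="bio">&lt;img src=x onerror=alert(1)&gt;</p>
```

The payload arrives in the page as visible text, so no element is created and the `onerror` handler never runs.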
Finding D — SSRF in a “fetch URL preview” feature
- Endpoint: `POST /api/preview`
- Evidence: Submitting `http://169.254.169.254/latest/meta-data/iam/security-credentials/` returned the response body: the AWS metadata service responded to the server-side fetch.
- Impact: Exfiltrate AWS IAM credentials, escalate to whatever permissions the EC2 / Fargate role has.
- Fix: Block private and link-local IP ranges in the preview fetcher. Use a vetted SSRF-safe HTTP client like `safe-fetch`. Restrict the IAM role to least privilege.
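For Finding D, the range check itself can be done with Python's `ipaddress` module. This is a minimal sketch, not a complete SSRF defense: `check_preview_url` is a hypothetical helper, the `resolve` parameter exists so the resolver can be swapped out in tests, and redirects would need the same check applied on every hop.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_preview_url(url: str, resolve=socket.getaddrinfo) -> str:
    """Return one vetted IP for the URL's host, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http(s) URLs with a host are allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Resolve once and vet every address the name maps to.
    ips = sorted({info[4][0] for info in resolve(parts.hostname, port)})
    if not ips:
        raise ValueError("host did not resolve")
    for ip in ips:
        # is_global is False for private, loopback, and link-local ranges,
        # which includes the 169.254.169.254 metadata address.
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError(f"blocked non-public address: {ip}")
    return ips[0]
```

Returning the vetted IP matters: if the fetcher resolves the hostname a second time when connecting, a DNS rebinding attack can swap in an internal address after the check has passed.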
Finding E — Mass assignment on profile update
- Endpoint: `PATCH /api/users/:id`
- Evidence: The handler accepted the entire JSON body and forwarded it to the ORM. Including `"role": "admin"` in the body upgraded the requester to admin.
- Impact: Any authenticated user becomes admin in one request.
- Fix: Whitelist updatable fields explicitly. Never destructure a request body into the ORM call. Treat `role` and other privilege fields as system-managed.
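The allowlist fix for Finding E is small enough to show whole. The field names below are hypothetical; the point is that the set of updatable fields is written down explicitly and everything else is rejected before the ORM sees it.

```python
# Hypothetical set of fields a user may change on their own profile.
UPDATABLE_FIELDS = frozenset({"display_name", "bio", "avatar_url"})


def pick_updatable(body: dict) -> dict:
    """Return a copy of the body, or raise if it touches a protected field."""
    unknown = set(body) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"fields not updatable: {sorted(unknown)}")
    return dict(body)
```

Rejecting unknown fields, rather than silently dropping them, is a design choice: it surfaces attempted privilege changes in logs. Dropping them quietly is also defensible if clients are known to send extra fields.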
Finding F — Open Supabase storage bucket exposing user-uploaded files
- Endpoint: `https://<project>.supabase.co/storage/v1/object/public/uploads/...`
- Evidence: The bucket was set to public. Anonymous GET on the bucket listing endpoint returned every file path, including filenames containing the original user’s email.
- Impact: Enumerate every uploaded file. PII leak.
- Fix: Set the bucket to private. Add storage policies that gate read access on `auth.uid() = owner_id`. Generate signed URLs server-side with short TTL for legitimate access.
A worked kill chain
This is the kind of multi-step exploit AI pentest agents are increasingly good at composing. The example is anonymized but representative of patterns we keep seeing.
Step 1 — Recon. The agent crawls the SPA and finds a source-map.js.map file in the production bundle. Extracted: a Stripe restricted key with read:customers scope and a JWT signing secret labeled JWT_SECRET.
Step 2 — JWT forgery. The agent decodes a captured user JWT (alg: HS256), confirms the leaked JWT_SECRET matches by re-signing the captured token, and forges a new token with role: admin and a long expiry.
Step 3 — Admin route discovery. The agent walks the SPA’s React Router config (also recovered from the source map) and finds /admin/billing — a route that the unauthenticated app redirects away from. With the forged admin JWT, the route loads.
Step 4 — IDOR on admin billing. The admin route hits GET /api/admin/billing/:userId. The agent enumerates user IDs and discovers the endpoint returns full Stripe customer records — last4 of card, billing address, recent charges.
Step 5 — Data exfiltration. The agent reports the chain end-to-end: leaked secret → forged token → privileged route → enumerable PII. One report, one fix prompt that addresses the root cause (do not ship secrets in source maps), and a linked set of secondary fixes (rotate, RS256, ownership checks).
This is what “AI pentest” means in practice: not a list of unrelated findings, but a story.
Fix prompts
These are copy-pasteable into Claude Code, Cursor, or any agentic coding tool. Each one targets a finding class an AI pentest commonly surfaces.
Fix prompt for missing RLS
The Supabase table `<table_name>` is currently anonymous-readable. Enable Row
Level Security on the table, then write four policies:
1. SELECT: auth.uid() = user_id (only owner reads).
2. INSERT: auth.uid() = user_id (only authenticated users insert as themselves).
3. UPDATE: auth.uid() = user_id (only owner updates).
4. DELETE: auth.uid() = user_id (only owner deletes, if delete is allowed at all).
Generate the SQL migration. Do not break existing service-role access.
Fix prompt for IDOR / BOLA
The endpoint `<METHOD> <route>` returns a resource by ID without checking that
the requesting user owns it. Add an authorization check on the handler:
- Look up the resource.
- If the resource is missing, return 404.
- If the resource exists but does not belong to the requesting user, return
404 (not 403 — do not leak existence).
- Only on owner match, return the body.
Add a regression test that asserts cross-user access returns 404.
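For reference, the shape of handler this prompt asks for might look like the following. It is a framework-free sketch: the in-memory `RECEIPTS` store, the user IDs, and `get_receipt` are hypothetical stand-ins for a real database lookup and route handler.

```python
from typing import Optional, Tuple

# Hypothetical in-memory store standing in for the database.
RECEIPTS = {
    41217: {"user_id": "u1", "total": 30},
    41218: {"user_id": "u2", "total": 12},
}


def get_receipt(receipt_id: int, requester_id: str) -> Tuple[int, Optional[dict]]:
    receipt = RECEIPTS.get(receipt_id)
    # Missing and not-owned are deliberately indistinguishable to the caller.
    if receipt is None or receipt["user_id"] != requester_id:
        return 404, None
    return 200, receipt


# Regression check: cross-user access must look like a missing resource.
assert get_receipt(41218, "u1") == (404, None)
assert get_receipt(99999, "u1") == (404, None)
assert get_receipt(41217, "u1")[0] == 200
```

Collapsing both failure cases into one branch keeps the response identical for "does not exist" and "exists but is not yours", which is what prevents ID enumeration from leaking existence.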
Fix prompt for exposed key
The frontend bundle currently contains `<KEY_NAME>`, which is a server-only
secret. Do these in order:
1. Rotate the key in the provider dashboard. The old one is permanently burned.
2. Remove it from any `.env` file that is read by the client build (Vite
`VITE_*`, Next.js `NEXT_PUBLIC_*`, etc.).
3. Move the call that uses it to a server-side route or Edge Function.
4. Add a CI check that fails the build if the secret name appears in the
client bundle output.
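Step 4 of this prompt could be implemented as a small script run after the client build. This is a sketch under assumptions: `find_leaks` is a hypothetical helper, and it matches on the secret's name as the prompt describes, so a leaked value under a different name would need a value-based scan as well.

```python
from pathlib import Path


def find_leaks(bundle_dir: str, secret_names: list) -> list:
    """Return (file, name) pairs for every secret name found in the bundle."""
    hits = []
    for path in sorted(Path(bundle_dir).rglob("*")):
        if not path.is_file():
            continue
        # Source maps and minified JS are text; ignore undecodable bytes.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name in secret_names:
            if name in text:
                hits.append((str(path), name))
    return hits
```

In CI, the script would call `find_leaks` on the build output directory and exit non-zero when the returned list is not empty, which fails the build.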
Fix prompt for missing signature on webhook
The webhook handler at `<route>` accepts requests without verifying the
provider signature. For provider <provider>, the verification is:
1. Read the raw request body (bytes, not parsed JSON).
2. Read the signature header `<header_name>`.
3. Compute the HMAC of the body with the webhook secret.
4. Reject the request with 401 if the signatures do not match.
Use the provider's official SDK helper if it exists. Add a test that posts
an unsigned payload and asserts a 401 response.
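The four verification steps map onto a few lines of standard-library Python. The sketch assumes the provider sends a hex-encoded HMAC-SHA256 of the raw body; real providers differ in encoding, header format, and timestamp handling, so treat `verify_webhook` as illustrative and prefer the official SDK helper where one exists.

```python
import hashlib
import hmac


def verify_webhook(raw_body: bytes, signature_header: str, secret: bytes) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw request bytes."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The handler would return 401 whenever this function returns `False`. Verifying against the raw bytes matters because re-serializing parsed JSON can change whitespace or key order and break an otherwise valid signature.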
Fix prompt for prompt injection guardrail
The endpoint `<METHOD> <route>` forwards user input to an LLM. Add the
following guardrails:
1. Wrap the user input in a clearly-delimited block (e.g., XML tags) and
instruct the model to treat its contents as data, not instructions.
2. Run the user input through a classifier (or a cheap model call) that
flags injection patterns: "ignore previous", "you are now", system-prompt
leak attempts.
3. Log every invocation with the input, the output, and the classifier
verdict. Set up an alert on classifier-positive traffic.
4. Constrain the LLM's tool access — never give it write access to anything
sensitive based on user input alone.
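Guardrails 1 and 2 can be sketched together. The patterns below are the examples named in the prompt plus obvious variants. The function names and the `<user_input>` tag are hypothetical, and a regex list like this is a cheap first filter rather than a reliable classifier.

```python
import re
from xml.sax.saxutils import escape

# Illustrative patterns only; real traffic will need a broader classifier.
INJECTION_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"ignore (all )?previous", r"you are now", r"system prompt")
]


def flag_injection(user_input: str) -> bool:
    return any(p.search(user_input) for p in INJECTION_PATTERNS)


def wrap_user_input(user_input: str) -> str:
    # Escape angle brackets so the input cannot close the delimiter tag itself.
    return f"<user_input>\n{escape(user_input)}\n</user_input>"
```

Escaping before wrapping is the detail that is easy to miss: without it, input containing `</user_input>` ends the data block early and everything after it reads as instructions.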
Glossary
- Agent — a software program that loops observe-decide-act-record using an LLM as the planner and a tool catalog as the action surface.
- BOLA (Broken Object Level Authorization) — fetching another user’s resource by changing the ID in an API call. Ranked #1 (API1) in the OWASP API Security Top 10.
- DAST — dynamic application security testing. Black-box scanning at runtime, signature-driven.
- IDOR — insecure direct object reference. The classical name for what OWASP API now calls BOLA.
- JWT — JSON Web Token, a signed token format used for sessions. Frequently exploitable when signed with a leaked secret or `alg: none`.
- Kill chain — a chained exploit. Multiple findings composed into a single attack that lands more impact than any one finding alone.
- PoC — proof of concept. The reproducible request-response evidence that a finding is real.
- PTaaS — penetration testing as a service. Subscription pentesting on autopilot.
- Recon — reconnaissance. The phase where the attacker maps the surface before probing.
- RLS — Row Level Security. Database-layer authorization that scopes every query to the current user.
- SAST — static application security testing. Source-code analysis without running the app.
- SSRF — server-side request forgery. Tricking the server into making a request to an internal address.
- System prompt — the instructions that define the AI agent’s role, scope, and tool catalog.
- Tool use — the agentic pattern where the LLM emits structured calls to deterministic tools rather than performing actions itself.
- Validation pass — a second-pass replay that confirms a finding before the agent reports it. The reason AI pentest reports have low false-positive rates.
Related guides
- AI Pentest vs Traditional Pentesting — side-by-side comparison
- AI Penetration Testing Guide — 10-step checklist
- Vibe Pentesting — pentesting methodology for vibe-coded apps
- Lovable Pentesting — Lovable-specific methodology
- AI Pentest for Web Applications — web-app scope specifics
- AI Pentest for APIs — REST, GraphQL, WebSocket
- AI Pentest for Cloud Infrastructure — AWS, GCP, Azure scope
- AI Pentest for SaaS — multi-tenant SaaS pentest playbook
- Continuous Penetration Testing — every-deploy cadence
- PTaaS — subscription pentesting model
- Compliance Penetration Testing — SOC 2, HIPAA, PCI scope
- AI Vulnerability Assessment — finding triage and PoC generation
- Vulnerability Scanning vs AI Pentest — what the difference actually is
- VibeEval vs Burp Suite — AI agent vs manual pentest tool
- VibeEval vs OWASP ZAP — AI agent vs DAST scanner
- VibeEval vs Snyk — runtime AI pentest vs SAST + SCA
- Best Security Scanner for AI Apps — category-wide comparison
- Integration Layer Is the Real Security Gap — where AI-built apps fail
- Vibe Code Scanner — free scanner to try
- Token Leak Checker — focused scan for exposed keys
- Supabase RLS Checker — verify every table has a correct policy
- Firebase Scanner — Firestore Security Rules audit
- Security Headers Checker — CSP, HSTS, CORS audit
- Package Hallucination Scanner — find AI-invented dependencies
- AI Coding Tool Security Hub — vulnerability taxonomy across every AI tool
- Safety Reviews — “is X safe?” audits for every AI builder
- OWASP Top 10 for AI Code — the failure-mode taxonomy