AI PENTESTING: WHAT IT IS AND HOW IT WORKS

AI pentesting uses autonomous agents to probe live web applications for vulnerabilities. Same goal as a human pentester — find the holes before an attacker does — but at a cadence humans can't match.

What is AI pentesting?

AI pentesting is penetration testing driven by autonomous software agents rather than a human pentester. An agent is given a target URL and a scope, loads the app the way a browser would, maps the API surface from captured traffic, and then uses a large language model plus structured tool use to decide what to probe next. Each decision — which endpoint, which payload, which authentication bypass to try — is made dynamically based on what the agent has observed.

The result is a pentest report: findings ranked by severity, each with evidence, each with remediation. The output shape is the same as a human engagement. The input cost and cadence are different by orders of magnitude.

How AI pentesting works under the hood

A modern AI pentesting agent is built from three layers:

  1. A reasoning model — usually a large language model (Claude, GPT, or specialized fine-tunes) that decides what to test next based on observations.
  2. A tool layer — deterministic code for making HTTP requests, capturing responses, parsing headers, generating payloads, and managing authentication state.
  3. A scope and safety layer — enforces the target URL boundary, rate limits, and blocks destructive actions even if the reasoning model would suggest them.

The agent runs a loop: observe the target, decide on an action, execute it through a tool, observe the result, update its model of the target, and repeat. Classic tool-use architecture. The reasoning model’s job is to plan like an attacker; the tool layer’s job is to execute without breaking anything; the scope layer’s job is to keep the agent honest.

Inside the planner

The planner is the part that turns observations into next actions. In the apps we audit it usually looks like a structured loop with four checkpoints:

  1. Surface state — what endpoints exist, what auth states have been captured, what findings already exist.
  2. Coverage gaps — which endpoints have not been tested yet against which vulnerability classes.
  3. Hypothesis selection — which gap is highest expected value to probe next.
  4. Tool invocation — what payload, what HTTP verb, what session, what assertion to make about the response.

This is just goal-directed search with the LLM as the heuristic. The interesting part is that the heuristic is good enough to chain findings: if the agent has just discovered an exposed user list endpoint, the next probe is automatically “try BOLA on each row” rather than continuing the linear coverage walk.

Tool layer in practice

The tool layer is the boring part, and that is the point. Each tool is a deterministic function the model can call:

  • http(method, url, headers, body) — issue a request, return status + headers + body.
  • decode_jwt(token) — pull header, payload, signature for inspection.
  • fuzz(input_field, wordlist) — replay a request with each payload from a wordlist.
  • login(username, password) — execute the captured login flow and store the session.
  • crawl(url, depth) — render with a headless browser and extract every link, request, and form.
  • probe_rls(table) — for Supabase targets, run anonymous + cross-user CRUD against a named table.

The tools are deterministic so that findings replay exactly. The LLM picks which tool to call and with what arguments, but never executes the network operation itself. This separation is what keeps the agent within scope — the safety layer wraps every tool with the URL allowlist and rate-limit budget.

The validation loop

A finding is not emitted until the agent has validated it. Validation runs the same probe a second time, on a clean session, with a fresh DNS resolution, and asserts the same response. False positives almost always come from stale state — a login that lingered, a cookie that was still hot, a CDN cache miss that flipped behavior. The replay catches them.

This is why a properly-built AI pentest report is high signal: every finding ships with a request, a response, and a replayable proof of concept.

What AI pentesting covers well

  • Reconnaissance — mapping subdomains, endpoints, technologies, exposed services
  • Authentication testing — login flows, session management, MFA bypass, password policy
  • Authorization testing — IDOR, BOLA, role escalation, ownership checks on every endpoint
  • Injection — SQL, NoSQL, command, XSS, SSRF, and LLM prompt injection
  • Configuration — missing security headers, permissive CORS, open cloud storage, exposed admin panels
  • Credential exposure — API keys, tokens, secrets in frontend bundles and source maps
  • Known-vulnerability classes — matching observed behavior against OWASP Top 10 and CVE patterns

Where human pentesters still win

  • Business logic — can an attacker buy a product for $1 by chaining a coupon with a currency bug? Humans spot these.
  • Creative social engineering — phishing the CEO’s assistant to reset an admin password.
  • Physical and assumed-trust scenarios — anything involving humans in the loop.
  • Novel attack classes — the first person to exploit a new vulnerability is usually a human.

The AI pentest taxonomy

Every probe an AI pentest agent runs falls into one of seven branches. Knowing the branches makes the report easier to read and the gaps easier to spot.

Reconnaissance

Subdomain enumeration, endpoint discovery, framework fingerprinting, technology detection, captured network traffic, exposed source maps, public bucket listings. Recon does not find vulnerabilities directly — it builds the map the rest of the scan walks.

Authentication

Login flow integrity, session token entropy, JWT signing, password policy strength, password reset abuse, MFA bypass, OAuth callback validation, account enumeration on signup and reset.

Authorization

IDOR (insecure direct object reference) on numeric IDs, BOLA on UUIDs, missing ownership checks on update endpoints, role escalation through profile-update mass assignment, missing tenant isolation on multi-tenant endpoints, missing RLS on Supabase / Firestore / DynamoDB tables.

Injection

SQL injection, NoSQL injection, command injection, server-side template injection, server-side request forgery (SSRF), reflected and stored XSS, DOM-based XSS, prompt injection (direct and indirect), tool poisoning in agentic systems.

Business logic

Pricing manipulation, coupon stacking, race conditions on inventory or balance, replay of one-time tokens, workflow bypass (skipping a verification step), state attacks on multi-step flows. The agent only catches a subset; humans catch more.

Data exposure

Secrets in JS bundles, source maps committed to production, verbose error responses, debug routes left in production, directory listing on storage, exposed admin panels, leaked tokens in URL parameters, GraphQL introspection enabled in prod.

Infrastructure

Missing security headers, permissive CORS, weak TLS configuration, outdated dependencies with known CVEs, open management ports, default credentials, exposed metrics or healthcheck endpoints with sensitive payload.

The matrix view: every endpoint × every branch × every payload class. The agent walks the matrix and reports on every cell that fires.

Inside the agent’s prompt

The system prompt for an AI pentest agent is its constitution. It sets four things:

  1. Persona and goal — “You are an autonomous penetration tester. Your job is to find vulnerabilities in the target and report them with evidence.”
  2. Scope and safety — “You may only issue requests against URLs matching this allowlist. You may not perform destructive operations (DELETE, DROP, DESTROY). You may not flood, lock, or DoS.”
  3. Tool catalog — every tool the agent can call, with input/output contract and example invocations.
  4. Output contract — the shape of a finding (severity, endpoint, evidence, fix prompt) and the rule that nothing is reported without a validated replay.

Variants of this prompt drive specialized agents — a Lovable-aware agent gets a tool to enumerate Supabase tables; a GraphQL-aware agent gets a tool to introspect schemas; a vibe-pentest agent gets the 14-pattern checklist embedded directly in scope. The shape is the same; the tool catalog and persona are what change.

AI pentest methodology

  1. Define scope — target URL, allowed endpoints, authentication credentials if any.
  2. Reconnaissance — map the attack surface: subdomains, endpoints, tech stack, exposed services.
  3. Authentication probing — test login flows, session management, password policy, MFA bypass vectors.
  4. Authorization probing — systematically test every endpoint for IDOR, BOLA, role escalation.
  5. Input testing — fuzz every input surface for injection, XSS, SSRF, prompt injection.
  6. Configuration review — security headers, CORS, CSP, cookie flags, storage ACLs.
  7. Report generation — findings ranked by severity with evidence and remediation.
  8. Rescan — verify fixes after remediation ships.

AI pentesting vs traditional pentesting

Aspect AI pentesting Traditional pentesting
Driven by Autonomous agent Human pentester
Duration Minutes Days to weeks
Cost $20–$500/month $5,000–$50,000 per engagement
Cadence Continuous (every deploy) Annual or ad-hoc
Coverage Exhaustive on common classes Creative + business logic
Best for Continuous CI/CD coverage Compliance audits, novel attacks

Neither replaces the other. The pragmatic pattern is AI pentesting continuously, human pentesting annually.

AI pentesting vs DAST scanners

DAST (dynamic application security testing) tools — OWASP ZAP, Burp Suite scanner, Acunetix, Detectify — are the closest neighbor to AI pentesting. They share the runtime, black-box, no-source-needed model. They differ on planning.

Aspect AI pentesting Traditional DAST
Planner LLM-driven, adaptive Rule-based, fixed signatures
Chains findings Yes (kill-chain) No (per-finding only)
Adapts payloads to app Yes No
Authenticated coverage First-class (multi-user) Bolt-on, often single user
Business logic Limited but real Effectively none
Reproduction proof Validated replay Often not validated
Cost model Per scan / subscription Per seat / appliance

DAST is a lower bound — every team should have one running. AI pentesting is the upper bound on what is achievable without a human in the loop. See AI Pentest vs OWASP ZAP and AI Pentest vs Burp Suite for category detail.

AI pentesting vs bug bounty

Aspect AI pentesting Bug bounty
Coverage Predictable, continuous Reactive, depends on researcher attention
Cost shape Subscription, fixed Per-finding, variable
Time to first finding Minutes Weeks to months
Severity ceiling High Very high (creative humans)
Compliance acceptance Yes for most frameworks Usually requires a paired pentest

Bug bounty is excellent for catching the long tail of creative attacks. It is poor for catching the same RLS bug you re-introduce on every deploy. The two complement each other: AI pentesting catches the boring 80% before the bounty researchers waste their time on it.

When AI pentesting is the right choice

  • Pre-launch checks on vibe-coded and AI-generated apps — see Vibe Pentesting.
  • Post-deploy verification on every release of a web application.
  • Continuous coverage on fast-moving codebases where weekly releases make annual pentests stale by month two.
  • Startups and small teams with no dedicated security budget.
  • Between human pentests — the 11 months of the year when the last human pentest is already out of date.

When to bring in a human pentester

  • Compliance audits (SOC 2, PCI, HIPAA) — regulators still require human sign-off.
  • Business-logic bugs — creative multi-step attacks that require understanding intent.
  • High-value targets — financial, healthcare, critical infrastructure.
  • Novel application architectures — where no AI agent has been trained on similar targets.

When you should NOT use AI pentesting

Honest limitations. AI pentesting is not the right tool when:

  • The app is on a private network with no public surface. AI pentest agents test from outside. If you need internal recon, you need a human or an internal-network deployment.
  • You are testing a hardware or firmware target. Web-app methodology does not transfer.
  • The bug class is purely social. Phishing the CEO’s assistant. Pretexting a support agent into resetting MFA. AI agents do not do this work.
  • The legal scope is unclear. Do not run an AI pentest against a system you do not own or have explicit written permission to test. The agent will succeed; the lawsuit will too.
  • You have not threat-modeled the product. AI pentesting catches generic failure modes. It does not know that the most sensitive table in your schema is bank_accounts rather than users. Pair it with a human threat model on the first run.
  • You need a single deep finding rather than broad coverage. Targeted research on a specific bug is still a human craft.

Cadence variants

Different teams need different scan rhythms. The agent and the methodology are the same; the trigger and the depth budget change.

Pre-deploy scan

Triggered on every PR or every staging deploy. Budget: under three minutes. Goal: catch the regression before merge. Mode: unauthenticated + one logged-in test account. Output: PR comment with severity-Critical findings only; everything else goes to the dashboard.

Per-release scan

Triggered on every production deploy. Budget: 5–15 minutes. Goal: comprehensive coverage of the surface that just shipped. Mode: full authenticated scan with two test accounts for cross-user checks. Output: full report, severity-Critical and -High block the rollout via a webhook.

Continuous scan

Runs on a schedule — every six hours, daily, or weekly. Budget: 10–30 minutes. Goal: catch drift in third-party dependencies, configuration changes, infrastructure mutations. Mode: full scan, includes external recon (subdomain enumeration, certificate transparency monitoring). Output: alert-on-diff (only emit when a finding changes).

Compliance scan

Triggered before an audit. Budget: hours. Goal: produce evidence in the format the auditor expects (SOC 2, HIPAA, PCI-DSS). Mode: human-in-the-loop, full report including methodology disclosure and pre-test attestation. Output: PDF + machine-readable JSON.

Anonymized findings — what AI pentests catch

A representative cross-section of findings from apps we audit. Each is generalized enough to ship publicly. Each is the kind of issue an AI pentest reliably surfaces.

Finding A — JWT signed with HS256 and a leaked secret

  • Endpoint: /api/admin/users
  • Evidence: The login response returned a JWT with alg: HS256. The frontend bundle contained the literal secret string used to sign tokens, exposed via a Vite environment variable that was supposed to be server-side.
  • Impact: Forge an admin token with the leaked secret, walk into every admin route. Full account takeover for any user.
  • Fix: Rotate the secret. Move signing to RS256 with a private key that never ships to the client. Add a verifier rule that rejects tokens older than the rotation timestamp.

Finding B — IDOR on a numeric receipt ID

  • Endpoint: GET /api/receipts/:id
  • Evidence: Authenticated request to /api/receipts/41217 returned the current user’s receipt. Authenticated request to /api/receipts/41218 returned another user’s receipt with full PII (name, email, billing address, line items).
  • Impact: Walk the ID space, exfiltrate every receipt in the system.
  • Fix: Add an ownership check on the handler: WHERE receipt.user_id = auth.user_id. For Supabase apps, an RLS policy with auth.uid() = user_id covers it.

Finding C — Stored XSS in a public profile bio

  • Endpoint: POST /api/profile
  • Evidence: Submitting a bio field with <img src=x onerror=alert(1)> was accepted and rendered with dangerouslySetInnerHTML on the public profile route.
  • Impact: Any visitor to the profile page executes attacker JavaScript in their session, including the victim’s auth cookie.
  • Fix: Sanitize HTML on render with DOMPurify. Better: do not accept HTML at all — render bios as plain text with markdown support.

Finding D — SSRF in a “fetch URL preview” feature

  • Endpoint: POST /api/preview
  • Evidence: Submitting http://169.254.169.254/latest/meta-data/iam/security-credentials/ returned the response body — the AWS metadata service responded to the server-side fetch.
  • Impact: Exfiltrate AWS IAM credentials, escalate to whatever permissions the EC2 / Fargate role has.
  • Fix: Block private and link-local IP ranges in the preview fetcher. Use a vetted SSRF-safe HTTP client like safe-fetch. Restrict the IAM role to least privilege.

Finding E — Mass assignment on profile update

  • Endpoint: PATCH /api/users/:id
  • Evidence: The handler accepted the entire JSON body and forwarded it to the ORM. Including "role": "admin" in the body upgraded the requester to admin.
  • Impact: Any authenticated user becomes admin in one request.
  • Fix: Whitelist updatable fields explicitly. Never destructure a request body into the ORM call. Treat role and other privilege fields as system-managed.

Finding F — Open Supabase storage bucket exposing user-uploaded files

  • Endpoint: https://<project>.supabase.co/storage/v1/object/public/uploads/...
  • Evidence: The bucket was set to public. Anonymous GET on the bucket listing endpoint returned every file path, including filenames containing the original user’s email.
  • Impact: Enumerate every uploaded file. PII leak.
  • Fix: Set the bucket to private. Add storage policies that gate read access on auth.uid() = owner_id. Generate signed URLs server-side with short TTL for legitimate access.

A worked kill chain

This is the kind of multi-step exploit AI pentest agents are increasingly good at composing. The example is anonymized but representative of patterns we keep seeing.

Step 1 — Recon. The agent crawls the SPA and finds a source-map.js.map file in the production bundle. Extracted: a Stripe restricted key with read:customers scope and a JWT signing secret labeled JWT_SECRET.

Step 2 — JWT forgery. The agent decodes a captured user JWT (alg: HS256), confirms the leaked JWT_SECRET matches by re-signing the captured token, and forges a new token with role: admin and a long expiry.

Step 3 — Admin route discovery. The agent walks the SPA’s React Router config (also recovered from the source map) and finds /admin/billing — a route that the unauthenticated app redirects away from. With the forged admin JWT, the route loads.

Step 4 — IDOR on admin billing. The admin route hits GET /api/admin/billing/:userId. The agent enumerates user IDs and discovers the endpoint returns full Stripe customer records — last4 of card, billing address, recent charges.

Step 5 — Data exfiltration. The agent reports the chain end-to-end: leaked secret → forged token → privileged route → enumerable PII. One report, one fix prompt that addresses the root cause (do not ship secrets in source maps), and a linked set of secondary fixes (rotate, RS256, ownership checks).

This is what “AI pentest” means in practice: not a list of unrelated findings, but a story.

Fix prompts

These are copy-pasteable into Claude Code, Cursor, or any agentic coding tool. Each one targets a finding class an AI pentest commonly surfaces.

Fix prompt for missing RLS

The Supabase table `<table_name>` is currently anonymous-readable. Enable Row
Level Security on the table, then write four policies:

1. SELECT: auth.uid() = user_id (only owner reads).
2. INSERT: auth.uid() = user_id (only authenticated users insert as themselves).
3. UPDATE: auth.uid() = user_id (only owner updates).
4. DELETE: auth.uid() = user_id (only owner deletes, if delete is allowed at all).

Generate the SQL migration. Do not break existing service-role access.

Fix prompt for IDOR / BOLA

The endpoint `<METHOD> <route>` returns a resource by ID without checking that
the requesting user owns it. Add an authorization check on the handler:

- Look up the resource.
- If the resource is missing, return 404.
- If the resource exists but does not belong to the requesting user, return
  404 (not 403 — do not leak existence).
- Only on owner match, return the body.

Add a regression test that asserts cross-user access returns 404.

Fix prompt for exposed key

The frontend bundle currently contains `<KEY_NAME>`, which is a server-only
secret. Do these in order:

1. Rotate the key in the provider dashboard. The old one is permanently burned.
2. Remove it from any `.env` file that is read by the client build (Vite
   `VITE_*`, Next.js `NEXT_PUBLIC_*`, etc.).
3. Move the call that uses it to a server-side route or Edge Function.
4. Add a CI check that fails the build if the secret name appears in the
   client bundle output.

Fix prompt for missing signature on webhook

The webhook handler at `<route>` accepts requests without verifying the
provider signature. For provider <provider>, the verification is:

1. Read the raw request body (bytes, not parsed JSON).
2. Read the signature header `<header_name>`.
3. Compute the HMAC of the body with the webhook secret.
4. Reject the request with 401 if the signatures do not match.

Use the provider's official SDK helper if it exists. Add a test that posts
an unsigned payload and asserts a 401 response.

Fix prompt for prompt injection guardrail

The endpoint `<METHOD> <route>` forwards user input to an LLM. Add the
following guardrails:

1. Wrap the user input in a clearly-delimited block (e.g., XML tags) and
   instruct the model to treat its contents as data, not instructions.
2. Run the user input through a classifier (or a cheap model call) that
   flags injection patterns: "ignore previous", "you are now", system-prompt
   leak attempts.
3. Log every invocation with the input, the output, and the classifier
   verdict. Set up an alert on classifier-positive traffic.
4. Constrain the LLM's tool access — never give it write access to anything
   sensitive based on user input alone.

Glossary

  • Agent — a software program that loops observe-decide-act-record using an LLM as the planner and a tool catalog as the action surface.
  • BOLA (Broken Object Level Authorization) — fetching another user’s resource by changing the ID in an API call. The OWASP API Top 10 #1.
  • DAST — dynamic application security testing. Black-box scanning at runtime, signature-driven.
  • IDOR — insecure direct object reference. The classical name for what OWASP API now calls BOLA.
  • JWT — JSON Web Token, a signed token format used for sessions. Frequently exploitable when signed with a leaked secret or alg: none.
  • Kill chain — a chained exploit. Multiple findings composed into a single attack that lands more impact than any one finding alone.
  • PoC — proof of concept. The reproducible request-response evidence that a finding is real.
  • PTaaS — penetration testing as a service. Subscription pentesting on autopilot.
  • Recon — reconnaissance. The phase where the attacker maps the surface before probing.
  • RLS — Row Level Security. Database-layer authorization that scopes every query to the current user.
  • SAST — static application security testing. Source-code analysis without running the app.
  • SSRF — server-side request forgery. Tricking the server into making a request to an internal address.
  • System prompt — the instructions that define the AI agent’s role, scope, and tool catalog.
  • Tool use — the agentic pattern where the LLM emits structured calls to deterministic tools rather than performing actions itself.
  • Validation pass — a second-pass replay that confirms a finding before the agent reports it. The reason AI pentest reports have low false-positive rates.

COMMON QUESTIONS

01
What is AI pentesting?
AI pentesting is penetration testing driven by autonomous software agents rather than a human pentester. The agent performs reconnaissance, tests authentication and authorization, probes API endpoints, attempts input-based attacks (SQL injection, XSS, prompt injection), and reports findings — all without a human in the loop.
Q&A
02
How does AI pentesting work?
An AI pentesting agent is given a target URL and a scope. It loads the app the way a browser would, maps the API surface from captured requests, and then uses a large language model plus structured tool use to decide what to probe next. Each tool (HTTP request, authentication flow, fuzzing) is invoked based on what the agent has observed. Findings are logged with evidence as the agent works.
Q&A
03
Is AI pentesting as good as a human pentester?
For common vulnerability classes — missing auth, broken access control, exposed secrets, injection — AI pentesting matches or exceeds human coverage because it tests exhaustively and never tires. For business-logic flaws that require understanding the purpose of the application, human pentesters are still stronger. The combination is better than either alone.
Q&A
04
When should I use AI pentesting?
Use AI pentesting as continuous coverage — every deploy, every release. It fits the CI/CD cadence in a way that human pentesting cannot. Reserve human pentests for annual engagements, compliance audits, and when you need creative adversarial thinking against a specific business-logic question.
Q&A
05
What does AI pentesting cost?
Significantly less than human pentesting. A human engagement runs $5,000–$50,000 and takes weeks. AI pentesting subscriptions run $20–$500/month and deliver scans in minutes. The VibeEval scanner is free for surface coverage, paid for deep agent-driven testing.
Q&A
06
Can AI pentesting break my production app?
A properly-scoped AI pentest runs non-destructive probes only — it fetches, it queries, it attempts read-level bypasses, but it does not delete, modify, or DoS. You can safely run it against production. If the scanner detects that a destructive action would succeed (for example, an unauthenticated DELETE endpoint), it reports the finding without executing the delete.
Q&A
07
What's the difference between AI pentesting and vulnerability scanning?
Vulnerability scanning matches your app against a known-CVE database and a catalog of default configurations. AI pentesting is adversarial: the agent chains findings, tests business-logic bypasses, and reasons about what to try next based on what it has already learned. Vulnerability scans catch known issues. AI pentests catch the novel composition of them.
Q&A
08
Is AI pentesting replacing human pentesters?
For commodity security testing (scope, recon, common OWASP classes), yes — human effort there is economically irrational when an agent can do it in minutes. For high-stakes, creative, business-logic-heavy pentests, humans remain essential. The jobs are changing, not disappearing.
Q&A
09
What does the agent's reasoning loop actually look like?
The agent runs an observe-decide-act-record cycle. Observe: pull the latest HTTP responses, JS bundle, captured headers. Decide: ask the LLM what attack to try next given that observation set. Act: invoke a deterministic tool (HTTP request, JWT decoder, fuzzer) and capture the result. Record: append the finding plus full request/response evidence to the report. The loop terminates when the planner reports the scope is exhausted or the budget runs out.
Q&A
10
Why use a large language model at all instead of a static rule engine?
Static rule engines (DAST, vulnerability scanners) cover known patterns — they regex-match payloads against known signatures. They do not chain findings, do not adapt payloads to the specific app, and do not reason about business logic. The LLM brings adaptive planning: it sees the app's specific routes, framework, and prior failures, and picks the next probe based on that context. It is the difference between a checklist and an attacker.
Q&A
11
Can AI pentesting find prompt injection in my LLM-backed features?
Yes. Any endpoint that forwards user input to an LLM is a prompt injection target. The agent fuzzes those endpoints with delimiter-escape payloads, system-prompt overrides, indirect injection via stored data, and tool-poisoning patterns. Findings are reported with the exact payload that succeeded and the model output that proves the bypass.
Q&A
12
What's the false positive rate?
Modern AI pentest agents validate every finding with a proof-of-concept replay before reporting it — the finding is only emitted if the same probe reproduces the vulnerability on a clean session. This drives false positives close to zero on the categories the agent covers. The trade-off is runtime: the validation pass adds seconds per finding, which is acceptable for a per-deploy scan.
Q&A
13
How does an AI pentest agent handle authentication?
Two modes. Unauthenticated: the agent runs as an anonymous user and tests every endpoint that responds. Authenticated: you provide test credentials (or two sets to test cross-tenant), and the agent logs in, captures the session token, and replays the same scan as a logged-in user — including cross-user IDOR/BOLA tests using the second account.
Q&A
14
What's the kill chain in an AI pentest report?
A kill chain is a chained vulnerability story — for example, exposed JWT secret → forge admin token → access the admin route → read all users → exfiltrate. The AI agent stitches multi-step exploits when individual findings compose. The report shows each link, the request that proved it, and the impact at each step.
Q&A
15
Can AI pentesting cover GraphQL and WebSocket APIs?
Yes. The agent introspects GraphQL schemas (when introspection is enabled) and enumerates types, queries, and mutations. For WebSocket, it captures the upgrade handshake and the message protocol, then probes for missing auth on subscribe events and broadcast leaks. Coverage parity with REST is the goal.
Q&A
16
What does human-in-the-loop mode mean?
Human-in-the-loop is a configuration where the agent pauses before destructive or sensitive probes (login flooding, password reset abuse, anything that could trigger lockouts or send real emails) and waits for explicit approval. It is the right mode for production scans where the customer wants every probe pre-authorized.
Q&A

RUN AN AI PENTEST

14-day trial. No card. Full agent-driven scan on your deployed URL in under 60 seconds.

START FREE SCAN