AI VULNERABILITY ASSESSMENT: AUTOMATED DETECTION & PRIORITIZATION | VIBEEVAL

AI vulnerability assessment is the layer between a scanner that lists 200 maybes and a pentester who delivers a PDF in three weeks. Autonomous agents validate each finding by attempting safe exploitation, rank it by real-world impact, and write the fix prompt. What ships is a triaged backlog, not a noise feed.

Not All Vulnerabilities Are Equal

AI assessment distinguishes between theoretical risks and actually exploitable weaknesses, so you fix what matters first.

Assessment vs scan vs pentest

The three terms get conflated. They are not the same thing.

Activity | What it does | What it produces | Cadence
Vulnerability scan | Signature match against known CVEs and patterns | A long list with many false positives | Continuous
Vulnerability assessment | Discovery + exploitation-validated findings + prioritization | A short, ranked, triaged backlog | Continuous
Penetration test | Human-led exploitation, chained attacks, business logic | A report with narrative, proofs, and consultant sign-off | Annual or per-engagement

A scanner answers “what looks suspicious.” An assessment answers “what is actually broken.” A pentest answers “what would a determined human do, and how do they chain things together.”

AI assessment is the missing middle. It runs as often as a scanner, validates like a pentester, and ships a backlog short enough to actually triage. For a deeper comparison see Vulnerability Scanning vs AI Pentest and AI Pentest vs Traditional.

Vulnerability Assessment Checklist

Follow these 8 steps for a comprehensive AI-powered vulnerability assessment. Steps 4 and 5 (exploit verification and business-impact ranking) are what make detection and prioritization accurate.

  1. Configure scan targets — define applications, APIs, and infrastructure endpoints in scope. Mark public versus authenticated surface separately.
  2. Run comprehensive vulnerability scan — execute a full-scope AI-powered scan covering OWASP Top 10, business logic, and infrastructure.
  3. Analyze severity classifications — review AI-assigned severity based on CVSS, attack complexity, and business impact.
  4. Verify exploitability — let the agent validate each finding by attempting non-destructive exploitation to confirm real-world risk.
  5. Prioritize by business impact — rank confirmed vulnerabilities by potential damage to your business, data, and users.
  6. Generate remediation plan — get fix guidance with code examples, configuration changes, and implementation steps.
  7. Implement fixes — apply remediations starting with the highest-priority vulnerabilities.
  8. Verify remediation success — re-scan fixed targets to confirm closure and detect regressions.
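
Step 8 amounts to comparing two scan results. The sketch below is one way to do that, assuming a finding can be identified by a rule name and an endpoint; the `Finding` shape and `diff_scans` helper are hypothetical and not part of any specific product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal identity for a finding: rule plus endpoint.
    rule: str
    endpoint: str


def diff_scans(before: set, after: set) -> dict:
    """Compare the findings from a scan before a fix with a re-scan after it."""
    return {
        "resolved": before - after,    # present earlier, gone now
        "still_open": before & after,  # the fix did not close these
        "new": after - before,         # regressions or newly exposed issues
    }


before = {
    Finding("bola", "GET /api/orders/{id}"),
    Finding("missing-headers", "GET /"),
}
after = {
    Finding("missing-headers", "GET /"),
    Finding("mass-assignment", "PATCH /api/users/me"),
}

result = diff_scans(before, after)
for bucket, findings in result.items():
    print(bucket, sorted(f.rule for f in findings))
```

Keying findings on rule and endpoint keeps the comparison stable across runs even when evidence payloads differ between scans.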

Benefits of AI Vulnerability Assessment

Zero False Positives with AI Verification

Every finding is validated through attempted exploitation, eliminating noise from theoretical vulnerabilities.

Prioritizes by Real Exploitability

AI ranks vulnerabilities by actual attack feasibility, not just CVSS scores, so you fix what matters.

Covers Application and Infrastructure

A single assessment covers web apps, APIs, cloud configs, and infrastructure in one comprehensive run.

Generates Actionable Fix Guidance

Each finding includes specific remediation steps with code examples tailored to your tech stack.

How AI Vulnerability Assessment Differs from Scanning

Traditional vulnerability scanners run signature-based checks against known CVE databases. They are good at finding outdated libraries and missing patches, but they cannot understand application logic. A scanner might flag 200 “potential” XSS issues when only three are actually exploitable.

AI vulnerability assessment goes deeper. Instead of pattern matching, AI agents actually attempt to exploit each finding. They inject real payloads, verify whether the injection executes, and document the exact attack chain. This eliminates false positives entirely — if the AI cannot exploit it, it does not report it.

The prioritization layer is where the difference from a scanner is largest. Instead of ranking by CVSS score alone (which treats all “Critical” findings equally), the AI considers exploitability, business impact, and attack surface exposure. A SQL injection in a public-facing search endpoint is far more dangerous than one in an internal admin tool with IP restrictions.

Severity scoring methodology

CVSS gives you a vector. It does not give you a priority. Two findings with identical CVSS scores can have wildly different real-world impact. The assessment combines three layers:

Layer | What it measures | How AI uses it
Base CVSS (v3.1) | Inherent technical severity | Starting point for the score
Exploitability | Did the agent actually trigger it? | Required for the finding to ship at all
Business context | Public vs authenticated, data class touched, regulated content | Adjusts severity up or down

A SQL injection on /api/search (unauthenticated, hits the user table) is treated as Critical even if CVSS is 7.5. A SQL injection on /internal/admin/health reachable only from a corporate IP allowlist is treated as Medium even if CVSS is 9.8. The agent reads the surrounding behavior — does the response include user emails, does the endpoint accept anonymous traffic, is rate limiting present — and adjusts.
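
The three layers can be sketched as a small function. This is an illustration only: the CVSS bands follow the v3.1 qualitative rating scale, but the adjustment rules (one level up for public, unauthenticated access to sensitive data; two levels down for allowlisted internal endpoints) are assumptions chosen to reproduce the two examples above, not the product's actual scoring logic.

```python
from typing import Optional

LEVELS = ["Low", "Medium", "High", "Critical"]


def cvss_band(score: float) -> int:
    """Map a CVSS v3.1 base score to an index into LEVELS.

    Scores below 4.0 (including 0.0, which CVSS calls "None") map to Low here.
    """
    if score >= 9.0:
        return 3
    if score >= 7.0:
        return 2
    if score >= 4.0:
        return 1
    return 0


def adjusted_severity(
    cvss: float,
    exploited: bool,
    public_unauthenticated: bool,
    sensitive_data: bool,
    internal_only: bool,
) -> Optional[str]:
    # Layer 2: a finding the agent could not trigger does not ship at all.
    if not exploited:
        return None
    # Layer 1: start from the base CVSS band.
    level = cvss_band(cvss)
    # Layer 3: business context moves the level up or down (assumed weights).
    if public_unauthenticated and sensitive_data:
        level += 1
    if internal_only:
        level -= 2
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


# SQL injection on /api/search: CVSS 7.5, public, touches the user table.
print(adjusted_severity(7.5, True, True, True, False))   # Critical
# SQL injection on /internal/admin/health: CVSS 9.8, IP allowlist only.
print(adjusted_severity(9.8, True, False, False, True))  # Medium
```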

For deeper severity guidance see the Penetration Testing Guide and the Manual Security Testing reference.

Vulnerability Severity Framework

Critical — Exploitable Without Authentication

Leads to data breach or system compromise. Examples: SQL injection on public endpoints, RCE, exposed admin panels without auth, missing RLS on a Supabase table that holds PII. Fix immediately — these are actively exploited in the wild.

High — Requires Authentication, Significant Impact

Leads to significant data exposure or privilege escalation. Examples: BOLA / IDOR allowing access to other users’ data, stored XSS in user-generated content, JWT signing key exposure, mass-assignment on a profile-update endpoint. Fix within 24 to 48 hours.

Medium — Limited Exploitation Potential

Requires specific conditions to exploit. Examples: CSRF on non-critical forms, missing security headers, verbose error messages leaking stack traces, weak password policy. Fix within 1 to 2 weeks.

Low — Informational Findings

Improves security posture but is not directly exploitable. Examples: outdated but non-vulnerable dependencies, suboptimal CSP configuration, missing HSTS preload. Fix in next sprint.
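
The fix windows in this framework can be encoded as data so that tickets get a due date automatically. A minimal sketch, assuming the upper bound of each stated window is used and that "next sprint" has no fixed duration; `FIX_WINDOWS` and `fix_deadline` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Upper bound of each window from the framework above.
# Low is "next sprint", which has no fixed duration, so it maps to None.
FIX_WINDOWS = {
    "Critical": timedelta(0),        # fix immediately
    "High": timedelta(hours=48),     # 24 to 48 hours
    "Medium": timedelta(weeks=2),    # 1 to 2 weeks
    "Low": None,
}


def fix_deadline(severity: str, found_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable fix time, or None if sprint-scheduled."""
    window = FIX_WINDOWS[severity]
    return None if window is None else found_at + window


found = datetime(2024, 1, 1, 9, 0)
print(fix_deadline("High", found))    # 2024-01-03 09:00:00
print(fix_deadline("Medium", found))  # 2024-01-15 09:00:00
```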

Anonymized findings from real assessments

These examples are anonymized from apps we audit. They show how the same nominal CVSS produces different severities once business context is applied.

Finding 1 — Critical: missing RLS on messages table

  • Endpoint: GET /rest/v1/messages?select=* on a Supabase project
  • Evidence: Anonymous request with the public anon key returned every direct message in the database
  • CVSS base: 7.5 (network, low complexity, no auth)
  • Adjusted severity: Critical — the table holds private user-to-user content, the endpoint is reachable from the open internet, and there is no rate limit
  • Fix: Enable RLS on messages and add auth.uid() = sender_id OR auth.uid() = recipient_id as the read policy. See Supabase RLS Checker.

Finding 2 — High: BOLA on /api/orders/{id}

  • Endpoint: GET /api/orders/{id} on a Cursor-built Express API
  • Evidence: Authenticated as user A, requesting user B’s order ID returned the full order including shipping address and last-four card digits
  • CVSS base: 7.1
  • Adjusted severity: High — requires auth, but auth is open registration, and the data class is PII plus partial PCI
  • Fix: Add WHERE user_id = $auth_user_id to the query, or wrap the route in an ownership check middleware
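
The ownership check for this finding can be sketched as follows. The affected API is Express; Python is used here only to show the shape of the check. The in-memory `ORDERS` store, the `Order` fields, and both function names are hypothetical stand-ins for the real data layer and route handler.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Order:
    id: int
    user_id: int
    shipping_address: str


# In-memory stand-in for the orders table (illustrative data only).
ORDERS = {
    1: Order(id=1, user_id=10, shipping_address="1 Example St"),
    2: Order(id=2, user_id=20, shipping_address="2 Example Ave"),
}


def get_order_for_user(order_id: int, auth_user_id: int) -> Optional[Order]:
    """Return the order only if it belongs to the authenticated user.

    Same effect as adding `WHERE user_id = $auth_user_id` to the query.
    """
    order = ORDERS.get(order_id)
    if order is None or order.user_id != auth_user_id:
        return None
    return order


def handle_get_order(order_id: int, auth_user_id: int) -> Tuple[int, Optional[Order]]:
    """Route handler sketch: returns (status_code, body)."""
    order = get_order_for_user(order_id, auth_user_id)
    if order is None:
        # Missing and foreign orders look identical to the caller.
        return 404, None
    return 200, order


print(handle_get_order(1, 10)[0])  # 200: user 10 owns order 1
print(handle_get_order(2, 10)[0])  # 404: order 2 belongs to user 20
```

Returning the same 404 for a missing order and for someone else's order avoids revealing which order IDs exist.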

Finding 3 — Medium: stack trace leaks framework version

  • Endpoint: POST /api/checkout returns 500 with full Python traceback
  • Evidence: Trace exposes Django version, file paths, and one environment variable name
  • CVSS base: 5.3
  • Adjusted severity: Medium — useful to an attacker for fingerprinting, no direct data exposure
  • Fix: Set DEBUG=False in production, add a generic error handler

Finding 4 — Critical: API key in frontend bundle

  • Endpoint: GET /assets/main.js from a Lovable deploy
  • Evidence: Server-side OpenAI key (sk-proj-…) embedded in the bundle, valid against the OpenAI API
  • CVSS base: 9.1
  • Adjusted severity: Critical — direct billing exposure, no auth needed to extract, the key is provider-rotatable but not project-scoped
  • Fix: Move the call behind a Supabase Edge Function or a server route. Rotate the key immediately. See Token Leak Checker.

Finding 5 — High: mass assignment on profile update

  • Endpoint: PATCH /api/users/me
  • Evidence: Sending {"role": "admin"} in the request body promoted the test user to admin. The route accepted arbitrary keys and forwarded them to the ORM
  • CVSS base: 8.1
  • Adjusted severity: High — the admin role unlocks every privileged endpoint
  • Fix: Add an explicit allowlist of mutable fields: name, avatar_url, bio. Reject everything else.
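
A minimal sketch of the allowlist, assuming the request body arrives as a plain dictionary. The field names come from the fix above; `parse_profile_update` and `UnknownFieldError` are hypothetical names, and a real handler would map the error to a 400 response.

```python
ALLOWED_FIELDS = frozenset({"name", "avatar_url", "bio"})


class UnknownFieldError(ValueError):
    """Raised when the request body contains a field outside the allowlist."""


def parse_profile_update(body: dict) -> dict:
    """Validate a PATCH /api/users/me body before it reaches the ORM."""
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        raise UnknownFieldError(f"fields not allowed: {sorted(unknown)}")
    # Copy so later mutation of the request body cannot affect the update.
    return dict(body)


print(parse_profile_update({"name": "Ada", "bio": "hello"}))

try:
    parse_profile_update({"name": "Ada", "role": "admin"})
except UnknownFieldError as exc:
    print("rejected:", exc)
```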

Finding 6 — Low: missing security headers

  • Endpoint: GET / (root document)
  • Evidence: No CSP, no HSTS, X-Frame-Options absent
  • CVSS base: 3.1
  • Adjusted severity: Low — no direct exploit, but lowers the cost of a chained attack
  • Fix: Configure headers at the hosting provider (Vercel, Netlify) or via middleware. See Security Headers Checker.
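
Where headers are set in middleware rather than at the hosting provider, the idea looks roughly like this WSGI wrapper. The header values are illustrative defaults, not a recommendation for any particular app; a strict `Content-Security-Policy` in particular usually needs tuning before it can ship.

```python
# Illustrative values only; tune the CSP to the app before deploying.
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
    ("X-Frame-Options", "DENY"),
]


def security_headers_middleware(app):
    """Wrap a WSGI app so every response carries the headers above."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Do not override headers the app already set itself.
            present = {name.lower() for name, _ in headers}
            extra = [(n, v) for n, v in SECURITY_HEADERS if n.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    # Tiny stand-in application used to exercise the middleware.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = security_headers_middleware(demo_app)({}, fake_start_response)
print(captured["headers"])
```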

How AI agents prioritize findings

Prioritization is the part developers care about. The agent ranks findings using this decision tree:

  1. Was the exploit successful? If no, drop the finding. (This alone removes most false positives.)
  2. Is the endpoint public? Public-internet reachability raises severity the most. An authentication-gated endpoint raises it less. Internal-only endpoints behind a VPN lower severity unless the agent finds a way through.
  3. What data class is exposed? PII, payment data, auth tokens, and admin functions outrank logs, public catalog content, and static metadata.
  4. Is the exploit a stepping stone? A finding that enables another finding (XSS that steals an admin session token, then uses it on a privileged endpoint) is reported as a chain and severity is elevated.
  5. Is the blast radius the whole account? Supabase service-role keys, AWS root credentials, and similar findings are flagged Critical regardless of CVSS because the blast radius is the entire account.
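
The decision tree above can be sketched as a filter followed by a sort. The numeric weights are assumptions picked only to preserve the ordering the five questions describe; the `Candidate` fields and the example titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    exploited: bool
    exposure: str            # "public", "authenticated", or "internal"
    data_class: str          # "sensitive" or "low_value"
    enables_chain: bool = False
    account_wide: bool = False


# Assumed weights; only their relative order matters.
EXPOSURE_WEIGHT = {"public": 3, "authenticated": 2, "internal": 0}
DATA_WEIGHT = {"sensitive": 3, "low_value": 1}


def priority(c: Candidate) -> int:
    score = EXPOSURE_WEIGHT[c.exposure] + DATA_WEIGHT[c.data_class]
    if c.enables_chain:
        score += 2    # step 4: stepping stones are elevated
    if c.account_wide:
        score += 100  # step 5: account-wide blast radius always sorts first
    return score


def rank(candidates):
    # Step 1: drop anything the agent could not exploit.
    confirmed = [c for c in candidates if c.exploited]
    return sorted(confirmed, key=priority, reverse=True)


candidates = [
    Candidate("Suspected XSS, not reproduced", False, "public", "low_value"),
    Candidate("SQL injection on internal health endpoint", True, "internal", "sensitive"),
    Candidate("BOLA on orders", True, "authenticated", "sensitive"),
    Candidate("Reflected XSS on contact form", True, "public", "low_value", enables_chain=True),
    Candidate("Service-role key exposed", True, "public", "sensitive", account_wide=True),
]

ranked_titles = [c.title for c in rank(candidates)]
for title in ranked_titles:
    print(title)
```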

The chained-vulnerability case is where AI assessment beats both scanners and most one-shot pentest tools. The agent keeps state across the run and notices that finding 7 (reflected XSS on a contact form) plus finding 12 (admin route relies on cookie-only auth) are exploitable together as account takeover.

False-positive elimination via PoC validation

Traditional scanners report a candidate when a payload “looks like” it might be reflected. AI assessment reports a finding only when the payload was actually reflected and executed. The validation steps, by vulnerability class:

  • For XSS: inject a benign payload (e.g. a unique attribute that triggers a known DOM event), confirm it renders and executes, capture the rendered HTML as evidence
  • For SQL injection: send a payload that produces a measurable, non-destructive side effect (e.g. a controlled time delay) and verify the response time matches
  • For BOLA: authenticate as user A, request user B’s resource, verify the response body contains user B’s data
  • For SSRF: route the payload to a controlled domain, verify the request landed there
  • For RCE: execute a benign command that returns a known value, verify the value appears in the response

Findings that fail validation are not reported. Findings that pass ship with the request and response captured as evidence.
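
As one illustration, the timing check for SQL injection can be reduced to a comparison of response times. This sketch assumes the response times have already been measured; the function name, the use of medians, and the 0.5 second tolerance are assumptions, and no requests are sent here.

```python
from statistics import median


def delay_confirmed(baseline_s, injected_s, requested_delay_s, tolerance_s=0.5):
    """Return True if injected responses are slower than baseline by about
    the requested delay.

    Medians are used so that a single slow response caused by network
    jitter does not confirm or reject a finding on its own.
    """
    observed = median(injected_s) - median(baseline_s)
    return abs(observed - requested_delay_s) <= tolerance_s


# Response times in seconds for normal requests and for a 3-second delay payload.
baseline = [0.12, 0.15, 0.11]
vulnerable = [3.14, 3.10, 3.18]
not_vulnerable = [0.13, 0.40, 0.12]

print(delay_confirmed(baseline, vulnerable, 3.0))      # True
print(delay_confirmed(baseline, not_vulnerable, 3.0))  # False
```

Repeating the measurement with a second, different delay value is a common way to further reduce the chance that a slow backend is mistaken for a successful injection.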

Integration into vulnerability-management workflows

The output is designed to drop into whatever tracking system you already use:

  • JSON export for ingestion into Jira, Linear, GitHub Issues
  • SARIF export for GitHub Code Scanning and Azure DevOps
  • PDF export for compliance evidence (Compliance Penetration Testing)
  • Webhooks to fire on new Critical or High findings
  • MCP integration with Claude Code so the assistant can read findings, generate fixes, and open PRs

The pattern that works in practice: the assessment runs on every deploy, Critical and High findings page the on-call engineer, Medium and Low findings become auto-created tickets, and fix prompts are pre-written for the team to paste into Cursor or Claude Code.
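
A sketch of what that routing and a SARIF export might look like. The finding fields (`rule`, `title`, `severity`, `endpoint`), the tool name, and the route labels are hypothetical; the output uses only the core SARIF 2.1.0 fields (`version`, `runs`, `tool.driver.name`, `results`) and is not the product's actual export schema.

```python
import json

# SARIF result levels are "error", "warning", and "note".
SARIF_LEVEL = {"Critical": "error", "High": "error", "Medium": "warning", "Low": "note"}


def to_sarif(findings, tool_name="example-assessment"):
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name}},
                "results": [
                    {
                        "ruleId": f["rule"],
                        "level": SARIF_LEVEL[f["severity"]],
                        "message": {"text": f"{f['title']} ({f['endpoint']})"},
                    }
                    for f in findings
                ],
            }
        ],
    }


def route(finding):
    """Critical and High page the on-call engineer; the rest become tickets."""
    if finding["severity"] in ("Critical", "High"):
        return "page_on_call"
    return "create_ticket"


findings = [
    {"rule": "missing-rls", "title": "Missing RLS on messages table",
     "severity": "Critical", "endpoint": "GET /rest/v1/messages"},
    {"rule": "missing-headers", "title": "Missing security headers",
     "severity": "Low", "endpoint": "GET /"},
]

sarif = to_sarif(findings)
print(json.dumps(sarif, indent=2))
print([route(f) for f in findings])
```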

From Assessment to Remediation

AI vulnerability assessment does not stop at finding problems. For each vulnerability the agent generates specific remediation guidance: the exact code change, the configuration to update, or the library to upgrade. For Supabase RLS issues it generates the policy SQL. For missing auth middleware it generates the middleware code.

VibeEval’s MCP integration takes this further. Connected to Claude Code, the AI can automatically open pull requests that fix findings. The self-healing loop means your security posture improves continuously without manual intervention: scan, find, fix, verify, repeat.

Fix prompts you can paste

Drop these into Cursor, Claude Code, or Lovable to remediate the most common findings.

Missing RLS on a Supabase table:

Add Row Level Security to the {table_name} table in Supabase.
The table has columns: {columns}. The owner column is {owner_column}.
Generate the SQL to enable RLS, then add SELECT, INSERT, UPDATE, DELETE
policies that require auth.uid() = {owner_column}. Include a separate
policy for service-role access. Output the SQL only.

BOLA on a REST endpoint:

The route {METHOD} {path} accepts a resource ID and returns the
resource without checking ownership. Add an ownership check that
verifies the authenticated user owns the resource before returning it.
If the user does not own it, return 404 (not 403, to avoid leaking
existence). Show me the diff.

Mass assignment on a profile update:

The handler for PATCH /api/users/me accepts an arbitrary request body
and forwards it to the ORM. Refactor it to accept only the allowlisted
fields: name, avatar_url, bio. Reject any other field with a 400 error.
Use a typed input shape so unknown fields are dropped at parse time.

Server key in the frontend bundle:

The OpenAI API key is currently bundled into the frontend JavaScript.
Move all OpenAI calls behind a Supabase Edge Function. The frontend
should call the Edge Function with the user's auth token; the Edge
Function reads OPENAI_API_KEY from environment and forwards. Generate
the Edge Function code and update the frontend client.

When you actually need a human assessor instead

AI assessment is the right default for most apps. It is not the right answer for everything.

  • Regulated workloads where an auditor demands signed pentest reports — SOC 2 Type II, HIPAA, PCI-DSS Level 1 — see Compliance Penetration Testing
  • Apps with unusual business logic the agent cannot frame without human context (multi-party financial flows, complex permission matrices that depend on tenant configuration)
  • Hardware, firmware, or embedded targets where the AI agent does not have first-class instrumentation
  • Red-team engagements that include physical, social, and OSINT components alongside the technical pentest

The standard pattern: AI assessment continuously, plus one human engagement annually for the regulated cases. See Manual Security Testing.

Start Your Vulnerability Assessment

VibeEval’s AI-powered vulnerability assessment finds real, exploitable vulnerabilities and gives you a prioritized remediation plan. No false positives, no wasted time.

COMMON QUESTIONS

01
What is AI vulnerability assessment?
AI vulnerability assessment is automated discovery, validation, and prioritization of security weaknesses in an application using autonomous agents. Unlike a signature-based scanner, an AI assessment attempts safe exploitation against each candidate finding and only reports vulnerabilities it can actually trigger. The output is a ranked list of exploitable issues with evidence and a fix prompt, not a raw scanner dump.
02
How is an assessment different from a scan or a pentest?
A scan is signature matching against known patterns and CVE databases. A pentest is a human engagement with manual exploitation, business logic testing, and a written report. An AI vulnerability assessment sits between them: machine-speed coverage like a scanner, exploitation-validated findings and prioritization like a pentest, available continuously rather than annually.
03
How does the AI prioritize findings?
Three signals. First, exploitability: did the agent actually trigger the vulnerability, or is it theoretical? Second, business impact: does the affected endpoint touch user data, billing, auth, or admin? Third, exposure: is it reachable from the public internet, or behind auth and IP allowlists? CVSS is one input but not the final word — a 7.5 SQL injection on a public search endpoint outranks a 9.8 issue in an internal admin tool with IP restrictions.
04
How are false positives eliminated?
Every candidate finding is validated by a non-destructive proof-of-concept. If the agent cannot trigger the bug, it is not reported. This collapses the noise that traditional scanners produce — the 200 'potential' XSS findings where only three are real — into a list short enough that a developer reads every entry.
05
Does it work without source-code access?
Yes. The agent runs against a deployed URL and tests like an external attacker. Source code is helpful for some classes (dependency CVEs, hard-coded secrets in non-frontend code) but not required for the majority of high-severity findings, which are runtime behaviors observable from outside.
06
Can the assessment run continuously?
Yes. The standard cadence is on every deploy. The full assessment finishes in minutes, so wiring it into CI/CD or running it on a schedule is the normal way to use it. Findings that appeared yesterday and disappeared today are tracked as resolved automatically.
07
What does the report look like?
Each finding has a title, severity, CVSS vector and adjusted score, the affected endpoint, the request and response that triggered it, exploitation notes, business-impact reasoning, and a copy-paste-ready fix prompt for Cursor, Claude Code, or Lovable. The report exports as JSON, PDF, and SARIF for ingestion into your existing vulnerability-management workflow.
08
When do I still need a human assessor?
When the workload is regulated and an auditor demands a signed report (HIPAA, SOC 2 Type II, PCI), or when the application has unusual business logic that an AI agent cannot reason about without human framing. Continuous AI assessment plus an annual human engagement is the standard pattern.

SCAN YOUR APP

14-day trial. No card. Results in under 60 seconds.
