AI VULNERABILITY ASSESSMENT: AUTOMATED DETECTION & PRIORITIZATION | VIBEEVAL
AI vulnerability assessment is the layer between a scanner that lists 200 maybes and a pentester that delivers a PDF in three weeks. Autonomous agents validate each finding by attempting safe exploitation, rank by real-world impact, and write the fix prompt. What ships is a triaged backlog, not a noise feed.
Not All Vulnerabilities Are Equal
AI assessment distinguishes between theoretical risks and actually exploitable weaknesses, so you fix what matters first.
Assessment vs scan vs pentest
The three terms get conflated. They are not the same thing.
| Activity | What it does | What it produces | Cadence |
|---|---|---|---|
| Vulnerability scan | Signature match against known CVEs and patterns | A long list with many false positives | Continuous |
| Vulnerability assessment | Discovery + exploitation-validated findings + prioritization | A short, ranked, triaged backlog | Continuous |
| Penetration test | Human-led exploitation, chained attacks, business logic | A report with narrative, proofs, and consultant sign-off | Annual or per-engagement |
A scanner answers “what looks suspicious.” An assessment answers “what is actually broken.” A pentest answers “what would a determined human do, and how do they chain things together.”
AI assessment is the missing middle. It runs as often as a scanner, validates like a pentester, and ships a backlog short enough to actually triage. For a deeper comparison see Vulnerability Scanning vs AI Pentest and AI Pentest vs Traditional.
Vulnerability Assessment Checklist
Follow these 8 steps for a comprehensive AI-powered vulnerability assessment. Steps 4 and 5 (exploit verification and business-impact ranking) are what keep detection accurate and the backlog prioritized.
- Configure scan targets — define applications, APIs, and infrastructure endpoints in scope. Mark public versus authenticated surface separately.
- Run comprehensive vulnerability scan — execute a full-scope AI-powered scan covering OWASP Top 10, business logic, and infrastructure.
- Analyze severity classifications — review AI-assigned severity based on CVSS, attack complexity, and business impact.
- Verify exploitability — let the agent validate each finding by attempting non-destructive exploitation to confirm real-world risk.
- Prioritize by business impact — rank confirmed vulnerabilities by potential damage to your business, data, and users.
- Generate remediation plan — get fix guidance with code examples, configuration changes, and implementation steps.
- Implement fixes — apply remediations starting with the highest-priority vulnerabilities.
- Verify remediation success — re-scan fixed targets to confirm closure and detect regressions.
Benefits of AI Vulnerability Assessment
Zero False Positives with AI Verification
Every finding is validated through attempted exploitation, eliminating noise from theoretical vulnerabilities.
Prioritizes by Real Exploitability
AI ranks vulnerabilities by actual attack feasibility, not just CVSS scores, so you fix what matters.
Covers Application and Infrastructure
A single assessment covers web apps, APIs, cloud configs, and infrastructure in one comprehensive run.
Generates Actionable Fix Guidance
Each finding includes specific remediation steps with code examples tailored to your tech stack.
How AI Vulnerability Assessment Differs from Scanning
Traditional vulnerability scanners run signature-based checks against known CVE databases. They are good at finding outdated libraries and missing patches, but they cannot understand application logic. A scanner might flag 200 “potential” XSS issues when only three are actually exploitable.
AI vulnerability assessment goes deeper. Instead of pattern matching, AI agents attempt to exploit each finding. They inject real payloads, verify whether the injection executes, and document the exact attack chain. A finding the agent cannot exploit is not reported, which removes the false positives that unverified pattern matches produce.
The prioritization layer is where AI truly shines. Instead of ranking by CVSS score alone (which treats all “Critical” findings equally), AI considers exploitability, business impact, and attack surface exposure. A SQL injection in a public-facing search endpoint is far more dangerous than one in an internal admin tool with IP restrictions.
Severity scoring methodology
CVSS gives you a vector. It does not give you a priority. Two findings with identical CVSS scores can have wildly different real-world impact. The assessment combines three layers:
| Layer | What it measures | How AI uses it |
|---|---|---|
| Base CVSS (v3.1) | Inherent technical severity | Starting point for the score |
| Exploitability | Did the agent actually trigger it? | Required for the finding to ship at all |
| Business context | Public vs authenticated, data class touched, regulated content | Adjusts severity up or down |
A SQL injection on /api/search (unauthenticated, hits the user table) is treated as Critical even if CVSS is 7.5. A SQL injection on /internal/admin/health reachable only from a corporate IP allowlist is treated as Medium even if CVSS is 9.8. The agent reads the surrounding behavior — does the response include user emails, does the endpoint accept anonymous traffic, is rate limiting present — and adjusts.
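The adjustment described above can be sketched in a few lines. This is a minimal illustration, not VibeEval's actual scoring code: the `Context` fields, the one-level bump for public sensitive endpoints, and the two-level drop for allowlisted endpoints are all assumptions chosen to reproduce the two SQL injection examples. Only the CVSS v3.1 rating bands are standard.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ["Low", "Medium", "High", "Critical"]


def cvss_to_level(score: float) -> int:
    """Map a CVSS v3.1 base score to an index into LEVELS (standard rating bands)."""
    if score >= 9.0:
        return 3
    if score >= 7.0:
        return 2
    if score >= 4.0:
        return 1
    return 0


@dataclass
class Context:
    """Hypothetical business-context signals gathered by the agent."""
    exploited: bool        # did the agent actually trigger the issue?
    public: bool           # reachable without authentication
    sensitive_data: bool   # touches PII, payment data, tokens, or admin functions
    ip_allowlisted: bool   # reachable only from an internal allowlist or VPN


def adjusted_severity(cvss: float, ctx: Context) -> Optional[str]:
    """Return the adjusted severity label, or None if the finding should not ship."""
    if not ctx.exploited:
        return None  # exploitability is required for the finding to ship at all
    level = cvss_to_level(cvss)
    if ctx.public and ctx.sensitive_data:
        level += 1  # assumed weight: one level up
    if ctx.ip_allowlisted:
        level -= 2  # assumed weight: two levels down
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]


search = Context(exploited=True, public=True, sensitive_data=True, ip_allowlisted=False)
health = Context(exploited=True, public=False, sensitive_data=False, ip_allowlisted=True)
print(adjusted_severity(7.5, search))  # the /api/search case
print(adjusted_severity(9.8, health))  # the /internal/admin/health case
```

With these assumed weights the public search endpoint comes out Critical at CVSS 7.5 and the allowlisted health endpoint comes out Medium at CVSS 9.8, matching the two cases in the text.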
For deeper severity guidance see the Penetration Testing Guide and the Manual Security Testing reference.
Vulnerability Severity Framework
Critical — Exploitable Without Authentication
Leads to data breach or system compromise. Examples: SQL injection on public endpoints, RCE, exposed admin panels without auth, missing RLS on a Supabase table that holds PII. Fix immediately — these are actively exploited in the wild.
High — Requires Authentication, Significant Impact
Leads to significant data exposure or privilege escalation. Examples: BOLA / IDOR allowing access to other users’ data, stored XSS in user-generated content, JWT signing key exposure, mass-assignment on a profile-update endpoint. Fix within 24 to 48 hours.
Medium — Limited Exploitation Potential
Requires specific conditions to exploit. Examples: CSRF on non-critical forms, missing security headers, verbose error messages leaking stack traces, weak password policy. Fix within 1 to 2 weeks.
Low — Informational Findings
Improves security posture but is not directly exploitable. Examples: outdated but non-vulnerable dependencies, suboptimal CSP configuration, missing HSTS preload. Fix in the next sprint.
Anonymized findings from real assessments
These examples are anonymized from apps we audit. They show how the same nominal CVSS produces different severities once business context is applied.
Finding 1 — Critical: missing RLS on messages table
- Endpoint: `GET /rest/v1/messages?select=*` on a Supabase project
- Evidence: Anonymous request with the public anon key returned every direct message in the database
- CVSS base: 7.5 (network, low complexity, no auth)
- Adjusted severity: Critical — the table holds private user-to-user content, the endpoint is reachable from the open internet, and there is no rate limit
- Fix: Enable RLS on `messages` and add `auth.uid() = sender_id OR auth.uid() = recipient_id` as the read policy. See Supabase RLS Checker.
Finding 2 — High: BOLA on /api/orders/{id}
- Endpoint: `GET /api/orders/{id}` on a Cursor-built Express API
- Evidence: Authenticated as user A, requesting user B’s order ID returned the full order including shipping address and last-four card digits
- CVSS base: 7.1
- Adjusted severity: High — requires auth, but auth is open registration, and the data class is PII plus partial PCI
- Fix: Add `WHERE user_id = $auth_user_id` to the query, or wrap the route in an ownership check middleware
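The ownership check in this fix can be sketched as follows. This is an illustration in Python, not the audited Express code: the in-memory `ORDERS` table and the `get_order` handler are hypothetical stand-ins. The handler returns 404 rather than 403 so that a caller cannot tell whether another user's order exists.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the orders table.
ORDERS = {
    101: {"id": 101, "user_id": "user-a", "total": 42},
    102: {"id": 102, "user_id": "user-b", "total": 17},
}


def get_order(order_id: int, auth_user_id: str) -> Tuple[int, Optional[dict]]:
    """Return (status, body), serving the order only to the user who owns it."""
    order = ORDERS.get(order_id)
    # A missing order and an order owned by someone else look identical to the caller.
    if order is None or order["user_id"] != auth_user_id:
        return 404, None
    return 200, order


status_own, _ = get_order(101, "user-a")    # user A reads their own order
status_other, _ = get_order(102, "user-a")  # user A asks for user B's order
print(status_own, status_other)
```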
Finding 3 — Medium: stack trace leaks framework version
- Endpoint: `POST /api/checkout` returns 500 with full Python traceback
- Evidence: Trace exposes Django version, file paths, and one environment variable name
- CVSS base: 5.3
- Adjusted severity: Medium — useful to an attacker for fingerprinting, no direct data exposure
- Fix: Set `DEBUG=False` in production, add a generic error handler
Finding 4 — Critical: API key in frontend bundle
- Endpoint: `GET /assets/main.js` from a Lovable deploy
- Evidence: Server-side OpenAI key (sk-proj-…) embedded in the bundle, valid against the OpenAI API
- CVSS base: 9.1
- Adjusted severity: Critical — direct billing exposure, no auth needed to extract, the key is provider-rotatable but not project-scoped
- Fix: Move the call behind a Supabase Edge Function or a server route. Rotate the key immediately. See Token Leak Checker.
Finding 5 — High: mass assignment on profile update
- Endpoint: `PATCH /api/users/me`
- Evidence: Sending `{"role": "admin"}` in the request body promoted the test user to admin. The route accepted arbitrary keys and forwarded them to the ORM
- CVSS base: 8.1
- Adjusted severity: High — the admin role unlocks every privileged endpoint
- Fix: Add an explicit allowlist of mutable fields: `name`, `avatar_url`, `bio`. Reject everything else.
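A minimal sketch of the allowlist, assuming the request body has already been parsed into a dict. The function name `validate_profile_update` is hypothetical, and a real handler would map the raised error to a 400 response.

```python
ALLOWED_FIELDS = frozenset({"name", "avatar_url", "bio"})


def validate_profile_update(body: dict) -> dict:
    """Return a copy of the body if every key is allowlisted, otherwise raise."""
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        # Reject rather than silently drop, so probing attempts are visible.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return dict(body)


print(validate_profile_update({"name": "Ada", "bio": "hello"}))
try:
    validate_profile_update({"role": "admin"})
except ValueError as err:
    print("rejected:", err)
```

Only the validated dict should be passed on to the ORM, never the raw request body.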
Finding 6 — Low: missing security headers
- Endpoint: `GET /` (root document)
- Evidence: No CSP, no HSTS, X-Frame-Options absent
- CVSS base: 3.1
- Adjusted severity: Low — no direct exploit, but lowers the cost of a chained attack
- Fix: Configure headers at the hosting provider (Vercel, Netlify) or via middleware. See Security Headers Checker.
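If the headers are set in middleware rather than at the hosting provider, the logic might look like the sketch below. The header values are illustrative defaults, not a recommendation for any particular app (a real CSP has to be tuned to the scripts and origins the app loads), and both function names are hypothetical.

```python
# Illustrative defaults only; tune the CSP to the app before deploying.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Frame-Options": "DENY",
}


def missing_security_headers(headers: dict) -> list:
    """List the expected headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in SECURITY_HEADERS if name.lower() not in present]


def with_security_headers(headers: dict) -> dict:
    """Return a copy of the response headers with any missing defaults filled in."""
    merged = dict(headers)
    for name in missing_security_headers(headers):
        merged[name] = SECURITY_HEADERS[name]
    return merged


response_headers = {"Content-Type": "text/html", "x-frame-options": "SAMEORIGIN"}
print(missing_security_headers(response_headers))
print(with_security_headers(response_headers))
```

Values the app already sets are left untouched, so the middleware only fills gaps.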
How AI agents prioritize findings
Prioritization is the part developers care about. The agent ranks findings using this decision tree:
- Was the exploit successful? If no, drop the finding. (This alone removes most false positives.)
- Is the endpoint public? Public-internet reachability raises severity the most. An authentication-gated endpoint raises it less. Internal-only endpoints behind a VPN lower severity unless the agent finds a way through.
- What data class is exposed? PII, payment data, auth tokens, and admin functions outrank logs, public catalog content, and static metadata.
- Is the exploit a stepping stone? A finding that enables another finding (XSS that steals an admin session token, then uses it on a privileged endpoint) is reported as a chain and severity is elevated.
- Is the blast radius the whole account? Supabase service-role keys, AWS root credentials, and similar findings are flagged Critical regardless of CVSS because they compromise the entire account.
The chained-vulnerability case is where AI assessment beats both scanners and most one-shot pentest tools. The agent keeps state across the run and notices that finding 7 (reflected XSS on a contact form) plus finding 12 (admin route relies on cookie-only auth) are exploitable together as account takeover.
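The first and fourth checks (drop unexploited findings, elevate stepping stones) can be sketched as below. The `Finding` shape, the `enables` field, and the one-level elevation are assumptions for illustration, and the starting severities given to findings 7 and 12 are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]


@dataclass
class Finding:
    id: int
    title: str
    severity: str
    exploited: bool
    enables: List[int] = field(default_factory=list)  # ids of findings this one unlocks


def prioritize(findings: List[Finding]) -> List[Tuple[str, Finding]]:
    """Drop unexploited findings, bump stepping stones one level, sort highest first."""
    confirmed = [f for f in findings if f.exploited]
    confirmed_ids = {f.id for f in confirmed}
    ranked = []
    for f in confirmed:
        rank = SEVERITY_ORDER.index(f.severity)
        # A finding that unlocks another confirmed finding is part of a chain.
        if any(target in confirmed_ids for target in f.enables):
            rank = min(rank + 1, len(SEVERITY_ORDER) - 1)
        ranked.append((rank, f))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [(SEVERITY_ORDER[rank], f) for rank, f in ranked]


backlog = prioritize([
    Finding(3, "Possible open redirect", "Medium", exploited=False),
    Finding(7, "Reflected XSS on contact form", "Medium", exploited=True, enables=[12]),
    Finding(12, "Admin route relies on cookie-only auth", "High", exploited=True),
])
for severity, finding in backlog:
    print(severity, finding.id, finding.title)
```

In this run the unexploited finding is dropped and the reflected XSS is raised from Medium to High because it unlocks the admin-route finding.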
False-positive elimination via PoC validation
Traditional scanners report a candidate when a payload “looks like” it might be reflected. AI assessment reports a finding only when the payload was reflected and executed. The validation steps:
- For XSS: inject a benign payload (e.g. a unique attribute that triggers a known DOM event), confirm it renders and executes, capture the rendered HTML as evidence
- For SQL injection: send a payload that produces a measurable, non-destructive side effect (e.g. a controlled time delay) and verify the response time matches
- For BOLA: authenticate as user A, request user B’s resource, verify the response body contains user B’s data
- For SSRF: route the payload to a controlled domain, verify the request landed there
- For RCE: execute a benign command that returns a known value, verify the value appears in the response
Findings that fail validation are not reported. Findings that pass ship with the request and response captured as evidence.
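For the time-delay check in the SQL injection step, the comparison itself is simple once response times are collected. The sketch below covers only that comparison, with no network code. The function name, the 0.5-second tolerance, and the rule that every probe must be slow are assumptions, not a description of the agent's implementation.

```python
from statistics import median
from typing import Sequence


def delay_confirmed(
    baseline_seconds: Sequence[float],
    probe_seconds: Sequence[float],
    injected_delay: float,
    tolerance: float = 0.5,
) -> bool:
    """True when every probe is slower than the baseline median by about the injected delay."""
    if not baseline_seconds or not probe_seconds:
        return False  # nothing to compare, so nothing is confirmed
    threshold = median(baseline_seconds) + injected_delay - tolerance
    # Requiring all probes to be slow guards against a single network hiccup.
    return all(sample >= threshold for sample in probe_seconds)


baseline = [0.12, 0.10, 0.15]  # normal requests
probes = [5.1, 5.2]            # requests carrying a 5-second delay payload
print(delay_confirmed(baseline, probes, injected_delay=5.0))
```

Repeating the probe and comparing against a baseline median, rather than one request against one request, reduces the chance that ordinary latency spikes are mistaken for a successful injection.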
Integration into vulnerability-management workflows
The output is designed to drop into whatever tracking system you already use:
- JSON export for ingestion into Jira, Linear, GitHub Issues
- SARIF export for GitHub Code Scanning and Azure DevOps
- PDF export for compliance evidence (Compliance Penetration Testing)
- Webhooks to fire on new Critical or High findings
- MCP integration with Claude Code so the assistant can read findings, generate fixes, and open PRs
The pattern that works in practice: the assessment runs on every deploy; Critical and High findings page the on-call engineer; Medium and Low findings appear as auto-created tickets; and fix prompts are pre-written for the team to paste into Cursor or Claude Code.
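As an illustration of the SARIF export listed above, a finding can be mapped onto a minimal SARIF 2.1.0 document like this. The input dict keys (`rule_id`, `severity`, `title`, `endpoint`), the tool name, and the severity-to-level mapping are assumptions. Putting an endpoint path in `artifactLocation.uri` is also a simplification, since code-scanning tools generally expect a source file path there.

```python
import json

# Assumed mapping from assessment severity to SARIF result levels.
SARIF_LEVEL = {"Critical": "error", "High": "error", "Medium": "warning", "Low": "note"}


def to_sarif(findings: list, tool_name: str = "example-assessment") -> dict:
    """Build a minimal SARIF 2.1.0 log with one run and one result per finding."""
    results = []
    for f in findings:
        results.append({
            "ruleId": f["rule_id"],
            "level": SARIF_LEVEL[f["severity"]],
            "message": {"text": f["title"]},
            "locations": [
                {"physicalLocation": {"artifactLocation": {"uri": f["endpoint"]}}}
            ],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


log = to_sarif([{
    "rule_id": "bola-orders",
    "severity": "High",
    "title": "BOLA on /api/orders/{id}",
    "endpoint": "api/orders",
}])
print(json.dumps(log, indent=2))
```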
From Assessment to Remediation
AI vulnerability assessment does not stop at finding problems. For each vulnerability the agent generates specific remediation guidance: the exact code change, the configuration to update, or the library to upgrade. For Supabase RLS issues it generates the policy SQL. For missing auth middleware it generates the middleware code.
VibeEval’s MCP integration takes this further. Connected to Claude Code, the AI can automatically open pull requests that fix findings. The self-healing loop means your security posture improves continuously without manual intervention: scan, find, fix, verify, repeat.
Fix prompts you can paste
Drop these into Cursor, Claude Code, or Lovable to remediate the most common findings.
Missing RLS on a Supabase table:
Add Row Level Security to the {table_name} table in Supabase.
The table has columns: {columns}. The owner column is {owner_column}.
Generate the SQL to enable RLS, then add SELECT, INSERT, UPDATE, DELETE
policies that require auth.uid() = {owner_column}. Include a separate
policy for service-role access. Output the SQL only.
BOLA on a REST endpoint:
The route {METHOD} {path} accepts a resource ID and returns the
resource without checking ownership. Add an ownership check that
verifies the authenticated user owns the resource before returning it.
If the user does not own it, return 404 (not 403, to avoid leaking
existence). Show me the diff.
Mass assignment on a profile update:
The handler for PATCH /api/users/me accepts an arbitrary request body
and forwards it to the ORM. Refactor it to accept only the allowlisted
fields: name, avatar_url, bio. Reject any other field with a 400 error.
Use a strict typed input shape so unknown fields fail at parse time.
Server key in the frontend bundle:
The OpenAI API key is currently bundled into the frontend JavaScript.
Move all OpenAI calls behind a Supabase Edge Function. The frontend
should call the Edge Function with the user's auth token; the Edge
Function reads OPENAI_API_KEY from environment and forwards. Generate
the Edge Function code and update the frontend client.
When you actually need a human assessor instead
AI assessment is the right default for most apps. It is not the right answer for everything.
- Regulated workloads where an auditor demands signed pentest reports — SOC 2 Type II, HIPAA, PCI-DSS Level 1 — see Compliance Penetration Testing
- Apps with unusual business logic the agent cannot frame without human context (multi-party financial flows, complex permission matrices that depend on tenant configuration)
- Hardware, firmware, or embedded targets where the AI agent does not have first-class instrumentation
- Red-team engagements that include physical, social, and OSINT components alongside the technical pentest
The standard pattern: AI assessment continuously, plus one human engagement annually for the regulated cases. See Manual Security Testing.
Related Resources
- AI Penetration Testing Guide — the comprehensive methodology
- Vulnerability Scanning vs AI Pentest — why scanners are not enough
- AI Pentest vs Traditional — cost, speed, and coverage tradeoffs
- AI Security Audit for Startups — affordable assessment for early-stage teams
- Compliance Penetration Testing — SOC 2, GDPR, HIPAA evidence packages
- Continuous Penetration Testing — every-deploy cadence
- Vibe Code Scanner — run a free assessment in 60 seconds
- Supabase RLS Checker — focused tool for the most common Critical
- Firebase Scanner — Firestore Security Rules audit
- Token Leak Checker — find exposed keys in the bundle
- Security Headers Checker — header hygiene
- Backend Security Hub — Postgres, Supabase, Firebase hardening
Start Your Vulnerability Assessment
VibeEval’s AI-powered vulnerability assessment finds real, exploitable vulnerabilities and gives you a prioritized remediation plan. No false positives, no wasted time.