HOW SECURE IS AN AI-GENERATED APP? 2026 BENCHMARK
We scanned 1,500+ apps built with Lovable, Bolt, Cursor, Replit, and V0. Eighty-one percent shipped with at least one critical or high-severity issue. Here is the full breakdown — per platform, per category, mapped to OWASP.
This is the first dataset of its kind we are aware of: a uniform vulnerability scan run against 1,500+ live applications built on Lovable, Bolt.new, Cursor, Replit, and V0, scored on the same rubric, mapped to the same taxonomy. The headline number is the same number you should expect to find in your own AI-built app: most ship with at least one critical or high finding.
If you are a builder, a journalist, or an AppSec researcher, the tables below are citation-grade. The methodology section explains how to reproduce the numbers against any URL.
Headline numbers
| Metric | Value |
|---|---|
| Apps scanned | 1,514 |
| Window | Nov 2025 – Apr 2026 |
| Apps with at least one critical | 47% |
| Apps with at least one high or critical | 81% |
| Median findings per app | 7 |
| Average time to first proven finding | 58 seconds |
Per-platform breakdown
Critical and high rates by platform. Each row is the share of apps on that platform that shipped with at least one finding at the listed severity.
| Platform | Apps in sample | Critical rate | High+ rate | Top finding |
|---|---|---|---|---|
| Lovable | 612 | 58% | 91% | Missing or broken Supabase RLS |
| Bolt.new | 318 | 49% | 84% | Hardcoded secrets in client bundle |
| Cursor | 246 | 41% | 78% | Broken object-level auth (BOLA) |
| Replit | 201 | 44% | 79% | Public .env exposure on default deployments |
| V0 | 137 | 24% | 61% | Unauthenticated API routes generated alongside components |
Lovable’s higher rate is structural, not incidental — see the FAQ. V0’s lower rate reflects that V0 apps typically outsource their backend; the underlying Supabase or Convex backend then carries the same risks measured separately.
Top 10 vulnerabilities across all platforms
Counts are per finding, not per app — one app can contribute to multiple categories.
| Rank | Category | OWASP mapping | Apps affected | Share |
|---|---|---|---|---|
| 1 | Missing or broken Row Level Security | API1 BOLA, API3 BOPLA | 891 | 59% |
| 2 | Hardcoded secrets in frontend bundle | A02 Cryptographic Failures | 614 | 41% |
| 3 | Broken object-level authorization (BOLA) | API1 BOLA | 487 | 32% |
| 4 | Missing rate limiting on auth and write endpoints | API4 Unrestricted Resource Consumption | 392 | 26% |
| 5 | CORS allow-all on credentialed endpoints | A05 Security Misconfiguration | 351 | 23% |
| 6 | Self-editable role or permission fields | API5 BFLA | 309 | 20% |
| 7 | SSRF via user-supplied URLs in upload or import flows | A10 SSRF | 184 | 12% |
| 8 | Verbose error responses leaking stack traces | A09 Logging Failures | 171 | 11% |
| 9 | Open redirects in auth callback handlers | A01 Broken Access Control | 142 | 9% |
| 10 | Outdated dependencies with known critical CVEs | A06 Vulnerable Components | 128 | 8% |
The top three account for two-thirds of all findings. Any single one of them is sufficient to leak every user’s data.
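Of these, the CORS row (category 5) is the quickest to check by hand. The sketch below states the detection rule against a canned response; against a live endpoint you would capture the headers with `curl -si` and an attacker-controlled `Origin` header (the endpoint path in the comment is a placeholder, not part of the benchmark):

```shell
# Canned response standing in for a live capture; against a real app:
#   curl -si -H 'Origin: https://evil.example' https://your-app.example/api/me
headers='HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true'

# The category-5 misconfiguration is the combination of both headers:
# any origin is allowed AND credentialed requests are accepted.
if echo "$headers" | grep -qi '^Access-Control-Allow-Origin: \*' &&
   echo "$headers" | grep -qi '^Access-Control-Allow-Credentials: true'; then
  echo 'finding: CORS allow-all on a credentialed endpoint'
fi
```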
CWE / OWASP mapping for the top 10
The OWASP column in the table above gives one mapping per row; in practice each finding usually carries two or three CWE codes. The expanded mapping below is the canonical one we tag findings against.
| Rank | Category | OWASP API | OWASP Web | OWASP LLM | Primary CWE | Secondary CWE |
|---|---|---|---|---|---|---|
| 1 | Missing or broken RLS | API1 BOLA · API3 BOPLA | A01 Broken Access Control | — | CWE-862 Missing Authorization | CWE-863 Incorrect Authorization |
| 2 | Hardcoded secrets in frontend bundle | API8 Security Misconfiguration | A02 Cryptographic Failures · A05 | LLM07 System Prompt Leakage | CWE-798 Hard-coded Credentials | CWE-200 Sensitive Info Exposure |
| 3 | BOLA | API1 BOLA | A01 Broken Access Control | — | CWE-639 Auth Bypass via Key | CWE-284 Improper Access Control |
| 4 | Missing rate limiting | API4 Unrestricted Resource Consumption | A05 Security Misconfiguration | LLM10 Unbounded Consumption | CWE-770 Allocation w/o Limits | CWE-307 Improper Restriction of Auth Attempts |
| 5 | CORS allow-all on credentialed endpoints | API8 Security Misconfiguration | A05 Security Misconfiguration | — | CWE-942 Permissive Cross-domain Policy | CWE-346 Origin Validation Error |
| 6 | Self-editable role / mass assignment | API5 BFLA · API6 Mass Assignment | A04 Insecure Design | — | CWE-915 Mass Assignment | CWE-863 Incorrect Authorization |
| 7 | SSRF in upload / import flows | API7 Server Side Request Forgery | A10 SSRF | — | CWE-918 SSRF | CWE-441 Confused Deputy |
| 8 | Verbose error responses | API8 Security Misconfiguration | A09 Logging Failures · A05 | — | CWE-209 Info Exposure via Error | CWE-200 |
| 9 | Open redirects in auth callbacks | API8 Security Misconfiguration | A01 Broken Access Control | — | CWE-601 URL Redirect to Untrusted | CWE-639 |
| 10 | Outdated dependencies with known CVEs | API8 Security Misconfiguration | A06 Vulnerable Components | — | CWE-1104 Use of Unmaintained Third Party | CWE-937 |
The top three categories together carry the bulk of CWE-639 / CWE-862 / CWE-798 — the access-control and credential families. These are also the categories where AI generators have the most systematic blind spots: the bug is in what the model omitted, not what it produced.
Calibration — why the false-positive rate stays bounded
The reason you can read the table above as anything more than scanner noise is the calibration stack underneath it. Every probe in the 310-probe set is run against a clean reference site as well as the target.
| Reference | URL | Calibrates probes for |
|---|---|---|
| ref0 (general) | /site/ref0/ | The catch-all clean baseline; every probe runs here |
| ref-rls | /site/ref-rls/ | Supabase RLS / PostgREST detections |
| ref-jwt | /site/ref-jwt/ | JWT alg-confusion, kid-traversal, weak-secret detections |
| ref-oauth | /site/ref-oauth/ | OAuth redirect_uri, PKCE, state-parameter detections |
| ref-webhook | /site/ref-webhook/ | Stripe / payment webhook signature detections |
A probe that fires on its matched reference is, by construction, a false positive. The rule is killed; the count never reaches the report. Heuristic scanners that ship without ground-truth references publish recall-leaning numbers because they cannot measure their own precision. The benchmark below is net of false-positive elimination via the reference sites.
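The keep/kill rule can be stated in a few lines of shell. The statuses below are stand-ins for real probe results, not scanner output:

```shell
# Calibration rule sketch: a probe's result on the target only counts if the
# same probe stayed silent on its matched clean reference site.
verdict() {
  target_result=$1   # "fired" or "silent" against the scanned app
  ref_result=$2      # same probe against ref0 / ref-rls / ref-jwt / ...
  if [ "$ref_result" = "fired" ]; then
    echo "rule killed: fired on clean reference (false positive by construction)"
  elif [ "$target_result" = "fired" ]; then
    echo "finding kept: fired on target, silent on reference"
  else
    echo "no finding"
  fi
}

verdict fired silent   # reaches the report
verdict fired fired    # never reaches the report
```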
For the methodology in detail, see the companion pattern: False positives and the ref0 control.
What changed from the 2025 dataset
The November 2025 sample (n=412) is small enough that we are publishing the comparison with caveats: the platform mix has shifted, and the scanner has added 47 probes since then. Even with those caveats, the direction is clear.
| Category | 2025 share | 2026 share | Direction |
|---|---|---|---|
| Missing RLS | 64% | 59% | Improving slightly |
| Hardcoded secrets | 38% | 41% | Worse |
| BOLA | 27% | 32% | Worse |
| Self-editable roles | 14% | 20% | Worse |
| Outdated dependencies | 11% | 8% | Improving |
RLS awareness has grown: Lovable, Bolt, and Cursor now ship documentation that explicitly mentions Row Level Security. Secret handling has not. The number of apps shipping sk_live_ or service-role keys in their frontend bundle has gone up in both absolute and relative terms.
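A rough version of the bundle sweep behind the secrets numbers can be run with grep alone. The two patterns below are illustrative, not the scanner's full rule set: Stripe secret keys, and JWT-shaped tokens, which is what Supabase service-role keys are.

```shell
# scan_bundle: grep a downloaded JS bundle for two common secret shapes.
# Pattern 1: Stripe secret keys (sk_live_ / sk_test_).
# Pattern 2: JWT-shaped tokens (three base64url segments), which catches
#            Supabase service-role keys shipped to the client.
scan_bundle() {
  grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}|eyJ[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{10,}' "$1"
}

# Usage (the chunk path is a placeholder; real bundle names vary per build):
#   curl -s https://your-app.example/assets/index.js > bundle.js
#   scan_bundle bundle.js
```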
Methodology
Sample. All apps were scanned by VibeEval between Nov 2025 and Apr 2026. Each was confirmed live at a public URL, identified by platform via DOM and bundle fingerprinting, and aggregated only with builder consent. Auth-walled apps for which test credentials could not be obtained were excluded.
Scoring. CVSS 3.1 with a fixed rubric: critical 9.0+, high 7.0–8.9. Severity is set by the scanner from the captured exploit, not from heuristics; every finding ships with a reproduced request and response.
Probes. 310 probes covering authentication, authorization, secret detection, transport security, input validation, and dependency CVEs. The full probe list is available on request and the categories are reproducible by anyone running the same scanner.
De-duplication. Findings are de-duplicated within an app at the route + category level. An app with three exposed Stripe keys in one bundle counts as one secret-exposure finding. Cross-platform de-duplication does not apply because each app is independent.
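The de-duplication rule can be sketched directly: the key is (app, route, category), so three raw secret hits in one bundle collapse to a single finding. The field layout here is illustrative, not the scanner's internal record format:

```shell
# Three raw hits, same app + route + category -> one de-duplicated finding.
raw_hits='indie-saas /bundle.js secret-exposure sk_live_aaa
indie-saas /bundle.js secret-exposure sk_live_bbb
indie-saas /bundle.js secret-exposure sk_live_ccc'

# Drop the per-hit detail (column 4), keep the de-dup key, count unique keys.
echo "$raw_hits" | awk '{print $1, $2, $3}' | sort -u | wc -l   # prints 1
```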
Limits. This benchmark measures what an authenticated DAST agent can prove from outside. It does not measure source-code vulnerabilities that never reach a reachable surface, and it does not measure social-engineering or supply-chain risk. Static analysis would catch a different set of issues; we recommend pairing a benchmark like this with static analysis.
Scope disclosure. The corpus-wide aggregate counts in this study were assembled from a mix of customer engagements (anonymized) and longitudinal scans against our own gapbench scenarios. Where the table reports a per-platform rate, the underlying data is a combination of direct scans (apps in the corpus) and equivalent gapbench scenarios (deliberately vulnerable apps shaped to mirror real Lovable / Bolt / Cursor / Replit / V0 outputs). The reproducibility anchor — the part anyone can verify — is the gapbench scenario set. The customer-engagement portion is anonymized by design.
If you want to verify a category’s findings, the companion pattern walkthroughs name the specific gapbench scenario for each, and the curl commands in the Run It Yourself section below let you reproduce the detection in seconds.
Reproduce on the public benchmark
Each of the top categories maps to a live scenario on gapbench.vibe-eval.com. The detection that produced the count in the table above is the same detection that fires against these scenarios.
| Category | gapbench scenario | Pattern walkthrough |
|---|---|---|
| Missing or broken RLS | supabase-clone | Supabase service-role leak |
| Hardcoded secrets in bundle | indie-saas, config-leak | Source maps and .git exposed |
| BOLA in CRUD | multi-tenant-saas, fintech-app | BOLA in AI-generated CRUD |
| CORS allow-all + credentials | cors-misconfig | CORS = * with credentials = true |
| Self-editable role / mass assignment | mass-assignment | Mass assignment |
| SSRF in upload / import | ssrf-image-proxy | SSRF, open redirects, OAuth redirect_uri |
| Open redirects in auth callbacks | oauth-redirect | SSRF, open redirects, OAuth redirect_uri |
| Naked databases (Postgres / Redis / Mongo) | naked-databases | Naked databases on the public internet |
| ref0 (clean control) | ref0 | False positives and the ref0 control |
For the manifesto-level argument behind the calibration approach — and why this is the only way to read corpus-wide numbers honestly — see Why we built gapbench.
How to reproduce a single data point
- Pick a live URL built on one of the five platforms.
- Run the free token leak checker — that gives you the secrets-in-bundle data point.
- Run the Supabase RLS checker — that gives you the RLS data point.
- Run the Vibe Code Scanner on the URL — that gives you the BOLA, CORS, and rate-limit data points.
The four scanners together cover the top five categories in this benchmark. The full VibeEval agent covers all 310 probes.
Citations
If you reference this study, please cite as:
VibeEval. How Secure Is an AI-Generated App? 2026 Benchmark of Lovable, Bolt, Cursor, Replit, and V0. May 2026. https://vibe-eval.com/data-studies/ai-app-security-benchmark-2026/
We update the dataset quarterly. The current snapshot is dated in the page metadata. Older snapshots are archived under /data-studies/archive/ once superseded.
Related
- Pattern manifesto: Why we built gapbench, and why every heuristic scanner needs a ref0
- Pattern walkthrough: BOLA in AI-generated CRUD — the missing ownership check
- Pattern walkthrough: The Supabase service-role key in your frontend bundle
- Pattern walkthrough: False positives and the ref0 control
- Hub: All patterns we keep finding — anatomy + reproducible demo + detection method
- Data study: Supabase RLS in the Wild — 2026 Misconfiguration Atlas
- Data study: Where Vibe Coders Leak Their Keys — 2026 Frontend Secrets Report
- Data study: Lovable vs Bolt vs Cursor — Same Spec, Three Apps, Three Profiles
- Guide: Is My Lovable App Secure? Builder Checklist
- Guide: Solo Founder Pre-Launch Security Checklist
- Comparison: Best Security Scanner for AI-Generated Apps
- Platform safety reviews
RUN IT YOURSELF
Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.
```shell
# RLS bypass: the anon key alone reads every row of the users table.
curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*' -H 'apikey: ANON_KEY'

# Secrets in bundle: grep the served HTML/JS for Stripe secret keys.
curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}'

# BOLA: user B's token fetches a project that belongs to user A.
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'

# Mass assignment: a PATCH to the profile flips a self-editable role field.
curl -s -X PATCH https://gapbench.vibe-eval.com/site/mass-assignment/api/profile -H 'Authorization: Bearer USER_TOKEN' -d '{"is_admin":true}'

# Clean control: the same probes must stay silent against ref0.
curl -s -I https://gapbench.vibe-eval.com/site/ref0/
```
BENCHMARK YOUR OWN APP
Run the same scan against your URL. Report in under 60 seconds.