HOW SECURE IS AN AI-GENERATED APP? 2026 FAILURE-MODE CATALOG
AI-built apps from Lovable, Bolt, Cursor, Replit, and V0 ship with a recurring set of authorization and credential failures. This catalog ranks the modes by how often they reproduce on the gapbench public benchmark and in anonymized customer engagements, with the OWASP and CWE mapping for each.
This is a failure-mode catalog: the recurring vulnerability classes we see in live applications built on Lovable, Bolt.new, Cursor, Replit, and V0, scored on the same CVSS rubric and mapped to the same taxonomy. The relative rankings below reflect what reproduces most reliably on the gapbench public benchmark and what we encounter most often in anonymized customer engagements.
If you are a builder, a journalist, or an AppSec researcher, every row in the tables below is reproducible against a live gapbench scenario. The methodology section explains exactly how.
Catalog scope
| Field | Value |
|---|---|
| Window | Nov 2025 – Apr 2026 |
| Source | Customer engagements (anonymized) + gapbench reproducible scenarios |
| Severity rubric | CVSS 3.1 (critical 9.0+, high 7.0–8.9) |
| Calibration controls | ref0, ref-rls, ref-jwt, ref-oauth, ref-webhook |
| Public reproducibility anchor | gapbench.vibe-eval.com — 97 deliberately vulnerable + 7 clean controls |
We do not publish a single corpus-wide N because the underlying set mixes customer engagements (anonymized) and longitudinal scans against our own gapbench scenarios. The reproducibility anchor — the part anyone can verify — is the gapbench scenario set referenced under each finding.
Per-platform modal failure mode
The dominant failure mode we observe per platform, ranked by relative frequency of “at least one critical or high finding.”
| Platform | Relative critical+high incidence | Modal top finding |
|---|---|---|
| Lovable | Highest | Missing or broken Supabase RLS |
| Bolt.new | High | Hardcoded secrets in client bundle |
| Replit | High | Public .env exposure on default deployments |
| Cursor | Moderate | Broken object-level auth (BOLA) |
| V0 | Lower (application-layer) | Unauthenticated API routes generated alongside components |
Lovable’s elevated rate is structural, not incidental — see the FAQ. V0’s lower application-layer rate reflects that V0 apps typically outsource their backend; the underlying Supabase or Convex backend then carries the RLS and credential failures separately.
Top 10 recurring vulnerabilities
Ranked by how consistently we reproduce them across platforms and engagements. Every row maps to a gapbench scenario in the “Reproduce” section below.
| Rank | Category | OWASP mapping | Recurrence | gapbench scenario |
|---|---|---|---|---|
| 1 | Missing or broken Row Level Security | API1 BOLA · API3 BOPLA | Most-reproduced | supabase-clone |
| 2 | Hardcoded secrets in frontend bundle | A02 Cryptographic Failures | Highly recurring | indie-saas, config-leak |
| 3 | Broken object-level authorization (BOLA) | API1 BOLA | Highly recurring | multi-tenant-saas |
| 4 | Missing rate limiting on auth and write endpoints | API4 Unrestricted Resource Consumption | Common | — |
| 5 | CORS allow-all on credentialed endpoints | A05 Security Misconfiguration | Common | cors-credentials-misconfig |
| 6 | Self-editable role or permission fields | API5 BFLA · API3 BOPLA (mass assignment) | Common | mass-assignment |
| 7 | SSRF via user-supplied URLs in upload/import flows | A10 SSRF | Recurring | ssrf-open-redirect-oauth |
| 8 | Verbose error responses leaking stack traces | A09 Logging Failures | Recurring | — |
| 9 | Open redirects in auth callback handlers | A01 Broken Access Control | Recurring | ssrf-open-redirect-oauth |
| 10 | Outdated dependencies with known critical CVEs | A06 Vulnerable Components | Recurring | — |
The top three are the categories that recur on essentially every engagement. Any single one of them is sufficient to leak every user’s data.
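To make the rank 1 row concrete, the check reduces to a single read with the public anon key. The sketch below is one way to classify the response body, assuming a PostgREST-style endpoint that answers with a JSON array. The helper name `rls_verdict` and its labels are hypothetical, not part of any scanner.

```shell
# Classify the body of an anon-key read such as GET /rest/v1/users?select=*
# Assumption: the endpoint answers with a JSON array, as PostgREST does.
rls_verdict() {
  body=$(tr -d ' \t\r\n')              # read stdin and drop all whitespace
  case "$body" in
    '[]')  echo "no-rows-returned" ;;  # consistent with RLS filtering every row
    '['*)  echo "rows-exposed" ;;      # the anon role can read table contents
    *)     echo "inconclusive" ;;      # error object or non-JSON response
  esac
}

# Usage sketch (ANON_KEY is a placeholder):
#   curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*' \
#     -H 'apikey: ANON_KEY' | rls_verdict
```

An empty array is only suggestive: it is also what an empty table returns, so a "no-rows-returned" result is worth confirming against a table known to hold data.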
CWE / OWASP mapping for the top 10
The OWASP column in the table above is condensed to the headline mapping for each row; in practice each finding usually carries two or three CWE codes. The expanded mapping below is the canonical one we tag findings against.
| Rank | Category | OWASP API | OWASP Web | OWASP LLM | Primary CWE | Secondary CWE |
|---|---|---|---|---|---|---|
| 1 | Missing or broken RLS | API1 BOLA · API3 BOPLA | A01 Broken Access Control | — | CWE-862 Missing Authorization | CWE-863 Incorrect Authorization |
| 2 | Hardcoded secrets in frontend bundle | API8 Security Misconfiguration | A02 Cryptographic Failures · A05 | LLM07 System Prompt Leakage | CWE-798 Hard-coded Credentials | CWE-200 Sensitive Info Exposure |
| 3 | BOLA | API1 BOLA | A01 Broken Access Control | — | CWE-639 Auth Bypass via Key | CWE-284 Improper Access Control |
| 4 | Missing rate limiting | API4 Unrestricted Resource Consumption | A05 Security Misconfiguration | LLM10 Unbounded Consumption | CWE-770 Allocation w/o Limits | CWE-307 Improper Restriction of Auth Attempts |
| 5 | CORS allow-all on credentialed endpoints | API8 Security Misconfiguration | A05 Security Misconfiguration | — | CWE-942 Permissive Cross-domain Policy | CWE-346 Origin Validation Error |
| 6 | Self-editable role / mass assignment | API5 BFLA · API3 BOPLA (mass assignment) | A04 Insecure Design | — | CWE-915 Mass Assignment | CWE-863 Incorrect Authorization |
| 7 | SSRF in upload / import flows | API7 Server Side Request Forgery | A10 SSRF | — | CWE-918 SSRF | CWE-441 Confused Deputy |
| 8 | Verbose error responses | API8 Security Misconfiguration | A09 Logging Failures · A05 | — | CWE-209 Info Exposure via Error | CWE-200 |
| 9 | Open redirects in auth callbacks | API8 Security Misconfiguration | A01 Broken Access Control | — | CWE-601 URL Redirect to Untrusted | CWE-639 |
| 10 | Outdated dependencies with known CVEs | API8 Security Misconfiguration | A06 Vulnerable Components | — | CWE-1104 Use of Unmaintained Third Party | CWE-937 |
The top three categories together carry the bulk of CWE-639 / CWE-862 / CWE-798 — the access-control and credential families. These are also the categories where AI generators have the most systematic blind spots: the bug is in what the model omitted, not what it produced.
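Because the omitted ownership check leaves no trace in the response itself, the simplest way to see it from outside is to compare what two different users get back for the same object. A minimal sketch, assuming you hold tokens for two unrelated accounts; `bola_verdict` and its labels are hypothetical names for illustration.

```shell
# $1: HTTP status when user A (the owner) fetches the object
# $2: HTTP status when user B (a different tenant) fetches the same object
bola_verdict() {
  owner_status=$1
  other_status=$2
  if [ "$owner_status" != "200" ]; then
    echo "inconclusive"        # baseline request failed, nothing to compare
  elif [ "$other_status" = "200" ]; then
    echo "bola-suspected"      # the non-owner received a successful response
  else
    echo "ownership-enforced"  # e.g. 403 or 404 for the non-owner
  fi
}

# Usage sketch (URL and tokens are placeholders):
#   a=$(curl -s -o /dev/null -w '%{http_code}' "$URL" -H 'Authorization: Bearer USER_A_TOKEN')
#   b=$(curl -s -o /dev/null -w '%{http_code}' "$URL" -H 'Authorization: Bearer USER_B_TOKEN')
#   bola_verdict "$a" "$b"
```

A status comparison is a first pass only. Some APIs answer 200 with an empty or filtered body, so a "bola-suspected" result should be confirmed by checking that user B's response actually contains user A's data.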
Calibration — why the catalog is not scanner noise
Every probe behind this catalog runs against a clean reference site as well as the target — the calibration stack is what separates a recurring failure mode from a noisy detection.
| Reference | URL | Calibrates probes for |
|---|---|---|
| ref0 (general) | /site/ref0/ | The catch-all clean baseline; every probe runs here |
| ref-rls | /site/ref-rls/ | Supabase RLS / PostgREST detections |
| ref-jwt | /site/ref-jwt/ | JWT alg-confusion, kid-traversal, weak-secret detections |
| ref-oauth | /site/ref-oauth/ | OAuth redirect_uri, PKCE, state-parameter detections |
| ref-webhook | /site/ref-webhook/ | Stripe / payment webhook signature detections |
A probe that fires on its matched reference is, by construction, a false positive. The rule is killed before it ships. Heuristic scanners that lack ground-truth references publish recall-leaning numbers because they cannot measure their own precision. Every recurrence claim in this catalog is net of false-positive elimination via the reference sites.
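The kill rule can be sketched in a few lines. The example below assumes a probe is just an extended regex applied to a saved response, which is a simplification of any real probe. The function names and verdict labels are hypothetical.

```shell
# Succeeds when the extended regex in $1 matches the saved response in file $2.
probe_fires() {
  grep -Eq -- "$1" "$2"
}

# $1: detection regex, $2: saved target response, $3: saved reference response
calibrated_verdict() {
  if probe_fires "$1" "$3"; then
    echo "kill-rule"    # fired on the clean reference: a false positive by construction
  elif probe_fires "$1" "$2"; then
    echo "finding"      # fired on the target only
  else
    echo "no-finding"
  fi
}
```

The reference is checked first on purpose: a rule that fires on the clean site is discarded regardless of what it reports on the target.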
For the methodology in detail, see the companion pattern: False positives and the ref0 control.
What is moving year-over-year
We do not publish year-over-year share percentages because we cannot independently verify a uniform sample across two windows. What we can report directionally, from what we see in engagements:
- RLS awareness has grown. Lovable, Bolt, and Cursor now ship documentation that explicitly mentions Row Level Security. The failure mode persists, but the framing in the platforms’ own docs has changed.
- Secret handling has not improved. Service-role keys and sk_live_-style secrets in frontend bundles are still the modal credential leak; see the Frontend Secrets Report.
- BOLA in AI-generated CRUD is, if anything, more common as platforms expand their custom-API surface beyond pure PostgREST; see BOLA in AI-generated CRUD.
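For the secret-handling point, a saved copy of a frontend bundle can be checked locally. This is a minimal sketch that reuses the Stripe-style pattern from this catalog's curl example; it does not detect Supabase service-role keys or other providers' formats, and `find_stripe_secrets` is a hypothetical helper name.

```shell
# Print each distinct Stripe-style secret key found in a saved bundle file ($1).
# Publishable keys (pk_...) are intentionally not matched.
find_stripe_secrets() {
  grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}' "$1" | sort -u
}

# Usage sketch (bundle.js is a placeholder for a downloaded asset):
#   curl -s https://your-app.example/assets/index.js -o bundle.js
#   find_stripe_secrets bundle.js
```

Any output at all is a finding: a secret key that reaches the browser should be treated as compromised and rotated.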
Methodology
Source. Failure modes were identified across (a) anonymized customer engagements with apps built on Lovable, Bolt.new, Cursor, Replit, and V0 between Nov 2025 and Apr 2026, and (b) deliberately vulnerable scenarios on gapbench.vibe-eval.com shaped to mirror real outputs from each platform. We do not publish a single corpus N or per-platform sample counts because the underlying engagements are anonymized by design and not a uniform random sample.
Scoring. CVSS 3.1 with a fixed rubric: critical 9.0+, high 7.0–8.9. Severity is set by the scanner based on the captured exploit; every finding ships with a reproduced request and response.
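The two bands in the rubric translate directly into a threshold check. A minimal sketch; `severity_band` is a hypothetical helper, and the "below-high" label is an assumption of this example, since the catalog only reports critical and high findings.

```shell
# $1: CVSS 3.1 base score with one decimal place, e.g. 9.1
severity_band() {
  awk -v s="$1" 'BEGIN {
    if (s >= 9.0)      print "critical"
    else if (s >= 7.0) print "high"
    else               print "below-high"   # outside the bands this catalog reports
  }'
}

# Usage sketch:
#   severity_band 8.6
```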
Probes. A probe set covering authentication, authorization, secret detection, transport security, input validation, and dependency CVEs. Every probe is reproducible by anyone running the same scanner against the matching gapbench scenario.
Calibration. Every probe runs against a matched clean reference (ref0, ref-rls, ref-jwt, ref-oauth, ref-webhook). A probe that fires on its reference is by construction a false positive and is killed before it ships.
Limits. This catalog covers what an authenticated DAST agent can prove from outside. It does not measure source-code vulnerabilities that never reach a reachable surface, social-engineering risk, or supply-chain compromise. Static analysis catches a different set of issues; pair this catalog with one.
Scope disclosure. Relative-frequency claims (“most-reproduced”, “highly recurring”, per-platform modal failure) are grounded in (1) the gapbench scenario set, which is fully public and curl-reproducible, and (2) anonymized customer engagements. Where you see a relative ranking but no absolute percentage, that is deliberate: the percentage would imply a uniform sample we are not in a position to publish.
Reproduce on the public benchmark
Each of the top categories maps to a live scenario on gapbench.vibe-eval.com. The detection that produced the ranking in the table above is the same detection that fires against these scenarios.
| Category | gapbench scenario | Pattern walkthrough |
|---|---|---|
| Missing or broken RLS | supabase-clone | Supabase service-role leak |
| Hardcoded secrets in bundle | indie-saas, config-leak | Source maps and .git exposed |
| BOLA in CRUD | multi-tenant-saas, fintech-app | BOLA in AI-generated CRUD |
| CORS allow-all + credentials | cors-misconfig | CORS = * with credentials = true |
| Self-editable role / mass assignment | mass-assignment | Mass assignment |
| SSRF in upload / import | ssrf-image-proxy | SSRF, open redirects, OAuth redirect_uri |
| Open redirects in auth callbacks | oauth-redirect | SSRF, open redirects, OAuth redirect_uri |
| Naked databases (Postgres / Redis / Mongo) | naked-databases | Naked databases on the public internet |
| ref0 (clean control) | ref0 | False positives and the ref0 control |
For the manifesto-level argument behind the calibration approach — and why this is the only way to read corpus-wide numbers honestly — see Why we built gapbench.
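The CORS row in the table above can be checked by hand by sending a request with an untrusted Origin header and reading the response headers. The sketch below parses those headers from stdin. `cors_verdict` and its labels are hypothetical, and the exact scenario path to probe is left as a placeholder.

```shell
# $1: the untrusted Origin that was sent; raw response headers arrive on stdin.
cors_verdict() {
  tr -d '\r' | awk -v origin="$1" '
    {
      line = $0
      key = tolower(line)                      # header names are case-insensitive
      if (key ~ /^access-control-allow-origin:/) {
        sub(/^[^:]*:[ \t]*/, "", line); sub(/[ \t]+$/, "", line); acao = line
      }
      if (key ~ /^access-control-allow-credentials:/) {
        sub(/^[^:]*:[ \t]*/, "", line); sub(/[ \t]+$/, "", line); acac = tolower(line)
      }
    }
    END {
      if (acac == "true" && acao == origin)   print "reflected-origin-with-credentials"
      else if (acac == "true" && acao == "*") print "wildcard-with-credentials"
      else                                    print "ok"
    }'
}

# Usage sketch (SCENARIO_URL is a placeholder):
#   curl -s -I "$SCENARIO_URL" -H 'Origin: https://evil.example' | cors_verdict 'https://evil.example'
```

The two positive verdicts are kept separate because browsers refuse a wildcard origin on credentialed requests. The reflected-origin case is the one that lets another site read authenticated responses, while the wildcard case mainly signals that the policy was never thought through.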
How to reproduce a single data point
- Pick a live URL built on one of the five platforms.
- Run the free token leak checker — that gives you the secrets-in-bundle data point.
- Run the Supabase RLS checker — that gives you the RLS data point.
- Run the Vibe Code Scanner on the URL — that gives you the BOLA, CORS, and rate-limit data points.
The three scanners together cover the top five categories in this benchmark. The full VibeEval agent covers all 310 probes.
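The rate-limit data point is also easy to sanity-check by hand against an app you own. The sketch below assumes you have collected one HTTP status code per line from a burst of repeated requests, and that the app signals throttling with 429. `rate_limit_verdict` and its labels are hypothetical.

```shell
# Reads one HTTP status code per line on stdin, e.g. from repeated login attempts.
rate_limit_verdict() {
  awk '
    $1 == "429" { limited = 1 }   # at least one request was throttled
    NF          { total++ }       # count non-empty lines only
    END {
      if (total == 0)   print "no-data"
      else if (limited) print "limit-observed"
      else              print "no-limit-observed"
    }'
}

# Usage sketch (LOGIN_URL is a placeholder for an endpoint on your own app):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     curl -s -o /dev/null -w '%{http_code}\n' -X POST "$LOGIN_URL" -d 'email=a@example.com&password=wrong'
#   done | rate_limit_verdict
```

A "no-limit-observed" result from a short burst is weak evidence: some limiters only trigger after more attempts, or respond with 403 or added latency instead of 429.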
Sources and references
- gapbench public benchmark. gapbench.vibe-eval.com — 97 deliberately vulnerable scenarios + 7 clean controls. Every failure mode in this catalog reproduces against one of the listed scenarios via curl.
- OWASP API Security Top 10 (2023). owasp.org/API-Security — the API1 BOLA, API3 BOPLA, API5 BFLA, API8 Security Misconfiguration mappings.
- OWASP Top 10 Web (2021). owasp.org/Top10 — A01–A10 mappings.
- OWASP LLM Top 10 (2025). LLM07 System Prompt Leakage, LLM10 Unbounded Consumption mappings for AI-feature-specific findings.
- CVSS 3.1. first.org/cvss/v3.1 — severity rubric.
- CWE. cwe.mitre.org — every finding carries its primary CWE.
Citations
If you reference this catalog, please cite as:
VibeEval. How Secure Is an AI-Generated App? 2026 Failure-Mode Catalog for Lovable, Bolt, Cursor, Replit, and V0. May 2026. https://vibe-eval.com/data-studies/ai-app-security-benchmark-2026/
We refresh the catalog when new failure modes are confirmed on gapbench. The current revision is dated in the page metadata.
Related
- Pattern manifesto: Why we built gapbench, and why every heuristic scanner needs a ref0
- Pattern walkthrough: BOLA in AI-generated CRUD — the missing ownership check
- Pattern walkthrough: The Supabase service-role key in your frontend bundle
- Pattern walkthrough: False positives and the ref0 control
- Hub: All patterns we keep finding — anatomy + reproducible demo + detection method
- Data study: Supabase RLS in the Wild — 2026 Misconfiguration Atlas
- Data study: Where Vibe Coders Leak Their Keys — 2026 Frontend Secrets Report
- Data study: Lovable vs Bolt vs Cursor — Same Spec, Three Apps, Three Profiles
- Guide: Is My Lovable App Secure? Builder Checklist
- Guide: Solo Founder Pre-Launch Security Checklist
- Comparison: Best Security Scanner for AI-Generated Apps
- Platform safety reviews
RUN IT YOURSELF
Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.
curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*' -H 'apikey: ANON_KEY'
curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}'
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'
curl -s -X PATCH https://gapbench.vibe-eval.com/site/mass-assignment/api/profile -H 'Authorization: Bearer USER_TOKEN' -H 'Content-Type: application/json' -d '{"is_admin":true}'
curl -s -I https://gapbench.vibe-eval.com/site/ref0/
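To read the mass-assignment result without eyeballing JSON, the response body can be piped through a small check. This sketch assumes the endpoint echoes the updated profile and uses the `is_admin` field name from the PATCH command above. `role_escalated` and its labels are hypothetical.

```shell
# Reads the JSON body returned by the PATCH (or by a follow-up GET of the profile) on stdin.
role_escalated() {
  if grep -Eq '"is_admin"[[:space:]]*:[[:space:]]*true'; then
    echo "field-accepted"        # the privileged field came back set to true
  else
    echo "field-not-reflected"   # rejected, ignored, or simply not echoed
  fi
}

# Usage sketch: append "| role_escalated" to the PATCH command above.
```

A pattern match on raw JSON is crude and can be fooled by nested objects. For anything beyond a quick look, parse the body with a JSON tool and re-fetch the profile to confirm the change persisted.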
COMMON QUESTIONS