LOVABLE VS BOLT VS CURSOR: SAME SPEC, THREE APPS, THREE SECURITY PROFILES

We gave Lovable, Bolt.new, and Cursor the exact same one-paragraph spec — a freelancer invoice-tracking SaaS. Each built a working app. We scanned all three. The totals are comparable; the failure profiles are not, and they are not what platform marketing pages would lead you to expect.

This is a controlled experiment: one spec, three platforms, three deployed apps, one scan configuration. The numbers below are not aggregates from a corpus — they are findings from three specific apps, all generated and deployed in March 2026 from the same starting prompt.

The point is to show that “is platform X secure” is a question with a structurally different answer for each platform, even when the spec is identical.

The spec

Verbatim, given as the first message to each platform’s generator:

Build a SaaS web app for freelancers to send invoices. Users sign up with email + password. Each user can create invoices with line items, tax, and a due date. Invoices can be marked sent, paid, or overdue. Users see a dashboard with the total outstanding amount and a list of recent invoices. Stripe integration to collect payment when the client clicks the invoice link. Use a backend so I can add features later.

We accepted the first deployable build on each platform. No follow-up prompts, no refactoring, no manual edits.

Results — total findings

Severity | Lovable | Bolt.new | Cursor
Critical | 4 | 3 | 1
High | 6 | 9 | 4
Medium | 11 | 7 | 9
Low | 8 | 5 | 12
Total | 29 | 24 | 26

Total counts are within noise of each other. The story is in the distribution.

Critical findings, side by side

# | Finding | Lovable | Bolt.new | Cursor
1 | Missing RLS on invoices table | yes | n/a (no Supabase) | n/a (no Supabase)
2 | Missing RLS on users table | yes | n/a | n/a
3 | Stripe sk_live_ in frontend bundle | no | yes | no
4 | Supabase service-role JWT in bundle | yes | n/a | n/a
5 | OpenAI key in bundle (used for invoice description AI) | no | yes | no
6 | BOLA on GET /api/invoices/:id | n/a (PostgREST) | yes | yes
7 | BOLA on PATCH /api/invoices/:id | yes (RLS gap) | yes | no

Three completely different failure profiles emerge from the same spec.

Failure profile by platform

Lovable

Concentration: database authorization. Three of the four critical findings are RLS gaps; the fourth is the Supabase service-role JWT shipped in the frontend bundle, itself a database-authorization bypass, since the service-role key ignores RLS entirely. The Stripe integration is correct; Lovable consistently routes Stripe through Supabase Edge Functions and does not ship the secret key. The auth flow is correct; users are authenticated via Supabase Auth.

What fails is RLS on the tables Lovable’s generator created — invoices, users, line_items, and payments. None had policies; the Supabase dashboard shows all four as RLS-disabled.

Modal failure mode. Permissive policy or policy missing entirely.

Single-line fix per table. Add using (auth.uid() = user_id) to a select policy on each.
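
Concretely, the per-table fix looks like this in Supabase SQL (a minimal sketch, assuming the generator's ownership column is user_id, as it was on these tables):

-- RLS is off entirely on the generator-created tables; enable it first
alter table invoices enable row level security;

-- Owner-only reads: auth.uid() is the id of the Supabase-authenticated caller
create policy "owner reads own invoices"
  on invoices for select
  using (auth.uid() = user_id);

-- Owner-only writes: "with check" validates the rows being inserted or updated
create policy "owner inserts own invoices"
  on invoices for insert
  with check (auth.uid() = user_id);

create policy "owner updates own invoices"
  on invoices for update
  using (auth.uid() = user_id)
  with check (auth.uid() = user_id);

The same pattern repeats on users and payments. line_items is where the single-line framing bends: it likely carries an invoice_id rather than a user_id, so its policy has to check ownership through a subquery against invoices.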

Bolt.new

Concentration: secret handling. Bolt produced a largely frontend app with a few serverless functions — no Supabase, just Postgres reached via a connection string. That connection string ended up in the frontend bundle. So did the Stripe sk_live_ key. So did the OpenAI key Bolt added for an “AI-generated invoice description” feature.

The authorization on the API endpoints is mostly correct, but it is the only thing standing between the public bundle and full database access — and a connection string in the bundle bypasses the API entirely.

Modal failure mode. Secrets shipped to the browser via VITE_* environment variables.

Single-line fix per secret. Move the secret to a server-only variable and route the integration through a backend handler.
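
A minimal sketch of that fix shape, assuming a Node serverless handler and the official stripe package; getInvoice, the env names, and the file path are placeholders, not Bolt's actual output:

// api/create-checkout.ts: server-only handler (exact handler shape varies by host).
// STRIPE_SECRET_KEY has no VITE_ prefix, so the bundler never inlines it into the browser bundle.
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Hypothetical server-side lookup; stands in for whatever database client the app uses.
async function getInvoice(id: string): Promise<{ number: string; totalCents: number }> {
  throw new Error("wire this to the database, server-side only");
}

export default async function handler(req: Request): Promise<Response> {
  if (req.method !== "POST") return new Response(null, { status: 405 });
  const { invoiceId } = await req.json();
  // Price the session from the server-side record; never trust an amount sent by the client.
  const invoice = await getInvoice(invoiceId);
  const session = await stripe.checkout.sessions.create({
    mode: "payment",
    line_items: [{
      price_data: {
        currency: "usd",
        product_data: { name: `Invoice ${invoice.number}` },
        unit_amount: invoice.totalCents,
      },
      quantity: 1,
    }],
    success_url: `${process.env.APP_URL}/invoices/${invoiceId}?paid=1`,
    cancel_url: `${process.env.APP_URL}/invoices/${invoiceId}`,
  });
  return Response.json({ url: session.url });
}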

Cursor

Concentration: custom API authorization. Cursor produced the most architecturally clean app — Next.js with API routes, JWT-based auth, environment-variable secrets. The secrets are correctly server-only. The auth flow is correct.

What fails is the authorization logic in the API routes. Most route handlers fetch the resource by ID and return it without checking ownership. Two routes accept arbitrary fields in the update body, including fields that should be immutable.

Modal failure mode. Missing ownership check in route handlers.

Single-line fix per route. Add if (resource.user_id !== session.user.id) return new Response(null, { status: 404 }).
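
Both Cursor fixes, the ownership check plus the field allow-list, sketched as a Next.js route handler; db, getSession, and the field names are assumptions about the generated app, not its actual code:

// app/api/invoices/[id]/route.ts: ownership check plus the mass-assignment repair
import { NextRequest, NextResponse } from "next/server";

// Hypothetical stand-ins for the app's data client and session helper
declare const db: {
  invoice: {
    findUnique(args: { where: { id: string } }): Promise<({ user_id: string } & Record<string, unknown>) | null>;
    update(args: { where: { id: string }; data: Record<string, unknown> }): Promise<unknown>;
  };
};
declare function getSession(req: NextRequest): Promise<{ user: { id: string } }>;

// Allow-list for updates; user_id, totals, and other immutable fields are simply absent
const UPDATABLE_FIELDS = ["status", "due_date", "notes"] as const;

export async function PATCH(req: NextRequest, { params }: { params: { id: string } }) {
  const session = await getSession(req);
  const invoice = await db.invoice.findUnique({ where: { id: params.id } });
  // Ownership check: 404 rather than 403, so invoice IDs are not enumerable
  if (!invoice || invoice.user_id !== session.user.id) {
    return new NextResponse(null, { status: 404 });
  }
  const body = await req.json();
  // Copy only allow-listed fields from the request body; drop everything else
  const patch: Record<string, unknown> = {};
  for (const field of UPDATABLE_FIELDS) {
    if (field in body) patch[field] = body[field];
  }
  const updated = await db.invoice.update({ where: { id: params.id }, data: patch });
  return NextResponse.json(updated);
}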

What this means for the “is X safe” question

The three apps produced from the same spec have:

  • Comparable total finding counts (Lovable 29, Bolt 24, Cursor 26)
  • Three completely different concentrations of critical findings
  • Three completely different audit checklists for the builder

A founder reading “is Lovable safe” and getting a generic answer will not realize that the Lovable-specific audit is different from the Bolt-specific audit which is different from the Cursor-specific audit. Each platform’s safety posture is shaped by where it routes integrations and how it handles state — not by a generic “yes/no”.

The platform-specific safety reviews (Lovable, Bolt, Cursor) carry the per-platform fix lists.

CWE / OWASP profile per platform

Comparable total finding counts, three completely different CWE distributions. Each platform’s “modal failure” maps to a different fix surface.

Platform | Modal CWE family | Modal OWASP | Fix surface
Lovable | CWE-862 Missing Authorization · CWE-863 Incorrect Authorization | A01 Broken Access Control · API1 BOLA | Supabase RLS policies on every table the generator added
Bolt | CWE-798 Hard-coded Credentials · CWE-540 Sensitive Info in Source | A02 Cryptographic Failures · A05 Security Misconfiguration | Move every secret to a server-only env var; add a backend handler
Cursor | CWE-639 Auth Bypass via Key · CWE-915 Mass Assignment | A04 Insecure Design · API1 BOLA · API6 Mass Assignment | Add ownership checks in API route handlers; allow-list update fields
All three (shared baseline) | CWE-352 CSRF · CWE-770 No Rate Limit · CWE-693 Protection Mechanism Failure | A05 Security Misconfiguration | Platform-independent middleware: CSRF tokens, rate limiting, security headers

The Lovable and Cursor profiles overlap on OWASP API1 (BOLA), but the layer differs — Lovable’s BOLA lives in the database via missing RLS, Cursor’s lives in the application via missing checks in the route handler. The per-finding fix is short in either case; the per-platform fix requires a checklist shaped to where the platform routes the data.

Pattern walkthroughs per failure profile

Each modal failure surfaced in this experiment has a companion pattern walkthrough that shows the bug on a live URL and walks through the fix for each stack.

What was the same

All three apps:

  • Allowed CSRF on state-changing endpoints (no platform added CSRF protection by default)
  • Shipped without rate limits on signup or login
  • Returned verbose error messages, including stack traces, because development mode was still enabled at deploy time
  • Lacked HSTS headers
  • Lacked Content-Security-Policy headers

These five items are the platform-independent baseline failures of vibe-coded apps. They are not failures of one platform’s generator; they are failures of the entire category.
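
Three of the five reduce to one middleware file. A minimal sketch follows, written as Next.js-style middleware for concreteness; the paths, window, and limit are illustrative, and CSRF tokens plus dev-mode error pages need framework-level settings rather than a header:

// middleware.ts: the platform-independent baseline, sketched for a Next.js app.
// The same controls exist as middleware in Express, Hono, or any edge runtime.
import { NextRequest, NextResponse } from "next/server";

// In-memory fixed-window counter: fine for a sketch, but behind more than one
// instance this must live in a shared store (Redis or similar).
const hits = new Map<string, { count: number; reset: number }>();

export function middleware(req: NextRequest) {
  // Rate limit the auth endpoints all three apps shipped without one
  const path = req.nextUrl.pathname;
  if (path.startsWith("/api/login") || path.startsWith("/api/signup")) {
    const ip = req.headers.get("x-forwarded-for") ?? "unknown";
    const now = Date.now();
    const entry = hits.get(ip);
    if (!entry || now > entry.reset) {
      hits.set(ip, { count: 1, reset: now + 60_000 }); // 10 attempts per minute per IP
    } else if (++entry.count > 10) {
      return new NextResponse(null, { status: 429 });
    }
  }
  const res = NextResponse.next();
  // The two headers every app in the experiment lacked; the CSP value is a starting point, not a full policy
  res.headers.set("Strict-Transport-Security", "max-age=63072000; includeSubDomains");
  res.headers.set("Content-Security-Policy", "default-src 'self'");
  return res;
}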

Methodology

Builds. Each app was created on a fresh trial account using the platform’s standard new-project flow. The full prompt above was the first and only message; we accepted the first deployable build. No iterations, no follow-ups, no manual edits. All three were built within a 48-hour window in March 2026.

Scan. Identical scan run against each deployed URL. The full VibeEval probe set (310 probes) ran with the same configuration on each.

Reproducibility. The prompt is reproducible. The artifacts are not — AI builders are non-deterministic. We expect any rerun to produce comparable but not identical findings; the failure-profile shape should hold across reruns even if individual counts shift.

Vendor outreach. Lovable, StackBlitz (Bolt.new), and Anysphere (Cursor) were notified 30 days before publication. Responses, where provided, are included below.

Calibration via gapbench equivalents. Each per-platform modal failure has a matched scenario on gapbench.vibe-eval.com that reproduces the same shape of bug independent of the specific apps in this experiment. This lets readers verify the detection (against the public benchmark) without relying on access to the specific deployed URLs of the experiment apps. Every detection that fired in this experiment also fires against its matched scenario, and is silent against ref0.

Reproduce on the public benchmark

The deployed experiment apps are not public — they were built on platform trial accounts, and re-publishing the URLs would surface user PII the platforms generated as test data. The reproducibility anchor for this study is the matched gapbench scenarios:

Profile in the experiment | Equivalent gapbench scenario | What reproduces
Lovable — RLS gaps on invoices/users/line_items/payments | supabase-clone | RLS off, permissive policy, partial coverage all on one app
Lovable — service-role JWT in bundle | supabase-clone, config-leak | Service-role JWT inlined
Bolt — Stripe / OpenAI / connection-string in bundle | indie-saas | Stripe sk_live_ + secrets stack
Cursor — BOLA on GET / PATCH | multi-tenant-saas, fintech-app | Cross-account read and write
Shared baseline — no rate limit, no CSRF, no HSTS | auth-system | Authentication surface with the platform-independent gaps
Clean reference for false-positive calibration | ref0, ref-rls | Same scan; no findings

Vendor responses

[Reserved for vendor responses received during the disclosure window. None received as of publication. We will append responses below as they arrive.]

Citations

VibeEval. Lovable vs Bolt vs Cursor: Same Spec, Three Apps, Three Security Profiles. May 2026. https://vibe-eval.com/data-studies/lovable-bolt-cursor-same-spec/

RUN IT YOURSELF

Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.

Lovable-shaped failure — RLS off on invoices
curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/invoices?select=*' -H 'apikey: ANON_KEY'
Expected: 200 with every invoice — the modal Lovable failure: a missing policy on a generator-added table

Bolt-shaped failure — Stripe sk_live_ in bundle
curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}'
Expected: a Stripe secret key inlined via VITE_* — the modal Bolt failure

Cursor-shaped failure — BOLA on PATCH /invoices/:id
curl -s -X PATCH https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN' -H 'Content-Type: application/json' -d '{"name":"hijacked"}'
Expected: 200 with the mutation accepted — the modal Cursor failure: clean architecture, missing ownership check

Shared baseline failure across all three — no rate limit on auth
for i in $(seq 1 200); do curl -s -o /dev/null -w '%{http_code}\n' -X POST https://gapbench.vibe-eval.com/site/auth-system/api/login -H 'Content-Type: application/json' -d '{"email":"x@y.z","password":"wrong"}' & done; wait
Expected: all 200 requests complete and none return 429 — no rate limit; the platform-independent failure all three share

Clean control — ref0 produces nothing across the same scans
curl -s -I https://gapbench.vibe-eval.com/site/ref0/
Expected: all probes clean against this target — confirms the platform-specific findings are not scanner noise

COMMON QUESTIONS

01
Is one platform more secure than another based on this study?
No platform produced a clean app from the prompt. Each has a distinct failure profile — Lovable concentrates risk in Supabase RLS, Bolt in frontend secret handling, Cursor in custom API authorization. The takeaway is not 'use platform X' but 'know what platform X tends to skip and audit accordingly'.
02
Why these three platforms and not Replit or V0?
Replit and V0 produce fundamentally different artifact shapes — Replit is a hosted runtime with project-level configuration, V0 is component-only and requires a separate backend. Comparing apples to apples requires picking platforms that produce the same artifact: a deployable web app from a single prompt. Lovable, Bolt, and Cursor all do that. We will run the broader cross-platform comparison in a follow-up using a wider spec.
03
Was the prompt the same for all three platforms?
Verbatim, copy-pasted. The full prompt is in the methodology section. We made no platform-specific adjustments and accepted the first deployable build each platform produced. We did not iterate, refactor, or fix anything before scanning.
04
Will the result be the same if I run it tomorrow?
Probably not exactly the same. AI builders are non-deterministic and update their generators continuously. We will rerun the experiment quarterly and publish the diff. The methodology is fully reproducible — you can rerun it yourself with the same prompt.
05
Did you tell the platforms in advance?
No. The apps were built using normal trial accounts. We notified each platform vendor of the findings before publication and gave them a 30-day window to comment. Their responses (where provided) are at the bottom of this page.
06
Where can I see equivalent failure shapes on a public benchmark?
Each per-platform modal failure has a matched gapbench scenario. Lovable-shape: https://gapbench.vibe-eval.com/site/supabase-clone/. Bolt-shape: https://gapbench.vibe-eval.com/site/indie-saas/. Cursor-shape: https://gapbench.vibe-eval.com/site/multi-tenant-saas/. Each is curl-reproducible and the detection that fires there is the same detection that fired in this experiment.
07
How can the totals be similar but the profiles different?
Because total finding count is a poor proxy for security posture. Each platform routes integrations differently — Lovable through Supabase Edge Functions, Bolt through frontend-only patterns, Cursor through Next.js API routes — and each routing concentrates risk in a different layer. The total can be flat across three platforms while the *category* of vulnerability and the *fix surface* are completely disjoint.
08
What CWE / OWASP categories did the experiment surface?
Lovable findings clustered in CWE-862 / CWE-863 (RLS off, permissive policy) — OWASP A01 / API1. Bolt findings clustered in CWE-798 / CWE-540 (hard-coded credentials, sensitive info in source) — OWASP A02 / A05. Cursor findings clustered in CWE-639 / CWE-915 (auth bypass via key, mass assignment) — OWASP A04 / API1 / API6. The shared baseline failures (no CSRF, no rate limit, no HSTS) map to CWE-352, CWE-770, A05.

RUN THE SAME SCAN ON YOUR APP

VibeEval against your URL produces a comparable per-finding report in under 60 seconds.

RUN COMPARABLE SCAN