ONE FEATURE, ONE REGRESSION: A LONGITUDINAL STUDY OF LOVABLE APP SECURITY

We tracked 50 Lovable apps over 90 days from launch. Forty-three regressed at least once — most often after a single new feature. The average distance from clean to broken was 2.7 feature additions. This is a structural finding about how AI generators add risk over time.

This is a longitudinal study, not a snapshot. We picked 50 Lovable apps that were clean at launch — zero critical or high findings — and re-scanned them weekly for 90 days. Forty-three regressed. The median time from clean to broken was 17 days; the mean number of features added between clean and broken was 2.7.

The point of this study is structural: a clean scan at launch is not durable. Vibe-coded apps add risk on every change, and the rate at which they do is measurable.

Headline numbers

Metric Value
Apps tracked 50
Apps that regressed at least once 43 (86%)
Median days from clean to first regression 17
Mean feature additions before first regression 2.7
Apps that regressed multiple times 31 (62%)
Apps that stayed clean for the full 90 days 7 (14%)
Tracking window Feb 2026 – Apr 2026

The regression curve

Share of cohort still clean at each weekly snapshot.

Week Apps still clean Cumulative regression rate
0 (launch) 50 0%
1 47 6%
2 41 18%
3 32 36%
4 26 48%
6 17 66%
8 12 76%
10 9 82%
12 (final) 7 86%

The drop is steepest between weeks 2 and 4 — the period when builders are most likely to be actively adding features after launch. By week 8 the curve flattens because the surviving apps are mostly low-change apps where no new features are being added.
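The cumulative rate in each row is simply one minus the clean share of the cohort. A quick sketch, reusing the counts from the table above, verifies the arithmetic:

```python
# Apps still clean at each weekly snapshot, copied from the table above.
COHORT = 50
clean_by_week = {0: 50, 1: 47, 2: 41, 3: 32, 4: 26, 6: 17, 8: 12, 10: 9, 12: 7}

def cumulative_regression_rate(week: int) -> int:
    """Percent of the cohort that has regressed at least once by this week."""
    return round(100 * (COHORT - clean_by_week[week]) / COHORT)

for week in clean_by_week:
    print(f"week {week}: {cumulative_regression_rate(week)}%")
```

Running it reproduces the third column exactly: 0%, 6%, 18%, 36%, 48%, 66%, 76%, 82%, 86%.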

What regresses

Each of the 43 first regressions fell into one of these categories.

Regression class Apps affected Share
New table without RLS 24 56%
New API route without ownership check (BOLA) 9 21%
New integration with leaked secret 5 12%
New form without input validation 3 7%
Other 2 4%

Fifty-six percent of regressions are the same shape: a new table is added, and the RLS policy that was correct on the existing tables is not added to the new one. The Supabase dashboard shows the new table without an “RLS Enabled” badge, but no warning is surfaced to the builder.
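This class is detectable with a single anonymous read: a table behind a correct RLS policy returns an empty list to the anon key, while a table with no policy returns rows. A minimal stdlib sketch — the base URL, table name, and key are placeholders, and note that a genuinely empty table also returns `[]`, so an empty result is necessary but not sufficient evidence of a policy:

```python
import json
import urllib.request

def fetch_rows(base_url: str, table: str, anon_key: str) -> list:
    """GET /rest/v1/<table> authenticated with only the public anon key."""
    req = urllib.request.Request(
        f"{base_url}/rest/v1/{table}?select=*", headers={"apikey": anon_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def classify(rows: list) -> str:
    """Rows returned to the anon key mean the new table shipped without a policy."""
    return "regressed: table readable with anon key" if rows else "clean"

# Hypothetical usage against your own project:
# print(classify(fetch_rows("https://YOUR-PROJECT.supabase.co", "line_items", ANON_KEY)))
```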

What features cause regressions

We classified the feature additions that preceded each regression. Some features add risk; some are neutral.

Feature type Apps where feature was added Regression rate within 1 week
New resource type (new table) 38 63%
New integration (Stripe, OpenAI, third-party API) 21 33%
New permission tier (admin role, team plan) 14 50%
File upload 11 27%
New page or view (no new data) 47 4%
Cosmetic changes (style, copy) 50 0%

Adding a new resource type (a new table) is the highest-risk change a builder can make on Lovable — 63% of apps that did this regressed within a week. New permission tiers — adding an admin role to an app that previously had only one user type — are the second-highest risk because they introduce a new field that the existing RLS policies do not understand.

The pattern in detail

Across 24 of the 43 first regressions, the pattern was identical:

  1. App launches clean. RLS is enabled on every table; policies restrict reads to the row owner.
  2. Builder asks the AI to add a new feature: “let users add line items to invoices”.
  3. The AI creates a line_items table with a foreign key to invoices.
  4. The AI does not add RLS or any policy to the new table.
  5. The Supabase dashboard shows the new table with no RLS badge.
  6. The next deploy ships an app where every line item is publicly readable.

The fix is structural — Lovable’s generator could trivially add alter table line_items enable row level security and a default policy on every table-creation step. As of April 2026, it does not consistently do so.

CWE / OWASP mapping per regression class

Each regression class has a distinct CWE / OWASP fingerprint. Triage and fix are different per class.

Regression class CWE OWASP Fix shape
New table without RLS CWE-862 Missing Authorization A01 · API1 BOLA alter table X enable row level security + base policy in the same migration
New API route without ownership check CWE-639 Auth Bypass via Key · CWE-284 A01 · API1 BOLA Scope the query by auth.uid() = owner_id server-side; return 404 not 403
New integration with leaked secret CWE-798 Hard-coded Credentials A02 · A05 Server-only env; route through backend handler
New form without input validation CWE-20 Improper Input Validation · CWE-79 / CWE-89 A03 Injection Allow-list fields; parameterized queries; output-encode reflected values
Other (CSRF on new mutator, CORS opened on new route) CWE-352 / CWE-942 A05 / A01 Middleware reapplied; cross-origin allow-list scoped

The CWE-862 → CWE-639 split is interesting: 56% of regressions are missing authorization (table has no policy at all, easy to find with one anon-key probe), 21% are bypassed authorization (route exists but does not check the right thing, requires two-session cross-account probe). The ratio reflects which generator step is more likely to be skipped — adding a table is usually a single AI prompt away, while adding a route handler frequently invokes “authentication boilerplate” that gives the false impression of authorization.
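The two-session cross-account probe for the CWE-639 class can be sketched in a few lines. This is an illustrative stdlib version, not our scanner: the URL and both tokens are placeholders, and the decision rule is simply that a resource served to its owner is also served to a different account:

```python
import urllib.error
import urllib.request

def status_for(url: str, token: str) -> int:
    """HTTP status of a GET made as the given user (bearer token)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def is_bola(owner_status: int, cross_account_status: int) -> bool:
    """BOLA: the resource exists for its owner AND is served to another account."""
    return owner_status == 200 and cross_account_status == 200

# Hypothetical usage (two sessions, same resource URL):
# is_bola(status_for(URL, USER_A_TOKEN), status_for(URL, USER_B_TOKEN))
```

A 404 on the cross-account request is the healthy outcome — per the fix table above, the route should deny by hiding the resource, not by returning 403.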

The fix patterns per regression class

The mechanical fix per class is short. The trap is doing the fix once (on the table that surfaced the regression) instead of systemically (on the generator step that creates the pattern).

-- Class 1: every new table needs RLS + a base policy at creation time.
-- Wrap this into a Supabase migration template the generator always emits.
alter table line_items enable row level security;

create policy "owner_select" on line_items for select
  using (auth.uid() = (select user_id from invoices
                       where invoices.id = line_items.invoice_id));

create policy "owner_modify" on line_items for all
  using (auth.uid() = (select user_id from invoices
                       where invoices.id = line_items.invoice_id));

// Class 2: every new route handler scopes by the authenticated user.
// Wrap this into a helper or framework convention so it is harder to forget.
const session = await getSession(req);
if (!session) return new Response(null, { status: 401 });

const project = await db.project.findFirst({
  where: { id: req.params.id, owner_id: session.userId, tenant_id: session.tenantId },
});
if (!project) return new Response(null, { status: 404 });

# Class 3: every new integration has its key in server-only env.
# A pre-commit check that scans for VITE_*_KEY / NEXT_PUBLIC_*_KEY catches the
# moment the AI suggests it.
grep -rE 'VITE_[A-Z_]*KEY|NEXT_PUBLIC_[A-Z_]*KEY' src/ && exit 1 || exit 0

The structural fix — and the one that breaks the regression curve entirely — is to put the check itself into a generator template, a helper, or a CI rule, so the AI’s next “add a feature” prompt cannot skip it. Telling the builder to “remember to add RLS” does not work; the regression rate measures exactly how reliably the human-in-the-loop forgets.
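One shape such a CI rule could take: fail the build when a migration creates a table without enabling RLS in the same file. This is a deliberately naive regex sketch — it will miss quoted or schema-qualified names, so a production check should parse the migration rather than grep it:

```python
import re

def tables_missing_rls(migration_sql: str) -> list:
    """Tables created in this migration that never get `enable row level security`."""
    created = re.findall(
        r"create\s+table\s+(?:if\s+not\s+exists\s+)?(\w+)", migration_sql, re.I)
    secured = re.findall(
        r"alter\s+table\s+(\w+)\s+enable\s+row\s+level\s+security", migration_sql, re.I)
    return [table for table in created if table not in secured]

# Example migration: one table secured at creation time, one forgotten.
migration = """
create table line_items (id uuid primary key, invoice_id uuid);
create table audit_log (id uuid primary key);
alter table line_items enable row level security;
"""
print(tables_missing_rls(migration))  # ['audit_log'] -> fail the build
```

Wired into CI, a non-empty result blocks the deploy at exactly the step the regression data says gets skipped.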

The 14% that stayed clean

Seven apps stayed clean for the full 90 days. We looked at what they had in common.

  • Five had not added a new feature after launch (effectively static apps).
  • One had a builder who manually wrote RLS policies after every Lovable iteration and ran a re-scan before each deploy.
  • One had only added cosmetic changes — style updates, copy changes, no new tables.

There is no example in our cohort of a Lovable app actively adding features over 90 days without manual intervention and remaining clean.

What this means

For Lovable builders: assume your app will regress. The cheapest defense is automated re-scanning on deploy. Do not rely on a one-time pre-launch audit.

For Lovable: the longitudinal regression rate is a stronger signal than any snapshot benchmark. If the platform’s marketing says “secure by default”, the regression rate is the empirical test. We have shared this dataset with Lovable in advance of publication.

For other AI builders: the pattern is structural and likely applies wherever generators incrementally extend a schema. Bolt and Cursor have different incrementality patterns; the next study extends this measurement to those platforms.

Methodology

Sample. Fifty Lovable apps that scored zero critical or high findings on initial scan. Recruited via builder consent for longitudinal tracking; both production and pre-launch apps included. No selection on app size, domain, or feature complexity.

Snapshots. Weekly automated scan over 90 days using the same probe set as the main benchmark. Findings de-duplicated against the previous week to identify new (regressed) findings.

Feature classification. Surface-level diffs between weekly snapshots — new pages, new forms, new resource types — clustered into discrete “features” by manual review. We acknowledge this is approximate without access to commit logs.

Limits. Fifty apps is a small cohort. The 86% regression rate is illustrative but the confidence interval is wide. We will rerun the study with a 200-app cohort in late 2026.
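"Wide" can be made concrete. One standard way to quantify it (our choice here, not a claim about the study's own statistics) is a 95% Wilson score interval, which for 43 of 50 spans roughly 74–93%:

```python
from math import sqrt

def wilson_95(successes: int, n: int) -> tuple:
    """95% Wilson score interval for a binomial proportion."""
    z = 1.96
    p = successes / n
    center = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return ((center - spread) / denom, (center + spread) / denom)

lo, hi = wilson_95(43, 50)
print(f"{lo:.0%} - {hi:.0%}")  # roughly 74% - 93%
```

A 200-app cohort with the same observed rate would shrink that interval to roughly ±5 points, which is why the rerun matters.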

Calibration via gapbench. The regression shapes in this study — new table without RLS, new route without ownership check, new integration with leaked key — each map to a deliberately vulnerable scenario on gapbench.vibe-eval.com. A reader can verify the detection (against the public benchmark) without needing access to the longitudinal cohort URLs. The clean control (ref-rls) demonstrates what “added a table without regressing” looks like when the policy is added in the same migration step.

Reproduce on the public benchmark

The longitudinal cohort apps are not public for builder-privacy reasons. The reproducibility anchor for each regression class is the matched gapbench scenario:

Regression class Equivalent scenario URL
New table without RLS Supabase clone (RLS off on multiple tables) /site/supabase-clone/
New API route without ownership check Multi-tenant SaaS /site/multi-tenant-saas/
New integration with leaked secret Indie SaaS, Agent app /site/indie-saas/, /site/agent-app/
New form without input validation LLM-rendered HTML /site/llm-rendered-html-markdown/
Clean control (added tables without regression) ref-rls /site/ref-rls/

For the structural argument behind why generator-driven schemas regress on every iteration unless the access-control step is templated, see The Supabase service-role key in your frontend bundle and BOLA in AI-generated CRUD.

Citations

VibeEval. One Feature, One Regression: A Longitudinal Study of Lovable App Security. May 2026. https://vibe-eval.com/data-studies/lovable-regression-longitudinal-study/

RUN IT YOURSELF

Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.

Modal regression — new table without RLS (56% of regressions)
curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/line_items?select=*' -H 'apikey: ANON_KEY'
expected 200 with rows — generator added the table, did not add a policy
BOLA regression — new API route without ownership check (21%)
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'
expected 200 with another user's project — added route, missing scope
Secret regression — new integration ships its key (12%)
curl -s https://gapbench.vibe-eval.com/site/agent-app/ | grep -oE 'sk-(proj-)?[A-Za-z0-9_-]{40,}'
expected OpenAI key inlined when the AI-summary feature was added
Clean baseline — ref-rls stays clean across iterations
curl -s 'https://gapbench.vibe-eval.com/site/ref-rls/rest/v1/line_items?select=*' -H 'apikey: ANON_KEY'
expected 200 with [] — adding tables on a properly-RLSed schema does not regress

COMMON QUESTIONS

01
What counts as a 'regression' in this study?
A new critical or high-severity finding present in a later scan that was not present in an earlier scan of the same app. Findings that existed at launch and persist do not count as regressions — only new exposures introduced by changes after baseline.
02
How was 'feature addition' measured?
By visible changes to the app's surface — a new page, a new form, a new resource type, a new integration. We did not have access to commit logs or AI-prompt history; we relied on weekly snapshots of the live app and clustered changes into discrete 'features'. The methodology section discusses the limits of this approach.
03
Why only Lovable for this study?
Because Lovable is the platform where the longitudinal pattern is most pronounced — its generator adds tables incrementally as features are added, and the policy-creation step does not always run on new tables. Bolt and Cursor have different incrementality patterns we will measure in follow-up studies.
04
Were the apps real production apps with real users?
Twenty-three of the fifty were live production apps with paying users (with builder consent for tracking). The remaining twenty-seven were demo or pre-launch apps. We separately analyzed the production cohort and found regression rates within 5% of the full cohort, which suggests the pattern is not an artifact of demo-app neglect.
05
What can builders do to prevent regression?
Two things. First, treat every new table or resource as needing its own RLS policy explicitly — do not rely on the AI to add it. Second, run a re-scan on every deploy. The cheap automation here closes the loop: the scanner catches regressions the same day they ship, before they reach users.
06
Where can I see the regression shape on a live URL?
https://gapbench.vibe-eval.com/site/supabase-clone/ has the canonical 'new table without RLS' shape — a generator added a table and did not add a policy, so the anon key reads everything. https://gapbench.vibe-eval.com/site/multi-tenant-saas/ has the 'new API route without ownership check' shape. ref-rls is the clean control: same shape, policy added at table-creation time, no leak.
07
What CWE numbers does each regression class map to?
New table without RLS: CWE-862 Missing Authorization (OWASP A01 / API1). New API route without ownership check: CWE-639 Authorization Bypass Through User-Controlled Key (A01 / API1). New integration with leaked secret: CWE-798 Hard-coded Credentials (A02 / A05). New form without input validation: CWE-20 Improper Input Validation, often paired with CWE-79 (XSS) or CWE-89 (SQLi). Each class has a different fix surface.
08
Does this regression rate apply to Bolt and Cursor?
We expect the *shape* of the pattern to apply (any incremental generator that adds resources without a corresponding access-control step will regress over time) but the *rate* differs because Bolt and Cursor route incremental changes differently. Bolt's main regression class is secrets-in-bundle on new integrations rather than RLS gaps, because most Bolt apps don't use Supabase. Cursor's main regression class is missing ownership checks on newly-added API routes. We will publish per-platform measurements in a follow-up study.

DETECT REGRESSIONS AS THEY HAPPEN

VibeEval re-scans on every deploy and alerts on new findings. Catch the regression before users do.
