ONE FEATURE, ONE REGRESSION: A LONGITUDINAL STUDY OF LOVABLE APP SECURITY
We tracked 50 Lovable apps over 90 days from launch. Forty-three regressed at least once — most often after a single new feature. The median distance from clean to broken was 2.7 feature additions. This is a structural finding about how AI generators add risk over time.
This is a longitudinal study, not a snapshot. We picked 50 Lovable apps that were clean at launch — zero critical or high findings — and re-scanned them weekly for 90 days. Forty-three regressed. The median time from clean to broken was 17 days. The median feature count between clean and broken was 2.7.
The point of this study is structural: a clean scan at launch is not durable. Vibe-coded apps add risk on every change, and the rate at which they do is measurable.
Headline numbers
| Metric | Value |
|---|---|
| Apps tracked | 50 |
| Apps that regressed at least once | 43 (86%) |
| Median days from clean to first regression | 17 |
| Median feature additions before first regression | 2.7 |
| Apps that regressed multiple times | 31 (62%) |
| Apps that stayed clean for the full 90 days | 7 (14%) |
| Tracking window | Feb 2026 – Apr 2026 |
The regression curve
Share of cohort still clean at each weekly snapshot.
| Week | Apps still clean | Cumulative regression rate |
|---|---|---|
| 0 (launch) | 50 | 0% |
| 1 | 47 | 6% |
| 2 | 41 | 18% |
| 3 | 32 | 36% |
| 4 | 26 | 48% |
| 6 | 17 | 66% |
| 8 | 12 | 76% |
| 10 | 9 | 82% |
| 12 (final) | 7 | 86% |
The drop is steepest between weeks 2 and 4 — the period when builders are most likely to be actively adding features after launch. By week 8 the curve flattens because the surviving apps are mostly low-change apps where no new features are being added.
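The cumulative rate column is plain arithmetic over the clean counts: `(cohort − still clean) / cohort`. A quick sketch, with the counts copied from the table above:

```python
# Cumulative regression rate per weekly snapshot: (cohort - still_clean) / cohort.
COHORT = 50
clean_by_week = {0: 50, 1: 47, 2: 41, 3: 32, 4: 26, 6: 17, 8: 12, 10: 9, 12: 7}

rates = {week: round(100 * (COHORT - clean) / COHORT)
         for week, clean in clean_by_week.items()}

for week, rate in rates.items():
    print(f"week {week:>2}: {rate}% regressed")  # week 12: 86% regressed
```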
What regresses
Each of the 43 first regressions fell into one of these classes.
| Regression class | Apps affected | Share |
|---|---|---|
| New table without RLS | 24 | 56% |
| New API route without ownership check (BOLA) | 9 | 21% |
| New integration with leaked secret | 5 | 12% |
| New form without input validation | 3 | 7% |
| Other | 2 | 4% |
Fifty-six percent of regressions are the same shape: a new table is added, and the RLS policy that was correct on the existing tables is not added to the new one. The Supabase dashboard shows the new table without an “RLS Enabled” badge, but no warning is surfaced to the builder.
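Detection for this class is a single unauthenticated probe: an anon-key `select=*` read either returns rows (no RLS), an empty list (RLS on, policy filtered everything), or a permission error. A minimal classifier over a probe result — the status/body shapes are assumptions about typical PostgREST responses, not the cohort tooling:

```python
import json

def classify_rls_probe(status: int, body: str) -> str:
    """Classify an anon-key `select=*` probe against a single table.

    Assumed response shapes (typical PostgREST behavior):
      - 200 with a non-empty JSON array -> rows leaked: RLS missing or too broad
      - 200 with an empty array         -> RLS on, policy filtered everything
      - 401 / 403                       -> access denied outright
    """
    if status in (401, 403):
        return "denied"
    if status == 200:
        rows = json.loads(body)
        return "rls-missing" if rows else "clean"
    return "inconclusive"

print(classify_rls_probe(200, '[{"id": 1}]'))  # rls-missing
print(classify_rls_probe(200, "[]"))           # clean
```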
What features cause regressions
We classified the feature additions that preceded each regression. Some features add risk; some are neutral.
| Feature type | Apps where feature was added | Regression rate within 1 week |
|---|---|---|
| New resource type (new table) | 38 | 63% |
| New integration (Stripe, OpenAI, third-party API) | 21 | 33% |
| New permission tier (admin role, team plan) | 14 | 50% |
| File upload | 11 | 27% |
| New page or view (no new data) | 47 | 4% |
| Cosmetic changes (style, copy) | 50 | 0% |
Adding a new resource type (a new table) is the highest-risk change a builder can make on Lovable — 63% of apps that did this regressed within a week. New permission tiers — adding an admin role to an app that previously had only one user type — are the second-highest risk because they introduce a new field that the existing RLS policies do not understand.
The pattern in detail
Across 24 of the 43 first regressions, the pattern was identical:
- App launches clean. RLS is enabled on every table; policies restrict reads to the row owner.
- Builder asks the AI to add a new feature: “let users add line items to invoices”.
- The AI creates a `line_items` table with a foreign key to `invoices`.
- The AI does not add RLS or any policy to the new table.
- The Supabase dashboard shows the new table with no RLS badge.
- The next deploy ships an app where every line item is publicly readable.
The fix is structural — Lovable’s generator could trivially add `alter table line_items enable row level security` and a default policy on every table-creation step. As of April 2026, it does not consistently do so.
CWE / OWASP mapping per regression class
Each regression class has a distinct CWE / OWASP fingerprint. Triage and fix are different per class.
| Regression class | CWE | OWASP | Fix shape |
|---|---|---|---|
| New table without RLS | CWE-862 Missing Authorization | A01 · API1 BOLA | `alter table X enable row level security` + base policy in the same migration |
| New API route without ownership check | CWE-639 Authorization Bypass Through User-Controlled Key · CWE-284 | A01 · API1 BOLA | Scope the query by `auth.uid() = owner_id` server-side; return 404, not 403 |
| New integration with leaked secret | CWE-798 Hard-coded Credentials | A02 · A05 | Server-only env; route through backend handler |
| New form without input validation | CWE-20 Improper Input Validation · CWE-79 / CWE-89 | A03 Injection | Allow-list fields; parameterized queries; output-encode reflected values |
| Other (CSRF on new mutator, CORS opened on new route) | CWE-352 / CWE-942 | A05 / A01 | Middleware reapplied; cross-origin allow-list scoped |
The CWE-862 → CWE-639 split is interesting: 56% of regressions are missing authorization (table has no policy at all, easy to find with one anon-key probe), 21% are bypassed authorization (route exists but does not check the right thing, requires two-session cross-account probe). The ratio reflects which generator step is more likely to be skipped — adding a table is usually a single AI prompt away, while adding a route handler frequently invokes “authentication boilerplate” that gives the false impression of authorization.
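The bypassed-authorization class needs the two-session probe: fetch user A’s resource with user B’s token and see whether the route checks ownership. A sketch of the verdict logic — the function name and status conventions are illustrative, not the benchmark’s tooling:

```python
def is_bola(status_as_owner: int, status_as_stranger: int) -> bool:
    """Two-session cross-account probe verdict.

    The owner must be able to read the resource (otherwise the probe is
    inconclusive); if a second, unrelated session gets a 2xx on the owner's
    resource, the route is not checking ownership.
    """
    owner_ok = 200 <= status_as_owner < 300
    stranger_ok = 200 <= status_as_stranger < 300
    return owner_ok and stranger_ok

# User B reading user A's /api/projects/1:
print(is_bola(status_as_owner=200, status_as_stranger=200))  # True  -> BOLA
print(is_bola(status_as_owner=200, status_as_stranger=404))  # False -> scoped correctly
```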
The fix patterns per regression class
The mechanical fix per class is short. The trap is doing the fix once (on the table that surfaced the regression) instead of systemically (on the generator step that creates the pattern).
```sql
-- Class 1: every new table needs RLS + a base policy at creation time.
-- Wrap this into a Supabase migration template the generator always emits.
alter table line_items enable row level security;

create policy "owner_select" on line_items for select
  using (auth.uid() = (select user_id from invoices where invoices.id = line_items.invoice_id));

create policy "owner_modify" on line_items for all
  using (auth.uid() = (select user_id from invoices where invoices.id = line_items.invoice_id));
```

```ts
// Class 2: every new route handler scopes by the authenticated user.
// Wrap this into a helper or framework convention so it is harder to forget.
const session = await getSession(req);
if (!session) return new Response(null, { status: 401 });

const project = await db.project.findFirst({
  where: { id: req.params.id, owner_id: session.userId, tenant_id: session.tenantId },
});
if (!project) return new Response(null, { status: 404 });
```

```sh
# Class 3: every new integration has its key in server-only env.
# A pre-commit check that scans for VITE_*_KEY / NEXT_PUBLIC_*_KEY catches the
# moment the AI suggests it.
grep -rE 'VITE_[A-Z_]*KEY|NEXT_PUBLIC_[A-Z_]*KEY' src/ && exit 1 || exit 0
```
The structural fix — and the one that breaks the regression curve entirely — is to put the check itself into a generator template, a helper, or a CI rule, so the AI’s next “add a feature” prompt cannot skip it. Telling the builder to “remember to add RLS” does not work; the regression rate measures exactly how reliably the human-in-the-loop forgets.
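One way to make the Class 1 check a CI rule: scan each migration for `create table` statements with no matching `enable row level security` in the same file. A sketch — the regexes are assumptions that cover the common statement shapes, not a full SQL parser:

```python
import re

def tables_missing_rls(migration_sql: str) -> list[str]:
    """Return tables created in this migration without an RLS enable statement."""
    created = re.findall(
        r"create\s+table\s+(?:if\s+not\s+exists\s+)?(\w+)", migration_sql, re.I)
    rls_enabled = re.findall(
        r"alter\s+table\s+(\w+)\s+enable\s+row\s+level\s+security", migration_sql, re.I)
    return [t for t in created if t not in rls_enabled]

migration = """
create table line_items (id uuid primary key, invoice_id uuid);
create table audit_log (id uuid primary key);
alter table line_items enable row level security;
"""
print(tables_missing_rls(migration))  # ['audit_log']
```

Failing the build when this list is non-empty is exactly the kind of check the next “add a feature” prompt cannot skip.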
The 14% that stayed clean
Seven apps stayed clean for the full 90 days. We looked at what they had in common.
- Five had not added a new feature after launch (effectively static apps).
- One had a builder who manually wrote RLS policies after every Lovable iteration and ran a re-scan before each deploy.
- One had only added cosmetic changes — style updates, copy changes, no new tables.
There is no example in our cohort of a Lovable app actively adding features over 90 days without manual intervention and remaining clean.
What this means
For Lovable builders: assume your app will regress. The cheapest defense is automated re-scanning on deploy. Do not rely on a one-time pre-launch audit.
For Lovable: the longitudinal regression rate is a stronger signal than any snapshot benchmark. If the platform’s marketing says “secure by default”, the regression rate is the empirical test. We have shared this dataset with Lovable in advance of publication.
For other AI builders: the pattern is structural and likely applies wherever generators incrementally extend a schema. Bolt and Cursor have different incrementality patterns; the next study extends this measurement to those platforms.
Methodology
Sample. Fifty Lovable apps that scored zero critical or high findings on initial scan. Recruited via builder consent for longitudinal tracking; both production and pre-launch apps included. No selection on app size, domain, or feature complexity.
Snapshots. Weekly automated scan over 90 days using the same probe set as the main benchmark. Findings de-duplicated against the previous week to identify new (regressed) findings.
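The week-over-week de-duplication reduces to a set difference over stable finding fingerprints, so only genuinely new findings count as regressions. The fingerprint fields below are illustrative, not the scanner’s actual schema:

```python
def new_findings(this_week: list[dict], last_week: list[dict]) -> list[dict]:
    """Findings present this week whose fingerprint was absent last week."""
    fingerprint = lambda f: (f["class"], f["resource"])
    seen = {fingerprint(f) for f in last_week}
    return [f for f in this_week if fingerprint(f) not in seen]

last = [{"class": "missing-rls", "resource": "invoices"}]
this = [{"class": "missing-rls", "resource": "invoices"},
        {"class": "missing-rls", "resource": "line_items"}]
print(new_findings(this, last))  # only the line_items finding is new
```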
Feature classification. Surface-level diffs between weekly snapshots — new pages, new forms, new resource types — clustered into discrete “features” by manual review. We acknowledge this is approximate without access to commit logs.
Limits. Fifty apps is a small cohort. The 86% regression rate is illustrative but the confidence interval is wide. We will rerun the study with a 200-app cohort in late 2026.
Calibration via gapbench. The regression shapes in this study — new table without RLS, new route without ownership check, new integration with leaked key — each map to a deliberately vulnerable scenario on gapbench.vibe-eval.com. A reader can verify the detection (against the public benchmark) without needing access to the longitudinal cohort URLs. The clean control (ref-rls) demonstrates what “added a table without regressing” looks like when the policy is added in the same migration step.
Reproduce on the public benchmark
The longitudinal cohort apps are not public for builder-privacy reasons. The reproducibility anchor for each regression class is the matched gapbench scenario:
| Regression class | Equivalent scenario | URL |
|---|---|---|
| New table without RLS | Supabase clone (RLS off on multiple tables) | /site/supabase-clone/ |
| New API route without ownership check | Multi-tenant SaaS | /site/multi-tenant-saas/ |
| New integration with leaked secret | Indie SaaS, Agent app | /site/indie-saas/, /site/agent-app/ |
| New form without input validation | LLM-rendered HTML | /site/llm-rendered-html-markdown/ |
| Clean control (added tables without regression) | ref-rls | /site/ref-rls/ |
For the structural argument behind why generator-driven schemas regress on every iteration unless the access-control step is templated, see The Supabase service-role key in your frontend bundle and BOLA in AI-generated CRUD.
Citations
VibeEval. One Feature, One Regression: A Longitudinal Study of Lovable App Security. May 2026. https://vibe-eval.com/data-studies/lovable-regression-longitudinal-study/
Related
- Pattern walkthrough: BOLA in AI-generated CRUD — the modal regression class for new API routes
- Pattern walkthrough: The Supabase service-role key in your frontend bundle
- Pattern walkthrough: Mass assignment — common companion to the new-permission-tier regression
- Data study: 2026 AI App Security Benchmark
- Data study: Supabase RLS in the Wild
- Data study: Lovable vs Bolt vs Cursor — Same Spec — the snapshot before the regression curve starts
- Data study: Honeypot Supabase — time-to-abuse measured from the attacker side
- Safety review: Is Lovable Safe?
- Guide: Is My Lovable App Secure? Builder Checklist
RUN IT YOURSELF
Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.
```sh
# Class 1 — new table without RLS (rows returned = RLS missing):
curl -s 'https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/line_items?select=*' -H 'apikey: ANON_KEY'

# Class 2 — cross-account read of another user's project (BOLA):
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'

# Class 3 — leaked secret in the shipped bundle:
curl -s https://gapbench.vibe-eval.com/site/agent-app/ | grep -oE 'sk-(proj-)?[A-Za-z0-9_-]{40,}'

# Clean control — the same probe against ref-rls returns no rows:
curl -s 'https://gapbench.vibe-eval.com/site/ref-rls/rest/v1/line_items?select=*' -H 'apikey: ANON_KEY'
```
DETECT REGRESSIONS AS THEY HAPPEN
VibeEval re-scans on every deploy and alerts on new findings. Catch the regression before users do.