VIBE PENTESTING: SECURITY TESTING FOR VIBE-CODED APPS
Vibe pentesting is penetration testing tailored to apps built with AI coding tools. The generator matters more than the stack, because the failure modes are consistent across every project produced by the same tool.
What is vibe pentesting?
Vibe pentesting is penetration testing tailored to apps built with AI coding tools — the class of software produced by Lovable, Bolt, Cursor, Claude Code, v0, Replit, Base44, Figma Make, and Windsurf. It narrows the scope from general-purpose pentesting to the specific failure modes these tools consistently ship, which means faster turnaround, lower cost, and coverage of the issues most likely to bite a vibe-coded app in production.
Vibe-coded apps don’t fail randomly. They fail in patterns. The generator determines the pattern. That is what makes this kind of pentesting scannable.
Why traditional pentesting misses the target for vibe-coded apps
Traditional pentesting assumes the code was written by humans with inconsistent, independently-invented decisions. That is the right model for most software. It is the wrong model for AI-generated apps, where the same tool produces the same mistakes across every project.
A traditional pentest engagement takes one to three weeks, costs $5,000–$50,000, and delivers a report that spans OWASP Top 10, business logic, and edge cases. For a vibe-coded MVP that changed last week and will change again tomorrow, that cadence is wrong: by the time the report lands, the app is different.
Vibe pentesting inverts the tradeoff. Narrow scope, seconds-long runtime, continuous re-scan on every deploy. For the 95% of issues that matter in vibe-coded apps, it is the right shape of testing.
The vibe pentest checklist
The short list of issues that actually land in production vibe-coded apps:
- Missing Row Level Security on Supabase tables or Security Rules on Firebase — the single most common and most severe finding
- Exposed API keys in the frontend bundle (Stripe secret, Firebase service account, OpenAI, Anthropic, AWS)
- BOLA / IDOR on generated CRUD endpoints — change an ID, read someone else’s data
- Auth flows that verify the user but skip role or ownership checks
- Open storage buckets on Supabase Storage, Firebase Storage, or S3
- Permissive CORS (any origin reflected back with `Access-Control-Allow-Credentials: true`) on endpoints returning sensitive data
- Debug routes or admin panels that shipped to production
- Webhook endpoints without signature verification
- Input validation skipped — SQL injection, XSS, prompt injection
- Missing security headers — CSP, HSTS, X-Frame-Options
Every finding from that list is scannable from outside the app. That’s why vibe pentesting can be automated in a way traditional pentesting cannot.
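Take the CORS item as an example of what "scannable from outside" means: the check comes down to reading two response headers. Browsers refuse `Access-Control-Allow-Origin: *` on credentialed requests. The exploitable shape is therefore a server that echoes back whatever `Origin` it receives and also sends `Access-Control-Allow-Credentials: true`.

The Python sketch below makes two assumptions. First, the scanner has already sent a request with a hostile `Origin` header (`https://evil.example` is a placeholder). Second, the response headers were captured as a dict. The returned labels are illustrative, not a standard severity scale.

```python
from typing import Mapping

PROBE_ORIGIN = "https://evil.example"  # placeholder attacker origin used for the probe


def classify_cors(response_headers: Mapping[str, str], probe_origin: str = PROBE_ORIGIN) -> str:
    """Classify the response to a request sent with `Origin: probe_origin`."""
    # Header names are case-insensitive, so normalise before looking them up.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin")
    allow_credentials = headers.get("access-control-allow-credentials", "").lower() == "true"

    if allow_origin is None:
        return "no-cors"       # browsers block cross-origin reads of this response
    if allow_origin == probe_origin and allow_credentials:
        return "critical"      # reflected origin + cookies: any site reads authenticated data
    if allow_origin in ("*", probe_origin):
        return "public-read"   # readable cross-origin, but only without cookies
    return "ok"                # a fixed allow-list that excludes the probe origin
```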
How a vibe pentest runs
- Scope — paste the deployed URL. No access credentials, no source code access, no agent to install.
- Recon — the scanner loads the app in a headless browser, captures every asset and request, and maps the API surface.
- Probe — each endpoint is tested for the checklist above, with evidence captured for every finding.
- Report — findings ranked by severity, each with evidence, exploitation notes, and a fix prompt for your AI coding tool.
- Rescan — re-run after fixes ship to verify nothing regressed.
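The rescan step comes down to comparing two sets of findings. This sketch assumes each finding can be keyed by a check identifier plus an endpoint. The `Finding` type and the check names are hypothetical, not a real report schema.

- A finding in the old scan but not the new one is fixed.
- A finding in both scans is still open.
- A finding only in the new scan is a regression introduced by the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    check: str      # hypothetical check identifier, e.g. "missing-rls"
    endpoint: str   # e.g. "/rest/v1/users"
    severity: str   # e.g. "Critical"


def _key(finding: Finding) -> tuple[str, str]:
    # Severity is left out of the key so a re-rated finding is not counted as new.
    return (finding.check, finding.endpoint)


def diff_scans(before: list[Finding], after: list[Finding]) -> dict[str, list[Finding]]:
    """Split two scans into fixed, still-open, and newly introduced findings."""
    before_keys = {_key(f) for f in before}
    after_keys = {_key(f) for f in after}
    return {
        "fixed": [f for f in before if _key(f) not in after_keys],
        "still_open": [f for f in after if _key(f) in before_keys],
        "new": [f for f in after if _key(f) not in before_keys],
    }
```

A rescan of a patch "passes" when the patched findings land in `fixed` and `new` contains nothing at Critical.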
The 14-pattern vibe-coding bug taxonomy
Vibe-coded apps fail along a known list of patterns. The full taxonomy is in Vibe-Coding Vulnerabilities; the shortlist that drives a vibe pentest:
- Missing RLS / Firestore Rules — the biggest one, by impact and frequency.
- Service-role key in the bundle — total database compromise in one finding.
- Hardcoded admin email gating — frontend-only, server has no idea.
- Mass assignment on profile / settings updates — `role: admin` in a PATCH body.
- BOLA on generated CRUD — change the ID, read the next person’s row.
- Public storage buckets — anonymous list, anonymous read, sometimes anonymous upload.
- Webhook without signature verification — Stripe, Resend, Linear, GitHub.
- LLM proxy without prompt-injection guardrails — system prompt leaks, tool poisoning.
- Hallucinated dependencies — npm packages that do not exist, squatted by attackers.
- Source maps in production — exposes the full unminified codebase including comments.
- Verbose error responses — stack traces, env var names, sometimes raw secrets.
- Permissive CORS with credentials — every site can read your authenticated responses.
- JWT signed with a leaked or weak secret — forge admin tokens trivially.
- Local storage as session store — XSS impact compounds dramatically.
A vibe pentest probes for every pattern. The scanner-grade subset is also covered by Vibe Code Scanner, the Token Leak Checker, the Supabase RLS Checker, the Firebase Scanner, the Package Hallucination Scanner, and the Security Headers Checker.
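The service-role pattern is detectable from the bundle alone. Supabase keys in the JWT format carry a `role` claim in their payload (`anon` or `service_role`).

The Python sketch below pulls JWT-shaped strings out of a downloaded bundle and decodes each payload without verifying the signature. That is enough to classify a key, but it must never be used to trust one. It assumes the JWT key format; keys that are not JWTs would need a different check.

```python
import base64
import json
import re

# Header and payload of a JWT are base64url JSON, so both start with "eyJ" ('{"').
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def jwt_payload(token: str) -> dict:
    """Decode the payload segment. Does NOT verify the signature: inspection only."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def find_service_role_keys(bundle_js: str) -> list[str]:
    """Return JWT-shaped strings in a JS bundle whose payload says role=service_role."""
    hits = []
    for token in JWT_RE.findall(bundle_js):
        try:
            payload = jwt_payload(token)
        except ValueError:  # bad base64, bad UTF-8, or bad JSON: not a real JWT
            continue
        if isinstance(payload, dict) and payload.get("role") == "service_role":
            hits.append(token)
    return hits
```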
Vibe pentest vs traditional pentest
| Aspect | Vibe pentest | Traditional pentest |
|---|---|---|
| Scope | AI-generated-app failure modes | General-purpose, OWASP + business logic |
| Duration | 1–3 minutes | 1–3 weeks |
| Cost | Free / low subscription | $5k–$50k per engagement |
| Cadence | On every deploy | Usually annual |
| Driven by | Automated agent | Human pentester |
| Best for | Pre-launch + continuous coverage on vibe-coded apps | Regulated workloads, complex business logic, compliance |
The two are not substitutes. For anything touching sensitive data, run a vibe pentest continuously and a human pentest annually. For unregulated MVPs, continuous vibe pentesting alone is the pragmatic floor.
Per-tool variants
The methodology is the same. The fingerprint, the failure shape, and the fix-prompt target differ per tool. Below are the things a vibe pentest does differently for each.
Lovable
- Stack: React + Supabase. Sometimes Lovable Cloud (Edge Functions) or Lovable Connect (third-party integrations).
- Where it fails: RLS on tables added in follow-up prompts. Edge Functions that run as service-role with no auth check. Webhook handlers without signature verification.
- What the pentest does differently: enumerates Supabase tables via the PostgREST OpenAPI endpoint. Fingerprints `lovable-tagger`. Tests every Edge Function URL as anon and as a non-admin user.
- Fix-prompt target: Lovable’s chat interface or a connected Cursor / Claude Code session. See Lovable Pentesting and Lovable Safety Guide.
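The table enumeration step for Lovable apps can be sketched as two functions:

- `fetch_openapi` fetches the PostgREST root, which serves an OpenAPI description of the objects exposed to the key used.
- `exposed_objects` splits the listed paths into tables/views and RPC functions.

This assumes the project URL and anon key were already extracted from the bundle. It also assumes the schema endpoint is reachable with the anon key, which is not true of every project configuration.

```python
import json
import urllib.request


def fetch_openapi(supabase_url: str, anon_key: str) -> dict:
    """GET the PostgREST root, which describes the exposed schema as OpenAPI JSON."""
    request = urllib.request.Request(
        supabase_url.rstrip("/") + "/rest/v1/",
        headers={"apikey": anon_key, "Authorization": f"Bearer {anon_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def exposed_objects(spec: dict) -> tuple[list[str], list[str]]:
    """Split OpenAPI paths into (tables_and_views, rpc_functions)."""
    tables, rpcs = [], []
    for path in spec.get("paths", {}):
        if path == "/":
            continue  # the root path is the description itself
        if path.startswith("/rpc/"):
            rpcs.append(path[len("/rpc/"):])
        else:
            tables.append(path.lstrip("/"))
    return sorted(tables), sorted(rpcs)
```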
Cursor
- Stack: Anything — Cursor is a code editor, not a generator with a fixed stack. Most common: Next.js + Postgres/Supabase, or Python FastAPI.
- Where it fails: the AI happily writes code that uses environment variables but never tells the user which variables are server-only. Secrets ship to client bundles via misuse of `NEXT_PUBLIC_*`. SSRF in fetch-URL features. Mass assignment.
- What the pentest does differently: harder fingerprint (no telltale artifact), so the scanner falls back on framework detection. Runs the full vibe checklist with no Cursor-specific shortcut.
- Fix-prompt target: Cursor itself, with the finding pasted into the chat alongside `@Codebase` for context. See Cursor Safety Guide.
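Next.js inlines every `NEXT_PUBLIC_*` variable into the client bundle at build time. A quick pre-deploy check is therefore to flag public-prefixed names that look server-only.

The name fragments below are a heuristic chosen for illustration. A public Supabase anon key passes. A service-role key or an LLM provider key behind the public prefix gets flagged.

```python
import re

PUBLIC_PREFIX = "NEXT_PUBLIC_"
# Heuristic name fragments that usually mean "server-only"; tune for your stack.
SERVER_ONLY_HINTS = re.compile(r"SECRET|SERVICE_ROLE|PRIVATE|PASSWORD|OPENAI|ANTHROPIC")


def env_names(dotenv_text: str) -> list[str]:
    """Variable names from a .env file (comments, blank lines, and `export` tolerated)."""
    names = []
    for raw in dotenv_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.append(line.split("=", 1)[0].removeprefix("export ").strip())
    return names


def suspicious_public_vars(names: list[str]) -> list[str]:
    """Public-prefixed names that look like they hold a server-only secret."""
    return [
        name for name in names
        if name.startswith(PUBLIC_PREFIX) and SERVER_ONLY_HINTS.search(name[len(PUBLIC_PREFIX):])
    ]
```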
Bolt (StackBlitz)
- Stack: Vite + React/Vue + chosen backend. Often deployed via Bolt’s hosted preview or to Netlify/Vercel.
- Where it fails: preview deployments left public with debug toggles. CORS set to `*` because Bolt’s local dev needed it and no one locked it down for prod. Hardcoded “mock” data that is actually real PII used for testing.
- What the pentest does differently: specifically checks Bolt-style preview URLs (`*.stackblitz.io`, `*.bolt.new`) for production-grade configuration. Looks for debug routes (`/__debug`, `/_dev`).
- Fix-prompt target: Bolt’s chat. See Bolt Safety Guide.
v0 (Vercel)
- Stack: Next.js + Tailwind, almost always deployed to Vercel. Often paired with Supabase or Postgres on Neon.
- Where it fails: server actions without authentication checks. Server components that fetch with a server-only API key but expose the result to the client unfiltered. Open Vercel preview URLs that index in Google.
- What the pentest does differently: specifically checks Vercel preview/branch URLs and `vercel.app` subdomains. Tests every discoverable server action as anon (server actions are invoked as POST requests carrying a `Next-Action` header). Looks for indexed preview deployments via search.
- Fix-prompt target: v0’s chat or paste into Cursor/Claude Code on the cloned repo. See v0 Safety Guide.
Replit
- Stack: Anything — Replit hosts Python, Node, Go, Ruby. Replit Database is the closest thing to a default backend. Many Replit apps deploy to repl.co subdomains.
- Where it fails: Replit Database has no RLS concept; every key is readable by code that has the secret. Apps frequently expose admin endpoints under `/admin` with no auth at all. Replit Agent sometimes hardcodes secrets directly into source.
- What the pentest does differently: harder to test backends that are not Supabase or Firebase, so the scanner relies on direct endpoint probing. Looks for repl.co subdomain patterns and applies higher suspicion to debug routes.
- Fix-prompt target: Replit Agent or the editor. See Replit Safety Guide.
Claude Code
- Stack: Anything. Claude Code is a CLI agent — the user picks the stack.
- Where it fails: Claude Code is the most “what you asked for” of the AI tools — it implements exactly what the prompt says, including the security shortcuts the prompt did not exclude. Apps generated with vague security requirements ship vague security. Specific failures: Edge Functions that work locally but were never deployed with the right env vars, leading to fail-open checks where a missing env var makes the auth check return true.
- What the pentest does differently: runs the full checklist, no shortcut. Particular attention to the integration layer (third-party APIs, webhooks) because Claude Code is most often used to wire integrations.
- Fix-prompt target: Claude Code itself. See Claude Code Safety Guide.
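The fail-open failure described for Claude Code is easiest to see side by side. Below is a Python stand-in; real Edge Functions would be TypeScript on Deno. It uses a hypothetical `INTERNAL_API_TOKEN` variable.

- `is_authorized_fail_open` waves every request through when the variable was never deployed.
- `is_authorized` fails closed and compares in constant time.

```python
import hmac
import os

TOKEN_VAR = "INTERNAL_API_TOKEN"  # hypothetical variable name


def is_authorized_fail_open(presented: str) -> bool:
    # Anti-pattern: if the variable was never deployed, everyone gets in.
    expected = os.environ.get(TOKEN_VAR)
    if expected is None:
        return True
    return hmac.compare_digest(presented.encode(), expected.encode())


def is_authorized(presented: str) -> bool:
    """Fail closed: a missing or empty secret denies every request."""
    expected = os.environ.get(TOKEN_VAR)
    if not expected:
        return False
    # Constant-time comparison so response timing does not leak the token.
    return hmac.compare_digest(presented.encode(), expected.encode())
```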
Windsurf
- Stack: Anything (Codeium’s editor with agentic features). Most common: Next.js or Python.
- Where it fails: similar to Cursor — context-blindness on which files run server vs client. Cascade-mode multi-file changes that introduce secret leakage across modules.
- What the pentest does differently: runs the full checklist. Looks for cascade-pattern leakage (a secret defined in one file and imported into a client-side file).
- Fix-prompt target: Windsurf Cascade. See Windsurf Safety Guide.
GitHub Copilot
- Stack: Anything. Copilot is autocomplete + Workspace + agent modes inside the IDE.
- Where it fails: silently accepts insecure defaults from the surrounding code. If your codebase has one anti-pattern (skipping CSRF, no input validation), Copilot will replicate it across every new file.
- What the pentest does differently: treats Copilot-built apps as “human + a force-multiplier for whatever the human’s existing patterns are.” Runs the full checklist with extra weight on consistency checks (the same bug repeated across endpoints).
- Fix-prompt target: Copilot Chat or directly in the editor. See GitHub Copilot Safety Guide.
Devin
- Stack: Anything. Devin is an agentic engineer that takes a ticket and ships a PR.
- Where it fails: Devin generates code that passes its own tests but does not threat-model the change. When a ticket describes insecure behavior, Devin ships it as a feature instead of flagging it as a bug. Particularly weak on secret-management discipline.
- What the pentest does differently: because Devin shipped a PR, the diff is auditable. The pentest applies the full checklist and pays special attention to any new endpoint, new env var, or new dependency introduced by the PR.
- Fix-prompt target: Devin in a follow-up ticket. See Devin Safety Guide.
Vibe pentest in CI/CD
The right place to run a vibe pentest is on every deploy. The wiring:
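```yaml
# .github/workflows/security.yml — sketch, not a full file
name: vibe-pentest
on:
  deployment_status:
jobs:
  pentest:
    # deployment_status has no activity types; filter on the state instead.
    if: github.event.deployment_status.state == 'success'
    runs-on: ubuntu-latest
    steps:
      - name: Run vibe pentest
        env:
          # Pass event data through env rather than inlining ${{ }} into the script.
          TARGET_URL: ${{ github.event.deployment_status.target_url }}
          VIBEEVAL_KEY: ${{ secrets.VIBEEVAL_KEY }}
        run: |
          jq -n --arg url "$TARGET_URL" '{url: $url}' > body.json
          curl -sS --fail -X POST https://api.vibe-eval.com/scan \
            -H "Authorization: Bearer $VIBEEVAL_KEY" \
            -H "Content-Type: application/json" \
            --data @body.json \
            > scan.json
      - name: Block on Critical findings
        run: |
          # Wrap matches in an array so length counts findings, not object keys.
          critical=$(jq '[.findings[] | select(.severity=="Critical")] | length' scan.json)
          test "$critical" -eq 0
```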
Pattern: deploy succeeds → scan fires → fail the workflow if any Critical lands. Notify on Slack with the scan URL. Critical issues block the next deploy until rescan passes.
For larger teams: gate the deploy itself rather than the post-deploy alert. Scan the staging environment first; only promote to prod after a clean scan.
“AI builds it, AI pentests it, AI fixes it” feedback loop
The right loop is now end-to-end agentic:
- AI tool builds the app. Lovable, Cursor, Claude Code, v0 — pick one.
- AI agent pentests the deployed URL. The vibe pentest from this article.
- AI fix prompt is generated for each finding. Severity, evidence, root cause, the exact prompt to paste back into the originating tool.
- AI tool patches. Paste the fix prompt. Apply the diff. Deploy.
- AI agent rescans. Verify the finding is closed. New findings introduced by the patch are also caught.
- Loop continues. On every subsequent deploy.
The economic insight: every step is now sub-dollar in cost and sub-minute in latency. The human’s role moves from “find and fix bugs” to “review the patches and ship them.” The bottleneck is review, not work.
The risk: if step 3 generates a bad fix prompt, step 4 ships a worse bug. That is why every patch should still pass through a human review on the first iteration, even if subsequent iterations are auto-merged. See Integration Layer Is the Real Security Gap for the case for human gates on integration code.
Anonymized findings — what vibe pentests catch
A representative cross-section of findings from vibe-coded apps we audit. Each is generalized.
Finding A — Anon-readable users table on a Cursor + Supabase app
- Endpoint: `/rest/v1/users`
- Evidence: Anonymous GET returned every row, including email, password reset tokens, and last login.
- Impact: Full PII disclosure for every user.
- Fix: Enable RLS, write per-user policies, audit every other table the same way.
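Auditing “every other table the same way” is a loop of anonymous `SELECT`s. The sketch below uses only the standard library. It assumes the table names came from schema enumeration, and the fetcher is swappable for testing.

A `200` with at least one row means anonymous reads work. An empty array is ambiguous: an RLS-protected table and an empty unprotected table both return `[]`. A clean result here is therefore not proof that RLS is on.

```python
import json
import urllib.error
import urllib.request
from typing import Callable

# A fetcher takes (url, headers) and returns (status_code, body_text).
Fetcher = Callable[[str, dict], tuple[int, str]]


def http_get(url: str, headers: dict) -> tuple[int, str]:
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a useful body
        return err.code, err.read().decode()


def anon_readable_tables(base_url: str, anon_key: str, tables: list[str],
                         fetch: Fetcher = http_get) -> list[str]:
    """Tables where an anonymous SELECT returns at least one row."""
    headers = {"apikey": anon_key, "Authorization": f"Bearer {anon_key}"}
    readable = []
    for table in tables:
        url = f"{base_url.rstrip('/')}/rest/v1/{table}?select=*&limit=1"
        status, body = fetch(url, headers)
        # [] could mean RLS or simply an empty table, so only non-empty counts.
        if status == 200 and json.loads(body):
            readable.append(table)
    return readable
```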
Finding B — LLM API key leaked through a v0 route handler error
- Endpoint: `POST /api/chat` (Next.js route handler)
- Evidence: Sending a malformed body crashed the handler. The error response body included the raw exception, which contained `Authorization: Bearer sk-ant-...` from the upstream fetch.
- Impact: Stolen API key, attacker-funded LLM access.
- Fix: Wrap the upstream fetch in a try/catch, return a generic 500 with no detail, log the actual error server-side only.
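The fix for Finding B is sketched below in Python rather than as the Next.js route handler itself, using a hypothetical `(status, body)` handler shape. The wrapper logs the full exception server-side. The response carries only a generic error plus a random ID that support can match to the log line.

```python
import functools
import logging
import uuid
from typing import Callable

log = logging.getLogger("api")

# Hypothetical framework-agnostic handler: request dict in, (status, JSON body) out.
Handler = Callable[[dict], tuple[int, dict]]


def generic_errors(handler: Handler) -> Handler:
    """Never let an unexpected exception's text reach the response body."""
    @functools.wraps(handler)
    def wrapped(request: dict) -> tuple[int, dict]:
        try:
            return handler(request)
        except Exception:
            error_id = uuid.uuid4().hex
            # The full traceback stays server-side; the client sees only the ID.
            log.exception("unhandled error id=%s", error_id)
            return 500, {"error": "internal_error", "id": error_id}
    return wrapped
```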
Finding C — Stripe webhook unverified on a Lovable Edge Function
- Endpoint: `POST /functions/v1/stripe-webhook`
- Evidence: The function parsed `event.type` from the JSON body and updated user subscription state. No signature verification.
- Impact: Forged events upgrade any user to a paid plan.
- Fix: Use `stripe.webhooks.constructEvent` with the raw body and the webhook secret (`constructEventAsync` in Deno-based Edge Functions, which only offer async Web Crypto). Add a regression test.
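For reference, the Python sketch below shows roughly what `constructEvent` checks. It re-implements Stripe's documented signing scheme:

- compute HMAC-SHA256 over `timestamp.raw_body`;
- compare it against each `v1` entry in the `Stripe-Signature` header;
- reject timestamps outside a replay window.

In production, use the official library (`stripe.Webhook.construct_event` in stripe-python) rather than this re-implementation. The 300-second tolerance mirrors the libraries' default.

```python
import hashlib
import hmac
import time
from typing import Optional


class SignatureError(Exception):
    pass


def verify_stripe_signature(raw_body: bytes, header: str, secret: str,
                            tolerance: int = 300, now: Optional[float] = None) -> None:
    """Raise SignatureError unless `header` is a valid Stripe-Signature for `raw_body`."""
    timestamp, candidates = None, []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdecimal() or not candidates:
        raise SignatureError("malformed Stripe-Signature header")

    # Sign the exact bytes received; re-serialised JSON would not match.
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest().encode()
    if not any(hmac.compare_digest(expected, c.encode()) for c in candidates):
        raise SignatureError("no matching v1 signature")

    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        raise SignatureError("timestamp outside tolerance (possible replay)")
```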
Finding D — Hardcoded admin email list in a Bolt + React app
- Endpoint: `/admin/*` routes and `/api/admin/*` handlers
- Evidence: Frontend bundle contained `const ADMINS = ['founder@example.com']` and gated routes on it. The corresponding API handlers performed no server-side check.
- Impact: Any authenticated user can hit admin APIs directly with curl.
- Fix: Move admin status to a server-validated `role` field. Check it on every admin endpoint.
Finding E — Public Firebase Storage bucket on a Replit + Firebase app
- Endpoint: `https://firebasestorage.googleapis.com/v0/b/<bucket>/o/`
- Evidence: Anonymous list returned every uploaded file. Filenames followed `<user_email>_<timestamp>.png`, so PII leaked even from the listing.
- Impact: Full file enumeration, PII through filenames.
- Fix: Update the Firebase Storage Security Rules to require auth on read and to scope writes by `request.auth.uid`. Use unguessable filenames.
Finding F — Mass assignment on a Windsurf + Express app
- Endpoint: `PATCH /api/profile`
- Evidence: Handler did `User.update(req.body)` without a field whitelist. A PATCH with `{"role": "admin"}` upgraded the requester.
- Impact: Any user becomes admin.
- Fix: Whitelist updatable fields. Treat `role`, `id`, and `email_verified` as system-managed.
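A whitelist for the profile endpoint can be as small as a frozen set. The sketch below uses a hypothetical profile schema (`display_name`, `avatar_url`, `bio`).

It rejects the request outright when unknown fields appear rather than silently dropping them, which keeps probes like `{"role": "admin"}` visible in logs. Dropping instead of rejecting is a reasonable alternative if clients legitimately send extra fields.

```python
# Hypothetical profile schema: the only fields a user may change on their own row.
UPDATABLE_PROFILE_FIELDS = frozenset({"display_name", "avatar_url", "bio"})


class ForbiddenFields(ValueError):
    pass


def sanitize_profile_patch(body: dict) -> dict:
    """Return a copy of `body` that is safe to pass to the ORM update call."""
    forbidden = set(body) - UPDATABLE_PROFILE_FIELDS
    if forbidden:
        # Fail loudly: role, id, email_verified, etc. are system-managed.
        raise ForbiddenFields(f"fields not updatable: {sorted(forbidden)}")
    return dict(body)
```

In the Express handler from Finding F, the equivalent change is passing the sanitized object to `User.update` instead of `req.body`.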
Finding G — Hallucinated dependency in a Claude Code project
- Endpoint: N/A (build-time)
- Evidence: `package.json` contained a dependency name that did not exist on npm. The build worked locally because the resolution failed silently and the import was unused. Once someone publishes a package with that name, every fresh install will pull attacker code.
- Impact: Future supply chain compromise.
- Fix: Run Package Hallucination Scanner on every commit. Pin versions. Run `npm audit signatures` in CI.
Finding H — Source map exposed on a Lovable production build
- Endpoint: `https://app.example.com/assets/index-abc.js.map`
- Evidence: Source map deployed to production. Full unminified source recoverable, including comments referencing internal API design and a placeholder `// TODO: replace this hardcoded JWT secret`.
- Impact: Trivial reverse engineering plus secret leak through committed comments.
- Fix: Disable source map generation for production builds, or upload to an authenticated error-tracking service rather than the public CDN.
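Checking what a public source map gives away takes a few lines. Source map v3 files can embed the original files in a `sourcesContent` array that runs parallel to `sources`.

The Python sketch below recovers the embedded files and greps them for secret-flavoured TODO comments. The regex is an illustrative heuristic, not a complete secret detector.

```python
import json
import re

# Illustrative heuristic: TODO-style comments that mention a secret.
SUSPICIOUS = re.compile(r"\b(TODO|FIXME|HACK)\b.*\b(secret|token|password|api[_ ]?key)\b", re.IGNORECASE)


def recover_sources(source_map_text: str) -> dict[str, str]:
    """Map original file paths to their text, when the map embeds sourcesContent."""
    source_map = json.loads(source_map_text)
    sources = source_map.get("sources", [])
    contents = source_map.get("sourcesContent") or []
    # null entries in sourcesContent mean that file's text was not embedded.
    return {path: text for path, text in zip(sources, contents) if text is not None}


def suspicious_lines(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """(path, line number, line) for every line matching the heuristic."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```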
Kill chain on a vibe-coded app
Anonymized but representative.
Step 1 — Recon on a Lovable app. Bundle inspected. Supabase URL extracted. PostgREST schema fetched at /rest/v1/. 23 tables discovered.
Step 2 — RLS sweep. Anonymous SELECT against each table. 21 return empty. Two — feedback_messages and support_replies — return rows.
Step 3 — PII triage. feedback_messages contains user email, message body, and user IP. support_replies contains the admin’s response, which sometimes includes account-recovery links.
Step 4 — Account recovery link extraction. The agent searches support_replies for URL patterns matching the app’s reset link template. Two valid, unexpired reset URLs are recovered.
Step 5 — Account takeover. The reset URL works (no IP binding, no second factor). The agent does not actually take over the account but reports the chain as a high-severity finding.
Step 6 — Lateral. With the (theoretical) takeover, the agent identifies the privilege escalation possibilities — admin role on profiles, organization owner on org_members — and includes them in the report as the next steps an attacker would take.
The chain is reported as one root cause (RLS missing on two tables) with three downstream impacts. The fix is surgical. The rescan would close the chain.
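Step 4 of the chain is plain text extraction, and the same check works defensively. Run it over your own support or message tables to find live reset links stored in plain text.

The sketch assumes a hypothetical reset-link template (`https://app.example.com/reset-password?token=...`) and rows already fetched as dicts with a `body` column. Whether a recovered token is still valid needs a separate, authorized check.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical reset-link template for the app under test.
RESET_LINK_RE = re.compile(r"https://app\.example\.com/reset-password\?[^\s\"'<>]+")


def reset_tokens(rows: list[dict], column: str = "body") -> list[str]:
    """Extract reset tokens from free-text rows (e.g. support replies)."""
    tokens = []
    for row in rows:
        for url in RESET_LINK_RE.findall(row.get(column) or ""):
            tokens.extend(parse_qs(urlparse(url).query).get("token", []))
    return tokens
```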
When you should NOT run a vibe pentest
- You haven’t shipped yet. Run Vibe Code Scanner on the staging URL first; the pentest is for deployed apps.
- You don’t own the target. No matter how curious you are about another team’s Lovable app, do not pentest it without written authorization. The AI agent will happily comply, but the law will not.
- The app is not web-shaped. CLI tools, native apps, on-device models — different methodology.
- You need a deep one-bug investigation. Vibe pentesting is broad coverage. For deep work on a specific class, hire a human researcher.
- You’re testing safety/alignment of an LLM, not security of an app. Different category — see the AI red teaming discussion of LLM-specific testing.
Fix prompts you can paste into the AI tool that built the app
Generic vibe-pentest fix prompt
```
The vibe pentest report below identifies a security issue in this app.
Apply the fix described, and only that fix. Do not refactor unrelated code.
Add a regression test that proves the issue is closed. After the patch
ships, the rescan should not re-emit this finding.
<paste finding>
```
Lovable-specific RLS fix prompt
```
Enable Row Level Security on the table `<table>` and write four policies that
gate SELECT, INSERT, UPDATE, and DELETE on `auth.uid() = user_id`. Generate
the SQL migration. Do not break any existing service-role usage.
```
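When reviewing what the AI tool generates, it helps to know that the expected migration is short. The Python sketch below emits it for a given table. It assumes an owner column named `user_id` in the `public` schema and Supabase's `auth.uid()` helper, and it validates identifiers before interpolating them. Supabase's `service_role` bypasses RLS, so these policies should not affect existing service-role usage.

```python
import re

IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")  # plain lower-case Postgres identifiers only


def owner_rls_migration(table: str, owner_column: str = "user_id", schema: str = "public") -> str:
    """SQL that enables RLS and adds owner-only SELECT/INSERT/UPDATE/DELETE policies."""
    for name in (schema, table, owner_column):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"refusing to interpolate identifier {name!r}")
    target = f"{schema}.{table}"
    owns_row = f"auth.uid() = {owner_column}"  # auth.uid() is Supabase's current-user helper
    return "\n".join([
        f"alter table {target} enable row level security;",
        f'create policy "{table}_select_own" on {target} for select to authenticated using ({owns_row});',
        # INSERT policies accept only WITH CHECK; there is no existing row to test with USING.
        f'create policy "{table}_insert_own" on {target} for insert to authenticated with check ({owns_row});',
        f'create policy "{table}_update_own" on {target} for update to authenticated '
        f"using ({owns_row}) with check ({owns_row});",
        f'create policy "{table}_delete_own" on {target} for delete to authenticated using ({owns_row});',
    ])
```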
v0 / Next.js server action lockdown prompt
```
The server action at `<path>` is callable without authentication. Add an
auth check at the top of the action: read the session, return a 401 (or
redirect) if missing. Then add an authorization check: confirm the session
user owns the resource the action is touching. Add a regression test.
```
Cursor / Claude Code generic fix prompt
```
The endpoint `<METHOD> <route>` is vulnerable to <vuln class>. Apply the
fix below. Do not modify other endpoints in this PR.
Evidence: <copy-paste the request/response from the report>
Root cause: <copy-paste from the report>
Fix: <copy-paste the recommended fix>
Add a regression test in the existing test suite that asserts the
vulnerability is closed.
```
Bolt / Replit generic prompt
```
Production deploy of this app has the security issue described below.
Patch in place. Do not introduce new dependencies. Re-run the deploy when
the patch is in.
<finding>
```
Tool-specific vibe pentests
- Lovable Pentesting — Lovable-specific methodology and fixes
- Lovable Safety Guide — Lovable-specific failure modes
- Cursor Safety Guide — Cursor-specific failure modes
- Bolt Safety Guide — Bolt-specific failure modes
- v0 Safety Guide — v0-specific failure modes
- Replit Safety Guide — Replit-specific failure modes
- Claude Code Safety Guide — Claude Code-specific failure modes
- Windsurf Safety Guide — Windsurf-specific failure modes
- GitHub Copilot Safety Guide — Copilot-specific failure modes
- Devin Safety Guide — Devin-specific failure modes
- Supabase Safety Guide — Supabase-as-a-backend risks
- Firebase Safety Guide — Firebase-as-a-backend risks
- Cursor vs Claude Code Security — code-assistant security comparison
- Lovable Guides — Lovable-specific build guides
- Cursor Guides — Cursor-specific build guides
- Lovable Checklists — Lovable pre-launch checklist
- AI Pentesting Explained — what AI pentesting means and how it works
- AI Pentest vs Traditional — deeper comparison
- AI Pentest for Web Applications — broader web-app pentest scope
- AI Pentest for APIs — API-specific scope
- Continuous Penetration Testing — every-deploy cadence
- PTaaS — subscription pentest model
- AI Coding Tool Security Hub — vulnerability taxonomy across every AI coding tool
- Vibe-Coding Vulnerabilities — full 14-pattern taxonomy
- Vibe Code Scanner — run the free scanner
- Token Leak Checker — focused scan for exposed keys
- Supabase RLS Checker — Supabase-specific RLS audit
- Firebase Scanner — Firestore Rules audit
- Security Headers Checker — CSP, HSTS, CORS audit
- Package Hallucination Scanner — find AI-invented dependencies
- Best Security Scanner for AI Apps — category comparison
- VibeEval vs Burp Suite — AI agent vs manual pentest tool
- VibeEval vs OWASP ZAP — AI agent vs DAST scanner
- VibeEval vs Snyk — runtime AI pentest vs SAST + SCA
- Integration Layer Is the Real Security Gap — why integration code is the worst-tested layer
- OWASP Top 10 for AI Code — canonical failure modes
RUN A VIBE PENTEST
14-day trial. No card. Full agent-driven scan on your deployed URL in under 60 seconds.