VIBE PENTESTING: SECURITY TESTING FOR VIBE-CODED APPS

Vibe pentesting is penetration testing tailored to apps built with AI coding tools. The generator matters more than the stack, because the failure modes are consistent across every project produced by the same tool.

What is vibe pentesting?

Vibe pentesting is penetration testing tailored to apps built with AI coding tools — the class of software produced by Lovable, Bolt, Cursor, Claude Code, v0, Replit, Base44, Figma Make, and Windsurf. It narrows the scope from general-purpose pentesting to the specific failure modes these tools consistently ship, which means faster turnaround, lower cost, and coverage of the issues most likely to bite a vibe-coded app in production.

Vibe-coded apps don’t fail randomly. They fail in patterns. The generator determines the pattern. That is what makes this kind of pentesting scannable.

Why traditional pentesting misses the target for vibe-coded apps

Traditional pentesting assumes the code was written by humans making inconsistent, independently invented decisions. That is the right model for most software. It is the wrong model for AI-generated apps, where the same tool produces the same mistakes across every project.

A traditional pentest engagement takes one to three weeks, costs $5,000–$50,000, and delivers a report that spans OWASP Top 10, business logic, and edge cases. For a vibe-coded MVP that changed last week and will change again tomorrow, that cadence is wrong: by the time the report lands, the app is different.

Vibe pentesting inverts the tradeoff. Narrow scope, minutes-long runtime, continuous rescan on every deploy. For most of the issues that matter in vibe-coded apps, it is the right shape of testing.

The vibe pentest checklist

The short list of issues that actually land in production vibe-coded apps:

  1. Missing Row Level Security on Supabase or Firebase tables — the single most common and most severe finding
  2. Exposed API keys in the frontend bundle (Stripe secret, Firebase service account, OpenAI, Anthropic, AWS)
  3. BOLA / IDOR on generated CRUD endpoints — change an ID, read someone else’s data
  4. Auth flows that verify the user but skip role or ownership checks
  5. Open storage buckets on Supabase Storage, Firebase Storage, or S3
  6. Permissive CORS on endpoints returning sensitive data (Access-Control-Allow-Origin: *, or any origin reflected back with credentials allowed)
  7. Debug routes or admin panels that shipped to production
  8. Webhook endpoints without signature verification
  9. Input validation skipped — SQL injection, XSS, prompt injection
  10. Missing security headers — CSP, HSTS, X-Frame-Options
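Item 10 is the easiest to check mechanically. A minimal sketch, assuming the scanner already has a page's response headers as a plain object: it lists which of the three named headers are absent. The helper name is hypothetical, and a fuller check would also inspect header values (for example, a CSP frame-ancestors directive can stand in for X-Frame-Options).

```typescript
// Headers named in checklist item 10. HTTP header names are case-insensitive.
const REQUIRED_SECURITY_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-frame-options",
];

// Return the required headers that are absent from a response.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return REQUIRED_SECURITY_HEADERS.filter((name) => !present.has(name));
}

// Usage sketch (Node 18+, inside an async function):
// const res = await fetch("https://app.example.com/");
// console.log(missingSecurityHeaders(Object.fromEntries(res.headers)));
```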

Every finding from that list is scannable from outside the app. That’s why vibe pentesting can be automated in a way traditional pentesting cannot.

How a vibe pentest runs

  1. Scope — paste the deployed URL. No access credentials, no source code access, no agent to install.
  2. Recon — the scanner loads the app in a headless browser, captures every asset and request, and maps the API surface.
  3. Probe — each endpoint is tested for the checklist above, with evidence captured for every finding.
  4. Report — findings ranked by severity, each with evidence, exploitation notes, and a fix prompt for your AI coding tool.
  5. Rescan — re-run after fixes ship to verify nothing regressed.
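The bundle inspection in the recon step can be sketched as a pattern pass over downloaded JavaScript. This is an illustration, not the scanner's actual rule set: the regexes cover a few well-known key prefixes (Stripe sk_live_, AWS AKIA, Anthropic sk-ant-, OpenAI sk-), and Supabase JWTs are decoded so that only a service_role key is flagged, since the anon key is meant to be public. findSecrets is a hypothetical name.

```typescript
import { Buffer } from "node:buffer";

// Illustrative patterns only; a production rule set is larger and tuned to
// reduce false positives.
const KEY_PATTERNS: Record<string, RegExp> = {
  stripe_secret: /\bsk_live_[0-9A-Za-z]{16,}/g,
  aws_access_key_id: /\bAKIA[0-9A-Z]{16}\b/g,
  anthropic: /\bsk-ant-[A-Za-z0-9_-]{20,}/g,
  openai: /\bsk-(?!ant-)(?:proj-)?[A-Za-z0-9_-]{20,}/g,
};

// Three base64url segments whose first two decode to JSON objects ("eyJ" is '{"').
const JWT_PATTERN = /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;

interface SecretHit {
  kind: string;
  value: string;
}

function findSecrets(bundle: string): SecretHit[] {
  const hits: SecretHit[] = [];
  for (const [kind, pattern] of Object.entries(KEY_PATTERNS)) {
    for (const match of bundle.matchAll(pattern)) hits.push({ kind, value: match[0] });
  }
  // Supabase keys are JWTs; only the service_role one is a secret.
  for (const match of bundle.matchAll(JWT_PATTERN)) {
    try {
      const payload = JSON.parse(Buffer.from(match[0].split(".")[1], "base64url").toString("utf8"));
      if (payload.role === "service_role") {
        hits.push({ kind: "supabase_service_role", value: match[0] });
      }
    } catch {
      // Payload is not valid JSON, so it is not a JWT we care about.
    }
  }
  return hits;
}
```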

The 14-pattern vibe-coding bug taxonomy

Vibe-coded apps fail along a known list of patterns. The full taxonomy is in Vibe-Coding Vulnerabilities; the shortlist that drives a vibe pentest:

  1. Missing RLS / Firestore Rules — the biggest one, by impact and frequency.
  2. Service-role key in the bundle — total database compromise in one finding.
  3. Hardcoded admin email gating — frontend-only, server has no idea.
  4. Mass assignment on profile / settings updates: {"role": "admin"} in a PATCH body.
  5. BOLA on generated CRUD — change the ID, read the next person’s row.
  6. Public storage buckets — anonymous list, anonymous read, sometimes anonymous upload.
  7. Webhook without signature verification — Stripe, Resend, Linear, GitHub.
  8. LLM proxy without prompt-injection guardrails — system prompt leaks, tool poisoning.
  9. Hallucinated dependencies: npm package names the AI invented, which attackers can register (squat) and fill with malicious code.
  10. Source maps in production — exposes the full unminified codebase including comments.
  11. Verbose error responses — stack traces, env var names, sometimes raw secrets.
  12. Permissive CORS with credentials — every site can read your authenticated responses.
  13. JWT signed with a leaked or weak secret — forge admin tokens trivially.
  14. Local storage as session store: tokens in localStorage are readable by any injected script, so a single XSS becomes session theft.
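Pattern 13 can be tested offline once a token is in hand. A sketch, assuming an HS256-signed JWT and a small hypothetical wordlist: recompute the HMAC over header.payload for each candidate secret and compare it with the token's signature. Tokens signed with asymmetric algorithms such as RS256 cannot be checked this way.

```typescript
import { Buffer } from "node:buffer";
import { createHmac } from "node:crypto";

// Return the first candidate secret that reproduces an HS256 token's signature,
// or null. Assumes a well-formed JWT; the wordlist is supplied by the caller.
function findWeakHs256Secret(token: string, candidates: string[]): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
  if (alg !== "HS256") return null; // RS256/ES256 tokens cannot be brute-forced this way
  const actual = Buffer.from(signature, "base64url");
  for (const secret of candidates) {
    const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
    if (expected.equals(actual)) return secret;
  }
  return null;
}

// Hypothetical wordlist of guessable values, including jwt.io's example secret.
const COMMON_JWT_SECRETS = ["secret", "changeme", "your-256-bit-secret", "jwt_secret"];
```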

A vibe pentest probes for every pattern. The scanner-grade subset is also covered by Vibe Code Scanner, the Token Leak Checker, the Supabase RLS Checker, the Firebase Scanner, the Package Hallucination Scanner, and the Security Headers Checker.

Vibe pentest vs traditional pentest

Aspect | Vibe pentest | Traditional pentest
Scope | AI-generated-app failure modes | General-purpose: OWASP + business logic
Duration | 1–3 minutes | 1–3 weeks
Cost | Free / low subscription | $5k–$50k per engagement
Cadence | On every deploy | Usually annual
Driven by | Automated agent | Human pentester
Best for | Pre-launch + continuous coverage on vibe-coded apps | Regulated workloads, complex business logic, compliance

The two are not substitutes. For anything touching sensitive data, run a vibe pentest continuously and a human pentest annually. For unregulated MVPs, continuous vibe pentesting alone is the pragmatic floor.

Per-tool variants

The methodology is the same. The fingerprint, the failure shape, and the fix-prompt target differ per tool. Below are the things a vibe pentest does differently for each.

Lovable

  • Stack: React + Supabase. Sometimes Lovable Cloud (Edge Functions) or Lovable Connect (third-party integrations).
  • Where it fails: RLS on tables added in follow-up prompts. Edge Functions that run as service-role with no auth check. Webhook handlers without signature verification.
  • What the pentest does differently: enumerates Supabase tables via the PostgREST OpenAPI endpoint. Fingerprints lovable-tagger. Tests every Edge Function URL as anon and as non-admin user.
  • Fix-prompt target: Lovable’s chat interface or a connected Cursor / Claude Code session. See Lovable Pentesting and Lovable Safety Guide.
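PostgREST serves an OpenAPI (Swagger 2.0) description at its root path, which Supabase exposes at /rest/v1/. A minimal sketch of the table-enumeration step, assuming that document is readable with the anon key (newer Supabase projects may restrict it): table and view endpoints appear as /<name> keys under paths, and RPC functions as /rpc/<name>. tablesFromOpenApi is a hypothetical helper.

```typescript
// The only part of the PostgREST OpenAPI document this sketch relies on.
interface OpenApiDoc {
  paths?: Record<string, unknown>;
}

// "/" is the introspection root itself; "/rpc/..." entries are functions, not tables.
function tablesFromOpenApi(doc: OpenApiDoc): string[] {
  return Object.keys(doc.paths ?? {})
    .filter((path) => path !== "/" && !path.startsWith("/rpc/"))
    .map((path) => path.slice(1))
    .sort();
}

// Usage sketch (Node 18+, inside an async function); SUPABASE_URL and ANON_KEY
// are placeholders for values recovered from the bundle:
// const res = await fetch(`${SUPABASE_URL}/rest/v1/`, { headers: { apikey: ANON_KEY } });
// console.log(tablesFromOpenApi(await res.json()));
```

Each name returned is then probed with an anonymous SELECT, which is the RLS sweep described in the kill chain further down.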

Cursor

  • Stack: Anything — Cursor is a code editor, not a generator with a fixed stack. Most common: Next.js + Postgres/Supabase, or Python FastAPI.
  • Where it fails: the AI happily writes code that uses environment variables but never tells the user which variables are server-only. Secrets ship to client bundles via misuse of NEXT_PUBLIC_*. SSRF in fetch-URL features. Mass assignment.
  • What the pentest does differently: harder fingerprint (no telltale artifact), so the scanner falls back on framework detection. Runs the full vibe checklist with no Cursor-specific shortcut.
  • Fix-prompt target: Cursor itself, with the finding pasted into the chat alongside @Codebase for context. See Cursor Safety Guide.
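Next.js inlines every NEXT_PUBLIC_ variable into the client bundle at build time, so the NEXT_PUBLIC_ misuse above can be caught before deploy with a name check on the env file. A rough sketch: the keyword list is an assumption and will produce false positives, and names containing KEY are deliberately left out because NEXT_PUBLIC_SUPABASE_ANON_KEY is public by design. leakyPublicEnvVars is a hypothetical name.

```typescript
// Name fragments that suggest a value must stay server-side (heuristic).
const SECRET_HINTS = /SECRET|SERVICE_ROLE|PRIVATE|PASSWORD/i;

// Scan .env-style text and return NEXT_PUBLIC_ variables that look like secrets.
function leakyPublicEnvVars(dotenv: string): string[] {
  const flagged: string[] = [];
  for (const rawLine of dotenv.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const name = line.split("=", 1)[0].replace(/^export\s+/, "").trim();
    if (name.startsWith("NEXT_PUBLIC_") && SECRET_HINTS.test(name)) flagged.push(name);
  }
  return flagged;
}
```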

Bolt (StackBlitz)

  • Stack: Vite + React/Vue + chosen backend. Often deployed via Bolt’s hosted preview or to Netlify/Vercel.
  • Where it fails: preview deployments left public with debug toggles. CORS set to * because Bolt’s local dev needed it and no one tightened it for prod. Hardcoded “mock” data that is actually real PII.
  • What the pentest does differently: specifically checks Bolt-style preview URLs (*.stackblitz.io, *.bolt.new) for production-grade configuration. Looks for debug routes (/__debug, /_dev).
  • Fix-prompt target: Bolt’s chat. See Bolt Safety Guide.
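The CORS check comes down to sending a request with an Origin the scanner controls and reading the reply. A minimal classifier, assuming a probe origin such as https://evil.example and response headers as a plain object: browsers refuse a literal * on credentialed requests, so the severe case is the app echoing an arbitrary origin back together with Access-Control-Allow-Credentials: true. The verdict labels are hypothetical.

```typescript
type CorsVerdict = "ok" | "wildcard" | "reflected" | "reflected-with-credentials";

// Classify one probe response. probeOrigin is the Origin header the scanner sent.
function classifyCors(probeOrigin: string, headers: Record<string, string>): CorsVerdict {
  const lower: Record<string, string | undefined> = Object.fromEntries(
    Object.entries(headers).map(([name, value]) => [name.toLowerCase(), value.trim()]),
  );
  const allowOrigin = lower["access-control-allow-origin"];
  const allowCredentials = lower["access-control-allow-credentials"] === "true";
  if (allowOrigin === undefined) return "ok"; // no CORS grant at all
  if (allowOrigin === "*") return "wildcard"; // readable by any site without cookies
  if (allowOrigin === probeOrigin) {
    // The server trusts whatever origin asks; with credentials this exposes user data.
    return allowCredentials ? "reflected-with-credentials" : "reflected";
  }
  return "ok"; // a fixed allowlist entry that is not the attacker's origin
}
```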

v0 (Vercel)

  • Stack: Next.js + Tailwind, almost always deployed to Vercel. Often paired with Supabase or Postgres on Neon.
  • Where it fails: server actions without authentication checks. Server components that fetch with a server-only API key but pass the result to the client unfiltered. Open Vercel preview URLs that get indexed by Google.
  • What the pentest does differently: specifically checks Vercel preview/branch URLs and vercel.app subdomains. Tests every server action as anon (Next.js invokes server actions as POST requests to the page route carrying a Next-Action header). Looks for indexed preview deployments via search.
  • Fix-prompt target: v0’s chat or paste into Cursor/Claude Code on the cloned repo. See v0 Safety Guide.
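For the server-action failure, the fix is an authentication check at the top of every action plus an ownership check inside it. A framework-neutral sketch: getSession, Session, and the ownership lookup are hypothetical stand-ins for whatever auth library and data layer the app uses, not Next.js APIs.

```typescript
// Hypothetical session shape; substitute whatever the app's auth library returns.
interface Session {
  userId: string;
}

type GetSession = () => Promise<Session | null>;

// Wrap an action so it never runs without a session (authentication).
function withAuth<A extends unknown[], R>(
  getSession: GetSession,
  action: (session: Session, ...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const session = await getSession();
    if (!session) throw new Error("unauthenticated");
    return action(session, ...args);
  };
}

// Authorization still happens inside the action: confirm the caller owns the row.
function assertOwner(session: Session, ownerId: string): void {
  if (session.userId !== ownerId) throw new Error("forbidden");
}
```

Keeping ownership as a separate call is deliberate: only the action knows which resource it touches, so the wrapper cannot check it generically.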

Replit

  • Stack: Anything — Replit hosts Python, Node, Go, Ruby. Replit Database is the closest thing to a default backend. Many Replit apps deploy to repl.co subdomains.
  • Where it fails: Replit Database has no RLS concept; every key is readable by code that has the secret. Apps frequently expose admin endpoints under /admin with no auth at all. Replit Agent sometimes hardcodes secrets directly into source.
  • What the pentest does differently: harder to test backends that are not Supabase or Firebase, so the scanner relies on direct endpoint probing. Looks for repl.co subdomain patterns and applies higher suspicion to debug routes.
  • Fix-prompt target: Replit Agent or the editor. See Replit Safety Guide.

Claude Code

  • Stack: Anything. Claude Code is a CLI agent — the user picks the stack.
  • Where it fails: Claude Code is the most “what you asked for” of the AI tools: it implements exactly what the prompt says, including the security shortcuts the prompt did not rule out. Apps generated with vague security requirements ship vague security. A specific failure: Edge Functions that work locally but were deployed without the right env vars, combined with auth checks that fail open (return true) when the env var is missing.
  • What the pentest does differently: runs the full checklist, no shortcut. Particular attention to the integration layer (third-party APIs, webhooks) because Claude Code is most often used to wire integrations.
  • Fix-prompt target: Claude Code itself. See Claude Code Safety Guide.
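The fail-open pattern comes down to a single inverted branch. A sketch of the fail-closed version of a shared-secret check, with a hypothetical INTERNAL_API_TOKEN variable: a missing env var denies the request, and both values are hashed before timingSafeEqual so the comparison always sees equal-length buffers.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Fail-closed: missing configuration or a missing credential both mean "deny".
// The anti-pattern is the inverse, e.g. `if (!process.env.X) return true`.
function isAuthorized(presented: string | undefined, expected: string | undefined): boolean {
  if (!expected || !presented) return false;
  // Hash both sides so timingSafeEqual compares equal-length digests.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

// Usage sketch with a Fetch API request:
// isAuthorized(request.headers.get("x-internal-token") ?? undefined, process.env.INTERNAL_API_TOKEN)
```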

Windsurf

  • Stack: Anything (Codeium’s editor with agentic features). Most common: Next.js or Python.
  • Where it fails: similar to Cursor — context-blindness on which files run server vs client. Cascade-mode multi-file changes that introduce secret leakage across modules.
  • What the pentest does differently: runs the full checklist. Looks for cascade-pattern leakage (a secret defined in one file and imported into a client-side file).
  • Fix-prompt target: Windsurf Cascade. See Windsurf Safety Guide.

GitHub Copilot

  • Stack: Anything. Copilot is autocomplete + Workspace + agent modes inside the IDE.
  • Where it fails: silently accepts insecure defaults from the surrounding code. If your codebase has one anti-pattern (skipping CSRF, no input validation), Copilot will replicate it across every new file.
  • What the pentest does differently: treats Copilot-built apps as “human + a force-multiplier for whatever the human’s existing patterns are.” Runs the full checklist with extra weight on consistency checks (the same bug repeated across endpoints).
  • Fix-prompt target: Copilot Chat or directly in the editor. See GitHub Copilot Safety Guide.

Devin

  • Stack: Anything. Devin is an agentic engineer that takes a ticket and ships a PR.
  • Where it fails: Devin generates code that passes its own tests but does not threat-model the change. When a ticket describes insecure behavior, Devin implements it as a feature instead of flagging it as a bug. Particularly weak on secret-management discipline.
  • What the pentest does differently: because Devin shipped a PR, the diff is auditable. The pentest applies the full checklist and pays special attention to any new endpoint, new env var, or new dependency introduced by the PR.
  • Fix-prompt target: Devin in a follow-up ticket. See Devin Safety Guide.
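Because Devin's output is a diff, new endpoints, env vars, and dependencies can be pulled from the added lines before the scan runs. A heuristic sketch over unified-diff text (for example git diff output): it only recognizes Express-style app.get / router.post routes, process.env references, and version-like entries in package.json, so its output is a list of things to probe, not complete coverage. auditDiff is a hypothetical name.

```typescript
interface DiffFlag {
  file: string;
  kind: "env" | "route" | "dependency";
  detail: string;
}

// Look only at lines the diff adds ("+" prefix), tracking the current file.
function auditDiff(diff: string): DiffFlag[] {
  const flags: DiffFlag[] = [];
  let file = "";
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      file = line.slice(4).replace(/^b\//, ""); // "+++ b/server.js" -> "server.js"
      continue;
    }
    if (!line.startsWith("+")) continue;
    const added = line.slice(1);
    for (const m of added.matchAll(/process\.env\.([A-Z0-9_]+)/g)) {
      flags.push({ file, kind: "env", detail: m[1] });
    }
    const route = added.match(/\b(?:app|router)\.(get|post|put|patch|delete)\(\s*["'`]([^"'`]+)/);
    if (route) flags.push({ file, kind: "route", detail: `${route[1].toUpperCase()} ${route[2]}` });
    // In package.json, a "name": "^1.2.3" line is most likely a new dependency.
    const dep = file.endsWith("package.json") ? added.match(/^\s*"(@?[^"]+)"\s*:\s*"[\^~]?\d/) : null;
    if (dep && dep[1] !== "version") flags.push({ file, kind: "dependency", detail: dep[1] });
  }
  return flags;
}
```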

Vibe pentest in CI/CD

The right place to run a vibe pentest is on every deploy. The wiring:

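# .github/workflows/security.yml (a sketch, not a full file)
name: vibe-pentest
on:
  deployment_status:

jobs:
  pentest:
    # deployment_status has no activity types, so filter on the reported state
    if: github.event.deployment_status.state == 'success'
    runs-on: ubuntu-latest
    steps:
      - name: Run vibe pentest
        env:
          VIBEEVAL_KEY: ${{ secrets.VIBEEVAL_KEY }}
          # Passed via env rather than inline ${{ }} so the URL is never spliced into the script
          TARGET_URL: ${{ github.event.deployment_status.target_url }}
        run: |
          # Build the JSON body with jq so the URL is escaped correctly
          jq -n --arg url "$TARGET_URL" '{url: $url}' |
            curl -sS --fail -X POST https://api.vibe-eval.com/scan \
              -H "Authorization: Bearer $VIBEEVAL_KEY" \
              -H "Content-Type: application/json" \
              --data-binary @- \
              > scan.json
      - name: Block on Critical findings
        run: |
          # Wrap the filter in [...] so length counts matching findings (0 when there are none)
          critical=$(jq '[.findings[] | select(.severity=="Critical")] | length' scan.json)
          test "$critical" -eq 0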

Pattern: deploy succeeds → scan fires → fail the workflow if any Critical lands. Notify on Slack with the scan URL. Critical issues block the next deploy until rescan passes.

For larger teams: gate the deploy itself rather than the post-deploy alert. Scan the staging environment first; only promote to prod after a clean scan.

“AI builds it, AI pentests it, AI fixes it” feedback loop

The right loop is now end-to-end agentic:

  1. AI tool builds the app. Lovable, Cursor, Claude Code, v0 — pick one.
  2. AI agent pentests the deployed URL. The vibe pentest from this article.
  3. AI fix prompt is generated for each finding. Severity, evidence, root cause, the exact prompt to paste back into the originating tool.
  4. AI tool patches. Paste the fix prompt. Apply the diff. Deploy.
  5. AI agent rescans. Verify the finding is closed. New findings introduced by the patch are also caught.
  6. Loop continues. On every subsequent deploy.

The economic insight: every step is now sub-dollar in cost and takes minutes, not weeks. The human’s role moves from “find and fix bugs” to “review the patches and ship them.” The bottleneck is review, not work.

The risk: if step 3 generates a bad fix prompt, step 4 ships a worse bug. That is why every patch should still pass through a human review on the first iteration, even if subsequent iterations are auto-merged. See Integration Layer Is the Real Security Gap for the case for human gates on integration code.

Anonymized findings — what vibe pentests catch

A representative cross-section of findings from vibe-coded apps we audit. Each is generalized.

Finding A — Anon-readable users table on a Cursor + Supabase app

  • Endpoint: /rest/v1/users
  • Evidence: Anonymous GET returned every row including email, password reset tokens, last login.
  • Impact: Full PII disclosure for every user.
  • Fix: Enable RLS, write per-user policies, audit every other table the same way.

Finding B — OpenAI key leaked through a v0 server action error

  • Endpoint: POST /api/chat (Next.js route handler)
  • Evidence: Sending a malformed body crashed the handler. The error response body included the raw exception, which contained Authorization: Bearer sk-... from the upstream fetch.
  • Impact: Stolen API key; the attacker gets LLM access billed to the app owner.
  • Fix: Wrap the upstream fetch in a try/catch, return a generic 500 with no detail, log the actual error server-side only.
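The Finding B fix, sketched as a framework-neutral handler core: malformed input returns 400 instead of crashing, and any upstream failure is logged server-side while the client receives a generic 500. callUpstream stands in for the actual fetch to the LLM provider, and handleChat is a hypothetical name; a Next.js route handler would wrap this and turn the result into a Response.

```typescript
interface HttpReply {
  status: number;
  body: unknown;
}

type Logger = (message: string, detail: unknown) => void;

// Hypothetical core of POST /api/chat. callUpstream stands in for the fetch to
// the LLM provider; log defaults to console.error, which stays server-side.
async function handleChat(
  rawBody: string,
  callUpstream: (messages: unknown[]) => Promise<unknown>,
  log: Logger = console.error,
): Promise<HttpReply> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // Malformed input is a client error, not a crash.
    return { status: 400, body: { error: "Invalid JSON body" } };
  }
  const messages =
    typeof parsed === "object" && parsed !== null ? (parsed as { messages?: unknown }).messages : undefined;
  if (!Array.isArray(messages)) {
    return { status: 400, body: { error: "messages must be an array" } };
  }
  try {
    return { status: 200, body: await callUpstream(messages) };
  } catch (err) {
    // The raw error can embed upstream request headers, including the API key:
    // log it, never return it.
    log("upstream LLM call failed", err);
    return { status: 500, body: { error: "Internal server error" } };
  }
}
```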

Finding C — Stripe webhook unverified on a Lovable Edge Function

  • Endpoint: POST /functions/v1/stripe-webhook
  • Evidence: The function parsed event.type from the JSON body and updated user subscription state. No signature verification.
  • Impact: Forged events upgrade any user to a paid plan.
  • Fix: Use stripe.webhooks.constructEvent with the raw request body and the webhook secret (constructEventAsync in Deno-based Edge Functions). Add a regression test.

Finding D — Hardcoded admin email list in a Bolt + React app

  • Endpoint: /admin/* routes and /api/admin/* handlers
  • Evidence: Frontend bundle contained const ADMINS = ['founder@example.com'] and gated routes on it. The corresponding API handlers performed no server-side check.
  • Impact: Any authenticated user can hit admin APIs directly with curl.
  • Fix: Move admin status to a server-validated role field. Check on every admin endpoint.

Finding E — Public Firebase Storage bucket on a Replit + Firebase app

  • Endpoint: https://firebasestorage.googleapis.com/v0/b/<bucket>/o/
  • Evidence: Anonymous list returned every uploaded file. Filenames followed <user_email>_<timestamp>.png, so PII leaked even from the listing.
  • Impact: Full file enumeration, PII through filenames.
  • Fix: Update the Firebase Storage Security Rules to require auth on read and to scope writes by request.auth.uid. Use unguessable filenames.

Finding F — Mass assignment on a Windsurf + Express app

  • Endpoint: PATCH /api/profile
  • Evidence: Handler did User.update(req.body) without a field whitelist. A PATCH with {"role": "admin"} upgraded the requester.
  • Impact: Any user becomes admin.
  • Fix: Whitelist updatable fields. Treat role, id, email_verified as system-managed.
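The whitelist fix, as a minimal sketch with illustrative field names: copy only the allowed keys that have the expected type and ignore everything else. pickProfilePatch is a hypothetical helper.

```typescript
// Fields a user may change on their own profile (illustrative names). Everything
// else, such as role, id, and email_verified, stays system-managed.
const UPDATABLE_PROFILE_FIELDS = ["display_name", "bio", "avatar_url"] as const;
type ProfileField = (typeof UPDATABLE_PROFILE_FIELDS)[number];
type ProfilePatch = Partial<Record<ProfileField, string>>;

function pickProfilePatch(body: unknown): ProfilePatch {
  const patch: ProfilePatch = {};
  if (typeof body !== "object" || body === null) return patch;
  const input = body as Record<string, unknown>;
  for (const field of UPDATABLE_PROFILE_FIELDS) {
    const value = input[field];
    if (typeof value === "string") patch[field] = value;
  }
  return patch;
}

// In the handler: User.update(pickProfilePatch(req.body)) instead of User.update(req.body)
```

Silently dropping unknown fields keeps existing clients working; rejecting them with a 400 is the stricter alternative.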

Finding G — Hallucinated dependency in a Claude Code project

  • Endpoint: N/A (build-time)
  • Evidence: package.json contained a dependency name that did not exist on npm. The build worked locally because the resolution failed silently and the import was unused. Had anyone published a package under that name, every fresh install would have pulled attacker code.
  • Impact: Future supply chain compromise.
  • Fix: Run the Package Hallucination Scanner on every commit. Pin versions. Run npm audit signatures in CI.
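Checking whether each declared dependency exists can be sketched with an injectable lookup, so the same function works against the npm registry or a test double. This is a sketch under stated assumptions, not the Package Hallucination Scanner's logic, and unknownDependencies is a hypothetical name. Existence alone does not prove safety, since a squatted name does exist, so newly published or very low-download packages deserve a second look too.

```typescript
type Exists = (name: string) => Promise<boolean>;

// Return dependency names from package.json text that the lookup does not know.
async function unknownDependencies(packageJson: string, exists: Exists): Promise<string[]> {
  const pkg = JSON.parse(packageJson) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  const missing: string[] = [];
  for (const name of names) {
    if (!(await exists(name))) missing.push(name);
  }
  return missing;
}

// Usage sketch against the public npm registry (Node 18+ fetch); a 404 means no
// package by that name exists. Scoped names need the "/" encoded as %2f.
// const npmExists: Exists = async (name) =>
//   (await fetch(`https://registry.npmjs.org/${name.replace("/", "%2f")}`)).status !== 404;
// console.log(await unknownDependencies(fs.readFileSync("package.json", "utf8"), npmExists));
```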

Finding H — Source map exposed on a Lovable production build

  • Endpoint: https://app.example.com/assets/index-abc.js.map
  • Evidence: Source map deployed to production. Full unminified source recoverable, including comments referencing internal API design and a placeholder // TODO: replace this hardcoded JWT secret.
  • Impact: Trivial reverse engineering plus secret leak through committed comments.
  • Fix: Disable source map generation for production builds, or upload to an authenticated error-tracking service rather than the public CDN.
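Checking for Finding H from outside takes a couple of requests. A sketch, assuming the scanner has already downloaded a bundle: read the trailing sourceMappingURL comment if there is one, and also try <bundle>.map, since maps are often deployed without the comment. An inline data: URL means the map, and with it the original source, is embedded in the bundle itself. sourceMapCandidates is a hypothetical helper.

```typescript
interface SourceMapCheck {
  inline: boolean; // map embedded in the bundle as a data: URL
  urls: string[]; // map URLs worth requesting
}

function sourceMapCandidates(bundleUrl: string, bundleText: string): SourceMapCheck {
  const urls = new Set<string>();
  let inline = false;
  // Bundlers append the reference as the last comment in the file.
  const match = bundleText.match(/\/\/[#@] sourceMappingURL=(\S+)\s*$/);
  if (match) {
    if (match[1].startsWith("data:")) inline = true;
    else urls.add(new URL(match[1], bundleUrl).toString());
  }
  // Maps are often deployed next to the bundle even when the comment is stripped.
  urls.add(`${bundleUrl}.map`);
  return { inline, urls: Array.from(urls) };
}
```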

Kill chain on a vibe-coded app

Anonymized but representative.

Step 1 — Recon on a Lovable app. Bundle inspected. Supabase URL extracted. PostgREST schema fetched at /rest/v1/. 23 tables discovered.

Step 2 — RLS sweep. Anonymous SELECT against each table. 21 return empty. Two — feedback_messages and support_replies — return rows.

Step 3 — PII triage. feedback_messages contains user email, message body, and user IP. support_replies contains the admin’s response, which sometimes includes account-recovery links.

Step 4 — Account recovery link extraction. The agent searches support_replies for URL patterns matching the app’s reset link template. Two valid, unexpired reset URLs are recovered.

Step 5 — Account takeover. The reset URL works (no IP binding, no second factor). The agent does not actually take over the account but reports the chain as a high-severity finding.

Step 6 — Lateral. With the (theoretical) takeover, the agent identifies the privilege escalation possibilities — admin role on profiles, organization owner on org_members — and includes them in the report as the next steps an attacker would take.

The chain is reported as one root cause (RLS missing on two tables) with three downstream impacts. The fix is surgical. The rescan would close the chain.

When you should NOT run a vibe pentest

  • You haven’t shipped yet. Run Vibe Code Scanner on the staging URL first; the pentest is for deployed apps.
  • You don’t own the target. No matter how curious you are about another team’s Lovable app, do not pentest it without written authorization. The AI agent will happily comply, but the law will not.
  • The app is not web-shaped. CLI tools, native apps, on-device models — different methodology.
  • You need a deep one-bug investigation. Vibe pentesting is broad coverage. For deep work on a specific class, hire a human researcher.
  • You’re testing safety/alignment of an LLM, not security of an app. Different category; see the AI red teaming discussion of LLM-specific testing.

Fix prompts you can paste into the AI tool that built the app

Generic vibe-pentest fix prompt

The vibe pentest report below identifies a security issue in this app.
Apply the fix described, and only that fix. Do not refactor unrelated code.
Add a regression test that proves the issue is closed. After the patch
ships, the rescan should not re-emit this finding.

<paste finding>

Lovable-specific RLS fix prompt

Enable Row Level Security on the table `<table>` and write four policies that
gate SELECT, INSERT, UPDATE, and DELETE on `auth.uid() = user_id`. Generate
the SQL migration. Do not break any existing service-role usage.

v0 / Next.js server action lockdown prompt

The server action at `<path>` is callable without authentication. Add an
auth check at the top of the action: read the session, return a 401 (or
redirect) if missing. Then add an authorization check: confirm the session
user owns the resource the action is touching. Add a regression test.

Cursor / Claude Code generic fix prompt

The endpoint `<METHOD> <route>` is vulnerable to <vuln class>. Apply the
fix below. Do not modify other endpoints in this PR.

Evidence: <copy-paste the request/response from the report>
Root cause: <copy-paste from the report>
Fix: <copy-paste the recommended fix>

Add a regression test in the existing test suite that asserts the
vulnerability is closed.

Bolt / Replit generic prompt

Production deploy of this app has the security issue described below.
Patch in place. Do not introduce new dependencies. Re-run the deploy when
the patch is in.

<finding>


COMMON QUESTIONS

01
What is vibe pentesting?
Vibe pentesting is penetration testing focused on the specific failure modes of AI-generated web apps — the kind produced by Lovable, Bolt, Cursor, Claude Code, v0, Replit, and Windsurf. It prioritizes the predictable gaps these tools create (missing RLS, exposed keys, BOLA, skipped input validation) over the full OWASP coverage a traditional pentest would attempt.
02
How is vibe pentesting different from a regular pentest?
A traditional pentest is scoped against any vulnerability class an attacker might exploit. A vibe pentest is scoped against the attack surface AI coding tools generate — which turns out to be a fairly small, repeatable list. The trade-off is narrower coverage in exchange for faster turnaround (minutes vs weeks) and lower cost.
03
What vulnerabilities does a vibe pentest look for?
The core checklist: missing Row Level Security on Supabase or Firebase tables, exposed API keys in the frontend bundle, BOLA/IDOR on generated CRUD endpoints, auth flows that skip role or ownership checks, open storage buckets, permissive CORS, debug routes in production, and webhook endpoints without signature verification.
04
Can a vibe pentest replace a traditional pentest?
For pre-launch and continuous coverage of vibe-coded apps, yes. For regulated workloads (HIPAA, SOC 2, PCI), auditors still expect a human-led pentest, and PCI DSS requires penetration testing outright. The recommended pattern is continuous vibe pentesting as the default floor, with an annual human pentest on top.
05
Who should run a vibe pentest?
Any team shipping a Lovable, Bolt, Cursor, Claude Code, v0, Replit, or Windsurf app to production — especially solo founders and small teams who don't have a security engineer. The scan is designed to be run by the builder, not a security specialist.
06
How fast is a vibe pentest?
A typical scan completes in one to three minutes against a deployed URL. There is no agent to install, no code to share, and no access credentials to grant — the pentest runs from outside the app, the way an attacker would.
07
What do I do with the findings?
Every finding ships with a fix prompt that can be pasted into the same AI coding tool you built the app with. Critical issues (exposed secret keys, anon-readable user tables) should be fixed before anything else. The rescan flow verifies the patch.
08
Why do AI coding tools keep making the same security mistakes?
Because the training data is full of them. The most common React + Supabase tutorial does not enable RLS. The most common Stripe webhook example skips signature verification. The model learns the pattern that ships, not the pattern that is correct. Until the training distribution shifts, every new AI-built app starts from the same insecure baseline. Vibe pentesting is the runtime correction layer.
09
Is vibe pentesting just running a DAST scanner?
No. A DAST scanner is signature-driven — it tries the same payloads against everything. A vibe pentest is agentic — it adapts to the specific app, knows that 'every Lovable app uses Supabase' and 'every v0 app uses Vercel deployments', and it chains findings into kill chains. Same runtime, completely different planner.
10
Does vibe pentesting cover non-web targets?
Today, no. The methodology is web-app-shaped: deployed URL, JS bundle, REST/GraphQL/WebSocket API, browser-side state. CLI tools, native mobile apps, and agentic backends are adjacent categories with different methodology. We expect the boundary to shift over time.
11
How does a vibe pentest agent know which tool built the app?
Fingerprinting. Lovable apps have a `lovable-tagger` artifact and load Supabase from a fixed origin pattern. v0 apps deploy under a v0-recognizable Vercel project pattern and ship a v0 footer in dev. Bolt outputs a recognizable filesystem fingerprint in StackBlitz embedded mode. Replit apps deploy under repl.co subdomains. The scanner detects the generator and applies the matching playbook.
12
Can the AI tool that built my app also fix the findings?
Yes — and that is the intended workflow. Every finding ships with a fix prompt designed for the originating tool. Lovable findings paste back into Lovable; Cursor findings paste into Cursor; Claude Code findings into Claude Code. The loop closes: AI builds, AI pentests, AI fixes, AI verifies the rescan.
13
What happens if I do not run a vibe pentest before launching?
The most common outcome we see: a few weeks after launch, someone curls the Supabase REST endpoint without auth and dumps the user table. Sometimes it is a security researcher reporting it; sometimes it is a competitor or a malicious actor. The cost of running the scan is minutes; the cost of the alternative is a potential GDPR fine, the breach disclosure, and lost customer trust.
14
Is there a free version?
Yes. The Vibe Code Scanner runs surface-coverage vibe pentesting against any deployed URL with no signup. The agent-driven deep pentest is paid. Most teams should start with the free version, fix the obvious findings, then add the deep pentest before handling sensitive data.
15
How does vibe pentesting interact with bug bounty programs?
Run the vibe pentest first. The bounty program should not be how you discover that you forgot RLS on the users table; that is what continuous vibe pentesting is for. Bounty researchers earn their payouts on creative findings the agent does not catch. Treat the AI pentest as the floor and the bounty program as the ceiling.

RUN A VIBE PENTEST

14-day trial. No card. Full agent-driven scan on your deployed URL in one to three minutes.
