THE FIRST 60 SECONDS: TIME-TO-FIRST-CRITICAL ON AI-BUILT APPS
The first proven critical on an AI-built app typically lands inside the first minute. This catalog ranks the detection floors per finding class — secret in bundle, permissive RLS, BOLA — each one reproducible against a live gapbench scenario you can time with curl.
This is a detection-floor catalog. The per-class numbers below are reproducible wall-clock measurements against live, deliberately vulnerable scenarios on the gapbench public benchmark: time the curl commands in the RUN IT YOURSELF section and you should observe the same floors. A detection floor is a latency lower bound, and an attacker tooling up a similar probe is bounded by the same physics.
If your reaction is “that’s fast”, the unsettling implication is that an attacker is not slower.
Catalog scope
| Field | Value |
|---|---|
| Window | Nov 2025 – Apr 2026 |
| Source | Anonymized customer engagements + timed runs against gapbench scenarios |
| Reproducibility anchor | gapbench.vibe-eval.com — every floor below is curl-reproducible |
| Calibration control | ref0 — clean reference; a probe that fires here is killed |
| Equipment | Scans ran from us-east-1 over a one-gigabit connection; elsewhere, latency scales roughly linearly with round-trip time on bundle-parse-bound classes |
We do not publish a corpus-wide median or distribution because the underlying engagement set is anonymized and not a uniform random sample. The detection floors per class — which is what builders and tool-comparisons actually need — are reproducible against gapbench in seconds.
By finding class
| Finding class | Detection floor (gapbench) | Why it ranks here |
|---|---|---|
| Secret in static bundle | ~6s | Detectable on first parse of HTML and JS |
| Source map shipped to production | ~8s | Same as above; one extra network request |
| Open Firebase rules | ~12s | One unauthenticated read against the public REST endpoint |
| Permissive Supabase RLS | ~18s | Anon key extraction + PostgREST table enumeration |
| RLS off entirely | ~22s | Slightly slower because the scan has to enumerate the table |
| CORS allow-all on credentialed endpoint | ~31s | Requires a preflight test against an API endpoint |
| BOLA — read | ~64s | Two test users + cross-account fetch |
| BOLA — write (PATCH/PUT) | ~78s | Same as above plus the write-back step |
| Self-editable role | ~91s | Requires successful auth, profile fetch, mutated PATCH, re-fetch |
| Open redirect on auth callback | ~134s | Requires triggering full auth flow |
The five fastest classes — bundle secrets, source maps, open Firebase rules, permissive RLS, RLS off — are the modal first-critical we observe across engagements. They are also the five classes that need zero authenticated probing to detect, which is the structural reason they always go first.
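As a rough illustration of why the bundle-secret class sits at the bottom of the table, the check is a single regular-expression pass over a body that has already been fetched. The sketch below reuses the pattern from the curl one-liner on this page; the function name and the sample fragment are hypothetical, and a real scanner would cover more key formats.

```python
import re
from typing import List

# Same pattern as the curl | grep one-liner on this page:
# Stripe-style secret keys with a live or test prefix.
SECRET_PATTERN = re.compile(r"sk_(?:live|test)_[A-Za-z0-9]{20,}")


def find_bundle_secrets(body: str) -> List[str]:
    """Return each distinct secret-shaped string found in an HTML/JS body."""
    seen: List[str] = []
    for match in SECRET_PATTERN.findall(body):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    # Hypothetical bundle fragment; the key is made up for illustration.
    sample = 'const stripe = Stripe("sk_live_' + "a1B2" * 6 + '");'
    print(find_bundle_secrets(sample))
```

No session, token, or second request is involved, which is consistent with the class needing zero authenticated probing.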
TTFC by detection technique
The same critical can be reached by different detection techniques. The technique determines the latency floor; pick the wrong one and the same finding takes ten times longer.
| Technique | Floor | Used for |
|---|---|---|
| Static parse of HTML / JS | ~4s | Secrets, source maps, inline config |
| Single unauthenticated PostgREST / API call | ~10s | RLS off, open Firebase, public S3 buckets |
| Anon-key extraction + targeted probe | ~15s | Permissive RLS, naked-database backends |
| Header inspection | ~6s | CSP / HSTS missing, CORS misconfig |
| Two-session cross-account probe | ~50s | BOLA on read |
| Two-session probe + write-back | ~70s | BOLA on PATCH/PUT/DELETE |
| Authenticated browser flow | ~120s | Open redirects on auth callback, OAuth flaws |
| Crawl + dynamic introspection | ~180s | GraphQL introspection abuse, Swagger-with-bearer |
The class column in the previous table is the what; the technique column here is the how. A scanner that lacks the two-session capability will report zero BOLA findings, not because the bugs are not there, but because the technique to detect them is not in the scan.
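The two-session technique can be sketched in a few lines. This is a minimal illustration, not the scanner's implementation: `bola_on_read` and `http_fetch` are hypothetical names, and the comparison rule (the second account gets a 200 with the same body as the owner) is an assumption about what counts as a confirmed read.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# A fetcher takes (url, bearer_token) and returns (status, body).
Fetch = Callable[[str, str], Tuple[int, bytes]]


def http_fetch(url: str, token: str) -> Tuple[int, bytes]:
    """GET url with a bearer token; return (status, body) even on HTTP errors."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def bola_on_read(url: str, owner_token: str, other_token: str,
                 fetch: Fetch = http_fetch) -> bool:
    """True if a second account can read a resource owned by the first."""
    owner_status, owner_body = fetch(url, owner_token)
    if owner_status != 200:
        return False  # no baseline: the owner could not read it either
    other_status, other_body = fetch(url, other_token)
    # Assumed rule: same 200 and same bytes means the ownership check is missing.
    return other_status == 200 and other_body == owner_body
```

The function needs two valid tokens before it can say anything, which is why this technique cannot reach the floors of the unauthenticated ones. A resource that is intentionally public would also return True here, so the result still has to be checked against a clean control.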
CWE / OWASP mapping for the fastest-discovered classes
Each of the five fastest classes (bundle secrets, source maps, open Firebase rules, permissive RLS, RLS off) fits one description: “authorization or credentials, exposed at the static-parse layer.”
| Class | CWE | OWASP | Floor TTFC |
|---|---|---|---|
| Secret in static bundle | CWE-798 Hard-coded Credentials | A02 / A05 | ~4s |
| Source map shipped to production | CWE-538 Externally-Accessible File | A05 | ~6s |
| Open Firebase rules | CWE-862 Missing Authorization | A01 / API1 | ~10s |
| Permissive Supabase RLS | CWE-863 Incorrect Authorization | A01 / API1 | ~12s |
| RLS off entirely | CWE-862 Missing Authorization | A01 / API1 | ~14s |
| CORS allow-all on credentialed endpoint | CWE-942 Permissive Cross-domain Policy | A05 / API8 | ~20s |
| BOLA — read | CWE-639 Auth Bypass via Key | A01 / API1 | ~50s |
| Mass assignment / self-editable role | CWE-915 Mass Assignment | A04 / API6 | ~60s |
| Open redirect on auth callback | CWE-601 URL Redirect to Untrusted | A01 / API8 | ~120s |
The CWE-862 / CWE-863 split is the dominant pair. The vast majority of fast-discovered criticals are missing or incorrect authorization — the scanner is not finding clever exploits, it is finding doors with no lock.
Per-platform modal first-critical
The class of finding that lands first on a typical app from each platform — driven by the platform’s default scaffolding, not by the scanner.
| Platform | Modal first-critical class | Why |
|---|---|---|
| Bolt.new | Secret in bundle | Frontend-only quick-prototype patterns leak sk_live_ early |
| Lovable | Permissive Supabase RLS | Default Supabase backend; policy creation lags table creation |
| Replit | Open Firebase rules / public .env | Default deploy patterns expose either Firebase or env |
| Cursor | BOLA on read | Custom API surface, ownership check often missing |
| V0 (with backend) | Self-editable role | Mass-assignment shape in scaffolded forms |
The class column is the answer to “what fails first on each platform”; the detection floor for that class is in the table above. Together they say: on Bolt, the floor is bundle-parse latency; on Cursor, you need to provision two test users before you can prove the modal issue.
A faster modal class is not better — it means the failure is shallower.
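Joining the two tables is a plain lookup. The sketch below copies the floors from the By finding class table and the modal classes from the per-platform table; the dictionary keys and the `expected_floor` helper are illustrative names, and Replit is mapped to open Firebase rules only because no floor is published for a public .env.

```python
from typing import Dict, Tuple

# Detection floors in seconds, copied from the "By finding class" table.
FLOOR_SECONDS: Dict[str, int] = {
    "Secret in static bundle": 6,
    "Open Firebase rules": 12,
    "Permissive Supabase RLS": 18,
    "BOLA read": 64,
    "Self-editable role": 91,
}

# Modal first-critical class per platform, copied from the per-platform table.
MODAL_CLASS: Dict[str, str] = {
    "Bolt.new": "Secret in static bundle",
    "Lovable": "Permissive Supabase RLS",
    "Replit": "Open Firebase rules",  # public .env has no published floor
    "Cursor": "BOLA read",
    "V0 (with backend)": "Self-editable role",
}


def expected_floor(platform: str) -> Tuple[str, int]:
    """Return (modal class, detection floor in seconds) for a platform."""
    finding_class = MODAL_CLASS[platform]
    return finding_class, FLOOR_SECONDS[finding_class]


if __name__ == "__main__":
    for name in MODAL_CLASS:
        print(name, expected_floor(name))
```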
What this means
For builders: the failures that are findable in under a minute are the failures that are exploitable in under a minute. Time-to-first-critical is a credible estimate of how long an attacker running a similar probe needs to find the same thing once your app is public.
For researchers: the detection floors per class are a reproducible baseline. A claim that a new tool finds critical issues “faster” needs to beat these floors against the same gapbench scenarios — not against an unpublished corpus.
For investors and acquirers: when you are doing security due diligence on a vibe-coded SaaS, the time it takes to find the first critical is a useful informal scoring axis. A clean URL ten minutes in is meaningful signal; a critical inside thirty seconds is meaningful signal too.
Methodology
Source. Detection-floor measurements were taken against deliberately vulnerable scenarios on the gapbench public benchmark. Modal-class observations draw on anonymized customer engagements between Nov 2025 and Apr 2026.
Timer. Started on the first outbound request from the scanner against the target URL. Stopped at the moment a critical-severity finding was captured, replayed in sandbox, and confirmed.
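A minimal sketch of that timer rule, assuming each probe is a callable that returns True only once its finding has been captured and confirmed. The names are hypothetical, and starting the clock just before the first probe runs is an approximation of "first outbound request".

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# A probe returns True once a critical has been captured and confirmed.
Probe = Callable[[], bool]


def time_to_first_critical(
    probes: Iterable[Tuple[str, Probe]],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Tuple[str, float]]:
    """Run probes in order; return (name, elapsed seconds) for the first hit."""
    start = clock()  # approximates the first outbound request
    for name, probe in probes:
        if probe():
            return name, clock() - start
    return None  # no critical confirmed


if __name__ == "__main__":
    demo = [("headers", lambda: False), ("bundle-secret", lambda: True)]
    print(time_to_first_critical(demo))
```

A monotonic clock is used so that wall-clock adjustments during a run cannot shorten or lengthen the measured interval.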
Severity. CVSS 3.1 with the published rubric. Critical = 9.0+.
Equipment. Scans ran from us-east-1 over a one-gigabit connection. Latencies elsewhere will be higher; expect TTFC to scale roughly linearly with round-trip time on the bundle-parse-bound classes.
Calibration against ref0. Every probe is also run against ref0, a clean reference site. A probe that fires on ref0 is by construction a false positive and gets killed. The detection floors above are net of false-positive elimination — a probe that incorrectly fires within 5 seconds against a clean target would otherwise be the headline number, and is the primary reason most “fast scanner” claims do not hold up under scrutiny.
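The calibration rule can be expressed as a filter. In this sketch a probe is any callable that takes a URL and returns True when it fires; `calibrated_findings` is a hypothetical name, and the real pipeline may apply the control at a different stage.

```python
from typing import Callable, Dict, List

# A probe takes a target URL and returns True if it fires.
Probe = Callable[[str], bool]


def calibrated_findings(probes: Dict[str, Probe], target: str, control: str) -> List[str]:
    """Names of probes that fire on the target but stay silent on the control."""
    kept: List[str] = []
    for name, probe in probes.items():
        if probe(control):
            continue  # fires on the clean reference: treated as a false positive
        if probe(target):
            kept.append(name)
    return kept
```

Running the control first means a noisy probe is discarded before its fast but spurious hit can be recorded as the first critical.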
Reproduce on the public benchmark
Each detection class above can be reproduced against a live gapbench scenario. The TTFC floor is roughly the same against the gapbench scenarios as it is against real corpus apps — these scenarios are deliberately shaped to mirror the same failure surfaces.
| Class | Scenario | URL |
|---|---|---|
| Secret in static bundle | Indie SaaS | /site/indie-saas/ |
| Permissive RLS | Supabase clone | /site/supabase-clone/ |
| BOLA on read | Multi-tenant SaaS | /site/multi-tenant-saas/ |
| BOLA on PATCH (balance) | Fintech app | /site/fintech-app/ |
| Mass assignment / self-editable role | Mass assignment | /site/mass-assignment/ |
| Open redirect on auth callback | OAuth redirect_uri | /site/ssrf-open-redirect-oauth/ |
| ref0 (clean control) | ref0 | /site/ref0/ |
For the manifesto-level argument behind this style of measurement, see “Why we built gapbench” and “False positives and the ref0 control”.
How to reproduce
Run VibeEval against any URL. The scanner displays a live timer and announces each finding as it lands; the first critical timestamp is preserved in the report.
Sources and references
- gapbench scenarios used as detection-floor anchors. indie-saas (secret in bundle), supabase-clone (permissive RLS), multi-tenant-saas (BOLA), fintech-app (BOLA on PATCH), mass-assignment, ssrf-open-redirect-oauth, ref0 (clean control).
- CVSS 3.1 severity rubric. first.org/cvss/v3.1/specification-document.
- OWASP API Security Top 10 (2023) and OWASP Top 10 Web (2021) for category mappings.
- CWE-798, CWE-862, CWE-863, CWE-639, CWE-538, CWE-915, CWE-942, CWE-601 for the per-class taxonomy.
Citations
VibeEval. The First 60 Seconds: Time-to-First-Critical on AI-Built Apps. May 2026. https://vibe-eval.com/data-studies/time-to-first-critical/
Related
- Pattern manifesto: Why we built gapbench
- Pattern walkthrough: False positives and the ref0 control — why the TTFC numbers are not just scanner noise
- Pattern walkthrough: BOLA in AI-generated CRUD — why BOLA detection is structurally slower than RLS
- Data study: 2026 AI App Security Benchmark
- Data study: Where Vibe Coders Leak Their Keys
- Data study: BOLA in AI-Generated CRUD
- Data study: Honeypot Supabase — time-to-abuse on the attacker side
- Guide: Solo Founder Pre-Launch Security Checklist
RUN IT YOURSELF
Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.
time curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}' | head -1
ANON=$(curl -s https://gapbench.vibe-eval.com/site/supabase-clone/ | grep -oE 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+' | head -1) && curl -s "https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*&limit=1" -H "apikey: $ANON"
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'
time curl -s -I https://gapbench.vibe-eval.com/site/ref0/
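The second command above pulls the first JWT-shaped string out of the page and sends it as the apikey header. A Python sketch of the extraction step follows, with one addition: it decodes the token payload so the role claim can be inspected before probing. The function names are hypothetical, the signature is not verified, and a role of "anon" is the usual Supabase convention, not something confirmed for the gapbench scenario.

```python
import base64
import json
import re
from typing import Optional

# Same JWT-shaped pattern as the grep in the second command above.
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def first_jwt(body: str) -> Optional[str]:
    """Return the first JWT-shaped string in a page body, or None."""
    match = JWT_PATTERN.search(body)
    return match.group(0) if match else None


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

If the decoded role is an anonymous one and the table query still returns rows, the read was served to an unauthenticated caller.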