THE FIRST 60 SECONDS: TIME-TO-FIRST-CRITICAL ON AI-BUILT APPS

The first proven critical on an AI-built app typically lands inside the first minute. This catalog ranks the detection floor for each finding class (secret in bundle, permissive RLS, BOLA and others) by time-to-first-critical (TTFC), and each floor is reproducible against a live gapbench scenario you can time with curl.

This is a detection-floor catalog. The per-class numbers below are wall-clock measurements against live, deliberately vulnerable scenarios on the gapbench public benchmark. The curl commands in the Run it yourself section at the end of this page exercise the same request path behind each floor, so you can time them yourself. The detection floor is the latency lower bound: an attacker tooling up a similar probe pays the same network and parse costs, so the floor bounds them too.

If your reaction is “that’s fast”, the unsettling implication is that an attacker is not slower.

Catalog scope

Field | Value
Window | Nov 2025 – Apr 2026
Source | Anonymized customer engagements + timed runs against gapbench scenarios
Reproducibility anchor | gapbench.vibe-eval.com; every floor below is curl-reproducible
Calibration control | ref0, a clean reference; a probe that fires here is killed
Equipment | Scans ran from us-east-1 over a one-gigabit connection; latency elsewhere scales roughly linearly with round-trip time on bundle-parse-bound classes

We do not publish a corpus-wide median or distribution because the underlying engagement set is anonymized and not a uniform random sample. The detection floors per class, which are what builders and tool comparisons actually need, are reproducible against gapbench within a few minutes each.

By finding class

Finding class | Detection floor (gapbench) | Why it ranks here
Secret in static bundle | ~6s | Detectable on first parse of HTML and JS
Source map shipped to production | ~8s | Same as above; one extra network request
Open Firebase rules | ~12s | One unauthenticated read against the public REST endpoint
Permissive Supabase RLS | ~18s | Anon key extraction + PostgREST table enumeration
RLS off entirely | ~22s | Slightly slower because the scan has to enumerate the table
CORS allow-all on credentialed endpoint | ~31s | Requires a preflight test against an API endpoint
BOLA (read) | ~64s | Two test users + cross-account fetch
BOLA (write: PATCH/PUT) | ~78s | Same as above plus the write-back step
Self-editable role | ~91s | Requires successful auth, profile fetch, mutated PATCH, re-fetch
Open redirect on auth callback | ~134s | Requires triggering full auth flow

The five fastest classes (bundle secrets, source maps, open Firebase rules, permissive RLS and RLS off) are the ones we most often observe as the first critical across engagements. They are also the five classes that need zero authenticated probing to detect, which is the structural reason they land first whenever they are present.
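Because these classes need no session, the first pass is plain pattern matching over fetched HTML and JS. A minimal sketch, assuming the page or bundle is piped in on stdin: the helper name find_bundle_secrets is hypothetical, the Stripe-key and JWT patterns are the regexes from the Run it yourself commands, and the third pattern looks for the sourceMappingURL comment a bundler leaves when production JS references a source map.

```shell
#!/bin/sh
# Hypothetical helper: static-parse pass over HTML/JS read from stdin.
# Prints one tagged line per hit; prints nothing when the input is clean.
find_bundle_secrets() {
  body=$(cat)
  # Stripe secret keys, live or test mode (same regex as the indie-saas command)
  printf '%s\n' "$body" | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}' | sed 's/^/stripe_key: /'
  # JWT-shaped strings such as an embedded Supabase anon key (same regex as the supabase-clone command)
  printf '%s\n' "$body" | grep -oE 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+' | sed 's/^/jwt: /'
  # A sourceMappingURL comment means production JS points at a source map
  printf '%s\n' "$body" | grep -oE 'sourceMappingURL=[^[:space:]]+' | sed 's/^/source_map: /'
  return 0
}

# Example (network): curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | find_bundle_secrets
```

A real scanner would also follow each script src and parse those bundles; this sketch only scans what it is given.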

TTFC by detection technique

The same critical can be reached by different detection techniques. The technique determines the latency floor; pick the wrong one and the same finding takes ten times longer.

Technique | Floor | Used for
Static parse of HTML / JS | ~4s | Secrets, source maps, inline config
Single unauthenticated PostgREST / API call | ~10s | RLS off, open Firebase, public S3 buckets
Anon-key extraction + targeted probe | ~15s | Permissive RLS, naked-database backends
Header inspection | ~6s | CSP / HSTS missing, CORS misconfig
Two-session cross-account probe | ~50s | BOLA on read
Two-session probe + write-back | ~70s | BOLA on PATCH/PUT/DELETE
Authenticated browser flow | ~120s | Open redirects on auth callback, OAuth flaws
Crawl + dynamic introspection | ~180s | GraphQL introspection abuse, Swagger-with-bearer

The finding-class table above is the what; this technique table is the how. A scanner that lacks the two-session capability will report zero BOLA findings, not because the bugs are absent but because the technique needed to detect them is not in the scan.
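Header inspection needs no session either. A sketch of the CORS check, assuming the response headers were captured with curl -D: browsers refuse a wildcard Access-Control-Allow-Origin when credentials are allowed, so the exploitable shape is a server that echoes an arbitrary Origin back together with Access-Control-Allow-Credentials: true. The helper name, the API URL and the attacker origin are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads raw response headers on stdin and succeeds only if
# the server echoed PROBE_ORIGIN back AND allowed credentials.
# Usage: cors_credentialed_reflection PROBE_ORIGIN < headers
cors_credentialed_reflection() {
  probe_origin=$1
  headers=$(tr -d '\r')   # HTTP header lines end in CRLF; drop the CRs
  acao=$(printf '%s\n' "$headers" | grep -i '^access-control-allow-origin:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//')
  acac=$(printf '%s\n' "$headers" | grep -i '^access-control-allow-credentials:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//')
  [ "$acao" = "$probe_origin" ] && [ "$acac" = "true" ]
}

# Example (network; the API URL and attacker origin are placeholders):
# curl -s -o /dev/null -D - -H 'Origin: https://attacker.example' https://app.example/api/me \
#   | cors_credentialed_reflection https://attacker.example && echo "credentialed CORS reflection"
```

The same function works on the headers of a preflight OPTIONS response; only the request changes.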

CWE / OWASP mapping for the fastest-discovered classes

The five fastest classes (bundle secrets, source maps, open Firebase rules, permissive RLS, RLS off) share one shape: “authorization or credentials, exposed at the static-parse layer.”

Class | CWE | OWASP | Floor TTFC
Secret in static bundle | CWE-798 Hard-coded Credentials | A02 / A05 | ~4s
Source map shipped to production | CWE-538 Externally-Accessible File | A05 | ~6s
Open Firebase rules | CWE-862 Missing Authorization | A01 / API1 | ~10s
Permissive Supabase RLS | CWE-863 Incorrect Authorization | A01 / API1 | ~12s
RLS off entirely | CWE-862 Missing Authorization | A01 / API1 | ~14s
CORS allow-all on credentialed endpoint | CWE-942 Permissive Cross-domain Policy | A05 / API8 | ~20s
BOLA (read) | CWE-639 Authorization Bypass Through User-Controlled Key | A01 / API1 | ~50s
Mass assignment / self-editable role | CWE-915 Mass Assignment | A04 / API6 | ~60s
Open redirect on auth callback | CWE-601 URL Redirection to Untrusted Site | A01 / API8 | ~120s

CWE-862 and CWE-863 are the dominant pair. The vast majority of fast-discovered criticals are missing or incorrect authorization: the scanner is not finding clever exploits, it is finding doors with no lock.
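The open-Firebase row shows how little a missing-authorization probe needs. A sketch assuming a Firebase Realtime Database: its REST API serves any path as JSON when .json is appended, and shallow=true limits the reply to top-level keys. With locked-down rules an anonymous read is refused with a 401; a 200 means the root is world-readable. The helper name and the database URL are placeholders; a scanner would lift the URL from the databaseURL field of the bundled Firebase config.

```shell
#!/bin/sh
# Hypothetical helper: one unauthenticated, shallow read of the database root.
# Succeeds (finding) on HTTP 200; locked-down rules answer 401 "Permission denied".
# Usage: firebase_open_read DATABASE_URL
firebase_open_read() {
  db_url=${1%/}   # tolerate a trailing slash
  status=$(curl -s -o /dev/null -w '%{http_code}' "$db_url/.json?shallow=true")
  [ "$status" = "200" ]
}

# Example (network; placeholder project):
# firebase_open_read https://example-project-default-rtdb.firebaseio.com && echo "world-readable root"
```

A 200 with a null body still means the rules allow anonymous reads; the database simply holds no data yet.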

Per-platform modal first-critical

The class of finding that lands first on a typical app from each platform — driven by the platform’s default scaffolding, not by the scanner.

Platform | Modal first-critical class | Why
Bolt.new | Secret in bundle | Frontend-only quick-prototype patterns leak sk_live_ early
Lovable | Permissive Supabase RLS | Default Supabase backend; policy creation lags table creation
Replit | Open Firebase rules / public .env | Default deploy patterns expose either Firebase or env
Cursor | BOLA on read | Custom API surface, ownership check often missing
V0 (with backend) | Self-editable role | Mass-assignment shape in scaffolded forms

The class column is the answer to “what fails first on each platform”; the detection floor for that class is in the finding-class table above. Together they say: on Bolt, the floor is bundle-parse latency; on Cursor, you need to provision two test users before you can prove the modal issue.

A faster modal class is not better — it means the failure is shallower.

What this means

For builders: the failures that are findable in under a minute are the failures that are exploitable in under a minute. Time-to-first-critical is a credible upper bound on how long your app stays safe once an attacker points a similar probe at it.

For researchers: the detection floors per class are a reproducible baseline. A claim that a new tool finds critical issues “faster” needs to beat these floors against the same gapbench scenarios — not against an unpublished corpus.

For investors and acquirers: when you are doing security due diligence on a vibe-coded SaaS, the time it takes to find the first critical is a useful informal scoring axis. A clean URL ten minutes in is meaningful signal; a critical inside thirty seconds is equally meaningful, in the opposite direction.

Methodology

Source. Detection-floor measurements were taken against deliberately vulnerable scenarios on the gapbench public benchmark. Modal-class observations draw on anonymized customer engagements between Nov 2025 and Apr 2026.

Timer. Started on the first outbound request from the scanner against the target URL. Stopped at the moment a critical-severity finding was captured, replayed in sandbox, and confirmed.
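The timing rule can be sketched as a small stopwatch. This is a minimal sketch under stated assumptions, not the VibeEval timer: each probe is a command string (hypothetical) that exits 0 only once its finding has been captured and replayed, the clock starts immediately before the first probe as a stand-in for the first outbound request, and resolution is whole seconds from date +%s.

```shell
#!/bin/sh
# Hypothetical helper: runs probes cheapest-first and reports the first one that
# confirms a critical, with whole-second wall-clock time since the clock started.
# Queue time never counts because nothing is timed before the first probe.
run_until_first_critical() {
  start=$(date +%s)   # %s works on GNU and BSD date; whole seconds only
  for probe in "$@"; do
    if eval "$probe" >/dev/null 2>&1; then
      printf 'first critical: %s after %ss\n' "$probe" "$(( $(date +%s) - start ))"
      return 0
    fi
  done
  echo "no critical"
  return 1
}

# Example (network; the grep stands in for capture, the replay step is elided):
# run_until_first_critical \
#   'curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -qE "sk_(live|test)_[A-Za-z0-9]{20,}"'
```

Ordering probes cheapest-first mirrors the technique table: the static-parse probe gets its chance before any session is provisioned.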

Severity. CVSS 3.1 with the published rubric. Critical = 9.0+.
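For readers mapping their own findings onto this cut-off, the CVSS 3.1 qualitative bands fit in a tiny lookup. The band edges are the ones published in the CVSS v3.1 specification; the helper name cvss_severity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: maps a CVSS v3.1 base score to the specification's
# qualitative rating (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9,
# Critical 9.0-10.0). Only "Critical" counts as a first critical here.
cvss_severity() {
  awk -v score="$1" 'BEGIN {
    s = score + 0   # force a numeric comparison
    if (s >= 9.0)      print "Critical"
    else if (s >= 7.0) print "High"
    else if (s >= 4.0) print "Medium"
    else if (s > 0.0)  print "Low"
    else               print "None"
  }'
}

# Example: cvss_severity 9.8   # prints Critical
```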

Equipment. Scans ran from us-east-1 over a one-gigabit connection. Latencies elsewhere will be higher; expect TTFC to scale roughly linearly with round-trip time on the bundle-parse-bound classes.

Calibration against ref0. Every probe is also run against ref0, a clean reference site. A probe that fires on ref0 is by construction a false positive and gets killed. The detection floors above are net of this false-positive elimination. Without it, a probe that incorrectly fires within 5 seconds against a clean target would become the headline number; such probes are the primary reason most “fast scanner” claims do not hold up under scrutiny.
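A minimal sketch of that calibration gate, assuming a probe is any command or shell function that takes a base URL and exits 0 when it fires. The helper name calibrated_probe and the exit-code convention (0 finding, 1 clean, 2 killed) are illustrative assumptions, not the scanner's interface.

```shell
#!/bin/sh
# Hypothetical helper: the ref0 calibration gate.
# Exit codes: 0 = finding on target, 1 = clean, 2 = probe killed (fired on ref0).
REF0_URL=https://gapbench.vibe-eval.com/site/ref0/

calibrated_probe() {
  probe=$1
  target=$2
  # A probe that fires on the clean control is a false positive by construction.
  if "$probe" "$REF0_URL" >/dev/null 2>&1; then
    echo "killed: $probe fires on ref0"
    return 2
  fi
  if "$probe" "$target" >/dev/null 2>&1; then
    echo "finding: $probe on $target"
    return 0
  fi
  echo "clean: $probe on $target"
  return 1
}
```

The gate is deliberately asymmetric: a single hit on ref0 discards the probe outright, however often it fires on real targets.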

Reproduce on the public benchmark

The classes below can each be reproduced against a live gapbench scenario; scenario paths are relative to https://gapbench.vibe-eval.com. The TTFC floor is roughly the same against these scenarios as it is against real corpus apps, because they are deliberately shaped to mirror the same failure surfaces.

Class | Scenario | URL
Secret in static bundle | Indie SaaS | /site/indie-saas/
Permissive RLS | Supabase clone | /site/supabase-clone/
BOLA on read | Multi-tenant SaaS | /site/multi-tenant-saas/
BOLA on PATCH (balance) | Fintech app | /site/fintech-app/
Mass assignment / self-editable role | Mass assignment | /site/mass-assignment/
Open redirect on auth callback | OAuth redirect_uri | /site/ssrf-open-redirect-oauth/
ref0 (clean control) | ref0 | /site/ref0/

For the manifesto-level argument behind this style of measurement, see “Why we built gapbench” and “False positives and the ref0 control”.

How to reproduce

Run VibeEval against any URL. The scanner displays a live timer and announces each finding as it lands; the first critical timestamp is preserved in the report.

Sources and references

Citations

VibeEval. The First 60 Seconds: Time-to-First-Critical on AI-Built Apps. May 2026. https://vibe-eval.com/data-studies/time-to-first-critical/

RUN IT YOURSELF

Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.

Fastest class — secret in static bundle (median 6s)
time curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}' | head -1
expected Stripe key returned in well under 1s wall-clock; median TTFC for this class is 6s end-to-end including parse
Permissive RLS — anon-key extraction + PostgREST probe (median 18s)
ANON=$(curl -s https://gapbench.vibe-eval.com/site/supabase-clone/ | grep -oE 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+' | head -1) && [ -n "$ANON" ] && curl -s "https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*&limit=1" -H "apikey: $ANON"
expected 200 with a row from users — bug confirmed in two requests
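A 200 on its own is not the proof. When RLS is enabled and no policy admits the anon role, Postgres filters the rows silently and PostgREST answers 200 with an empty array (assuming the anon role keeps its table grant, as is typical on Supabase). The sketch below, with a hypothetical helper name, treats only a 200 that carries at least one row as a finding.

```shell
#!/bin/sh
# Hypothetical helper: verdict for the anon-key PostgREST probe.
# Usage: rls_verdict HTTP_STATUS < response-body
rls_verdict() {
  status=$1
  body=$(tr -d '[:space:]')   # compact whitespace so "[ ]" and "[]" compare equal
  if [ "$status" = "200" ]; then
    case "$body" in
      '[]') ;;                             # RLS filtered every row
      '['*) echo "exposed"; return 0 ;;    # rows came back to the anon role
    esac
  fi
  echo "not exposed"
  return 1
}

# Example wiring (network), reusing $ANON from the command above:
# resp=$(curl -s -w '\n%{http_code}' -H "apikey: $ANON" \
#   "https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*&limit=1")
# printf '%s\n' "$resp" | sed '$d' | rls_verdict "$(printf '%s\n' "$resp" | tail -n 1)"
```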
BOLA on read — two test users + cross-account fetch (median 64s)
curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H "Authorization: Bearer $USER_B_TOKEN"
expected 200 with user A's project — adds setup time for two sessions
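The two-session check can be scripted once both signups have produced tokens. A sketch, assuming you hold a token for user A (the owner) and one for user B, and know a resource ID owned by A; the helper names and token variables are hypothetical, and the URL shape follows the multi-tenant-saas command above.

```shell
#!/bin/sh
# Hypothetical helper: prints the HTTP status a GET returns for a bearer token.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $2" "$1"
}

# Finding only if the owner can read the resource (so it exists and the token
# works) AND the other user gets the same 200 back.
bola_verdict() {
  [ "$1" = "200" ] && [ "$2" = "200" ]
}

# Usage: bola_read_check URL OWNER_TOKEN OTHER_TOKEN
bola_read_check() {
  if bola_verdict "$(http_status "$1" "$2")" "$(http_status "$1" "$3")"; then
    echo "BOLA: $1 readable by a non-owner session"
    return 0
  fi
  return 1
}

# Example (network; token variables are placeholders):
# bola_read_check https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 "$USER_A_TOKEN" "$USER_B_TOKEN"
```

Requiring the owner's request to succeed first keeps a dead token or a missing resource from reading as a pass. Status codes alone can still mislead (some apps answer 200 with an error page), so a production check would also compare the two response bodies.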
Clean control — ref0 produces no critical at any time horizon
time curl -s -I https://gapbench.vibe-eval.com/site/ref0/
expected curl only confirms the control is reachable; a full scan against ref0 runs the whole probe set and reports no critical

COMMON QUESTIONS

01
Why does time-to-first-critical matter?
It is a structural measure of how shallow the failures are. A critical that takes 5 seconds to find is a critical that takes 5 seconds to find for an attacker too. Long discovery times mean the failure is in a hard-to-reach corner; short discovery times mean it is on the front page.
02
What counts as 'time-to-first-critical' in this catalog?
Wall-clock time from the scanner's first request against the URL to the moment the first critical-severity finding has been captured, replayed, and proven. It includes network latency, page load, bundle parsing, and the first probe that lands a confirmed critical. We exclude scan-queue time.
03
Are some criticals faster to find than others?
Yes — substantially. Token leaks in the bundle are detectable as soon as the bundle is parsed. RLS gaps are detectable as soon as the anon key is extracted and a single PostgREST request is made. BOLA findings require setting up two test users and crossing them, so they take longer.
04
What is the floor — how fast can a scanner physically go?
The floor is set by the network round-trip plus bundle parse — currently around 4-6 seconds on a fast connection for any DOM-based finding. Anything faster would have to rely on cached or pre-fetched data. The fastest critical we have reproduced is a sub-5-second Stripe sk_live_ detection in a small inline script on the gapbench indie-saas scenario.
05
Does this measure the scanner or the apps?
Both, jointly. A faster scanner finds the same finding faster; a more broken app gives the same scanner a finding faster. We report both axes because builders care about the latter (how fast does my app fail) and tool comparisons care about the former (how fast can the tool detect).
06
Where can I see TTFC measured against deliberately vulnerable scenarios?
https://gapbench.vibe-eval.com/ runs 97 deliberately vulnerable scenarios. Pick any of indie-saas, supabase-clone, multi-tenant-saas, fintech-app and run a timed scan. The TTFC numbers per class above are reproducible against these targets. ref0 is the clean control — same scan, no critical, used to confirm the scanner isn't just generating noise.
07
Why is BOLA slower than RLS even though both are 'authorization' bugs?
RLS detection needs two requests: fetch the page to extract the anon key, then query a table and see rows come back. BOLA detection needs the scanner to provision two test users, complete signup for each, capture session tokens, then make the cross-account request. The setup cost is the bulk of the latency; once both sessions are warm, the actual probe is sub-second. This is why class-level TTFC is more useful than app-level TTFC for comparing scanners.
08
How much of the median TTFC is the scanner versus the network?
For the bundle-parse-bound classes (secret leaks, source maps, open Firebase), network round-trip is the dominant cost — usually 50-70% of total TTFC. For probe-based classes (RLS, CORS, BOLA), scanner logic dominates — the test-user provisioning step alone is ~30s on a fast connection. A scanner running closer to the target, or one that cached anonymous probes, would shift the bundle-bound numbers down meaningfully but not the probe-based ones.
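To check the network share on your own connection, curl's -w write-out variables split a single fetch into phases. A sketch, assuming you sum time_total across every fetch in a run and already know the end-to-end TTFC; both helper names are hypothetical and the figures in the example comment are illustrative, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: phase timings (seconds) for one fetch of URL.
fetch_timings() {
  curl -s -o /dev/null \
    -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    "$1"
}

# Hypothetical helper: percentage of an end-to-end TTFC spent in network fetches.
# Usage: network_share NETWORK_SECONDS TTFC_SECONDS
network_share() {
  awk -v net="$1" -v ttfc="$2" 'BEGIN { printf "%.0f\n", 100 * net / ttfc }'
}

# Example: fetch_timings https://gapbench.vibe-eval.com/site/indie-saas/
#          network_share 3.6 6.0   # prints 60
```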

SEE YOUR APP'S TIME-TO-FIRST-CRITICAL

Run VibeEval against your URL and watch the timer. Most apps fail before the page is fully loaded.
