THE FIRST 60 SECONDS: TIME-TO-FIRST-CRITICAL ACROSS 1,500 AI APPS

Across 1,514 scans, the median time from URL submission to the first proven critical finding was 27 seconds. Of the apps where a critical was found, 84% had one inside the first minute. Here is the distribution, by platform and by finding class.

This is a distribution study. We measured wall-clock time from URL submission to the first proven critical finding across 1,514 AI-built apps. The median is 27 seconds. The 90th percentile is 92 seconds. Beyond two minutes, the long tail begins — apps that needed full crawl, multiple probes, or authenticated state to surface a critical.

If your reaction is “that’s fast”, the unsettling implication is that an attacker is not slower.

Headline numbers

Metric Value
Apps scanned 1,514
Apps where a critical was found 712 (47%)
Median time-to-first-critical (where found) 27 seconds
90th percentile 92 seconds
Fastest observed 4.2 seconds
Slowest within-scan 8 minutes 11 seconds
Window Nov 2025 – Apr 2026

Distribution

Time bucket Apps reaching first critical in this bucket Cumulative share
0-15s 198 28%
15-30s 218 58%
30-60s 184 84%
60-120s 71 94%
120-300s 32 99%
300s+ 9 100%

Eighty-four percent of apps with a critical exposed it inside the first minute. The 0-15 second bucket is dominated by a single failure class — secrets in the static bundle.
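The cumulative column follows directly from the per-bucket counts; a quick sketch in Python, using the counts from the table above:

```python
# Re-derive the cumulative-share column from the per-bucket counts above.
buckets = [
    ("0-15s", 198),
    ("15-30s", 218),
    ("30-60s", 184),
    ("60-120s", 71),
    ("120-300s", 32),
    ("300s+", 9),
]
total = sum(n for _, n in buckets)  # 712 apps where a critical was found
running = 0
for label, n in buckets:
    running += n
    print(f"{label:>9}  {n:4d}  {round(100 * running / total)}%")
# cumulative shares: 28%, 58%, 84%, 94%, 99%, 100%
```

Rounding the running total against 712 reproduces the table exactly, including the 84% within-one-minute figure.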

By finding class

Finding class Median time-to-detect Why it ranks here
Secret in static bundle 6s Detectable on first parse of HTML and JS
Source map shipped to production 8s Same as above; one extra network request
Open Firebase rules 12s One unauthenticated read against the public REST endpoint
Permissive Supabase RLS 18s Anon key extraction + PostgREST table enumeration
RLS off entirely 22s Slightly slower because the scan has to enumerate the table
CORS allow-all on credentialed endpoint 31s Requires a preflight test against an API endpoint
BOLA — read 64s Two test users + cross-account fetch
BOLA — write (PATCH/PUT) 78s Same as above plus the write-back step
Self-editable role 91s Requires successful auth, profile fetch, mutated PATCH, re-fetch
Open redirect on auth callback 134s Requires triggering full auth flow

The five fastest classes — bundle secrets, source maps, open Firebase rules, permissive RLS, RLS off — together cover 76% of all critical findings. They are also the five classes that need zero authenticated probing to detect.

TTFC by detection technique

The same critical can be reached by different detection techniques. The technique determines the latency floor; pick the wrong one and the same finding takes ten times longer.

Technique Floor Used for
Static parse of HTML / JS ~4s Secrets, source maps, inline config
Single unauthenticated PostgREST / API call ~10s RLS off, open Firebase, public S3 buckets
Anon-key extraction + targeted probe ~15s Permissive RLS, naked-database backends
Header inspection ~6s CSP / HSTS missing, CORS misconfig
Two-session cross-account probe ~50s BOLA on read
Two-session probe + write-back ~70s BOLA on PATCH/PUT/DELETE
Authenticated browser flow ~120s Open redirects on auth callback, OAuth flaws
Crawl + dynamic introspection ~180s GraphQL introspection abuse, Swagger-with-bearer

The class column in the previous table is the what; the technique column here is the how. A scanner that lacks the two-session capability will report 0% BOLA findings — not because the bugs are not there, but because the technique needed to detect them is not in the scan.
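The static-parse floor exists because the technique is little more than pattern matching over the fetched bundle. A minimal sketch of that idea; the patterns and the sample bundle below are illustrative, not the scanner's actual rule set:

```python
import re

# Illustrative patterns for two of the fastest finding classes:
# hard-coded Stripe secret keys and Supabase-style anon JWTs.
PATTERNS = {
    "stripe_secret_key": re.compile(r"sk_(?:live|test)_[A-Za-z0-9]{20,}"),
    "jwt_like_token": re.compile(r"eyJ[\w-]+\.eyJ[\w-]+\.[\w-]+"),
}

def scan_bundle(text: str) -> list[tuple[str, str]]:
    """Return (finding_class, matched_token) pairs found in a JS/HTML bundle."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

# Hypothetical inline-config snippet of the kind the 0-15s bucket is full of.
bundle = 'const cfg = {stripeKey: "sk_live_' + "a" * 24 + '"};'
print(scan_bundle(bundle))  # one stripe_secret_key hit
```

Nothing here needs authentication or even a second request, which is why this class sets the latency floor for the whole study.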

CWE / OWASP mapping for the fastest-discovered classes

As noted above, the five fastest classes together cover 76% of all critical findings in the corpus. Every one of them is “authorization or credentials, exposed at the static-parse layer.”

Class CWE OWASP Floor TTFC
Secret in static bundle CWE-798 Hard-coded Credentials A02 / A05 ~4s
Source map shipped to production CWE-538 Externally-Accessible File A05 ~6s
Open Firebase rules CWE-862 Missing Authorization A01 / API1 ~10s
Permissive Supabase RLS CWE-863 Incorrect Authorization A01 / API1 ~12s
RLS off entirely CWE-862 Missing Authorization A01 / API1 ~14s
CORS allow-all on credentialed endpoint CWE-942 Permissive Cross-domain Policy A05 / API8 ~20s
BOLA — read CWE-639 Auth Bypass via Key A01 / API1 ~50s
Mass assignment / self-editable role CWE-915 Mass Assignment A04 / API6 ~60s
Open redirect on auth callback CWE-601 URL Redirect to Untrusted A01 / API8 ~120s

The CWE-862 / CWE-863 split is the dominant pair. The vast majority of fast-discovered criticals are missing or incorrect authorization — the scanner is not finding clever exploits, it is finding doors with no lock.

Per-platform breakdown

Median time-to-first-critical, on the apps where one was found.

Platform Apps with critical Median TTFC Modal first-critical class
Lovable 355 19s Permissive RLS
Bolt.new 156 12s Secret in bundle
Cursor 101 41s BOLA on read
Replit 88 31s Open Firebase rules
V0 33 47s Self-editable role

Bolt’s 12-second median is the shortest because Bolt-built apps fail fastest on the secret-in-bundle class — a detection that needs only a parsed bundle. Cursor’s 41-second median is the longest because Cursor-built apps tend to fail on logic-layer issues (BOLA, self-editable fields) that require authenticated probing.

Faster median is not better here. It means the failure is shallower.

What this means

For builders: the failures that are findable in under a minute are the failures that are exploitable in under a minute. Time-to-first-critical is a credible upper bound on how long your app can stay public before an attacker running a similar scanner finds the same thing.

For researchers: this is a baseline for any future scanner comparison. A claim that a new tool finds critical issues “faster” than VibeEval needs to beat 27 seconds at the median on a comparable corpus. We will share the corpus URL list under NDA on request.

For investors and acquirers: when you are doing security due diligence on a vibe-coded SaaS, the time it takes to find the first critical is a useful informal scoring axis. A clean URL ten minutes in is meaningful signal; a critical inside thirty seconds is meaningful signal too.

Methodology

Sample. All 1,514 apps in the corpus.

Timer. Started on the first outbound request from the scanner against the target URL. Stopped on the moment a critical-severity finding was captured, replayed in sandbox, and confirmed.

Severity. CVSS 3.1 with the published rubric. Critical = 9.0+.

Equipment. Scans ran from us-east-1 over a one-gigabit connection. Latencies elsewhere will be higher; we expect time-to-first-critical to scale roughly linearly with round-trip time on the bundle-parse-bound classes.
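The rough-linearity claim for bundle-parse-bound classes can be made concrete with a toy model; the parse cost and request count below are assumed values for illustration, not measurements:

```python
def ttfc_estimate(rtt_ms: float, requests: int = 3, parse_ms: float = 1500.0) -> float:
    """Toy model: TTFC for a bundle-parse-bound class is roughly a fixed
    parse cost plus one round trip per sequential request."""
    return parse_ms + requests * rtt_ms

# Increasing RTT shifts TTFC linearly; the parse constant dampens the ratio.
near = ttfc_estimate(rtt_ms=30.0)   # scanning close to the target
far = ttfc_estimate(rtt_ms=250.0)   # scanning across an ocean
print(near, far)  # 1590.0 2250.0
```

The probe-based classes do not fit this model, because their latency is dominated by scanner-side setup (test users, sessions) rather than round trips.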

Statistical handling. Where multiple criticals were found, only the first counts. Apps that timed out without a critical are excluded from this study (they are counted in the main benchmark).

Calibration against ref0. Every probe is also run against ref0, a clean reference site. A probe that fires on ref0 is by construction a false positive and is excluded from the corpus aggregation. The TTFC numbers above are net of this false-positive elimination; without it, a probe that incorrectly fires within 5 seconds against a clean target would become the headline number. This is the primary reason most “fast scanner” claims do not hold up under scrutiny.
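The calibration step reduces to a set difference over probe identifiers; a minimal sketch, with hypothetical probe names:

```python
def calibrate(target_findings: dict[str, float], ref0_findings: set[str]) -> dict[str, float]:
    """Drop any probe that also fired against the clean reference site ref0.

    target_findings maps probe id -> time-to-fire in seconds;
    ref0_findings is the set of probe ids that fired on ref0
    (false positives by construction).
    """
    return {probe: t for probe, t in target_findings.items() if probe not in ref0_findings}

# Hypothetical scan: a noisy probe fires in 3s on the target AND on ref0,
# so it must not become the headline time-to-first-critical.
target = {"noisy-header-probe": 3.0, "stripe-key-in-bundle": 6.2}
ref0 = {"noisy-header-probe"}
clean = calibrate(target, ref0)
print(min(clean.values()))  # 6.2, not 3.0
```

The fastest surviving probe, not the fastest raw firing, is what feeds the TTFC numbers above.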

Reproduce on the public benchmark

Each detection class above can be reproduced against a live gapbench scenario. The TTFC floor is roughly the same against the gapbench scenarios as it is against real corpus apps — these scenarios are deliberately shaped to mirror the same failure surfaces.

Class Scenario URL
Secret in static bundle Indie SaaS /site/indie-saas/
Permissive RLS Supabase clone /site/supabase-clone/
BOLA on read Multi-tenant SaaS /site/multi-tenant-saas/
BOLA on PATCH (balance) Fintech app /site/fintech-app/
Mass assignment / self-editable role Mass assignment /site/mass-assignment/
Open redirect on auth callback OAuth redirect_uri /site/ssrf-open-redirect-oauth/
ref0 (clean control) ref0 /site/ref0/

For the manifesto-level argument behind this style of measurement, see Why we built gapbench and False positives and the ref0 control.

How to reproduce

Run VibeEval against any URL. The scanner displays a live timer and announces each finding as it lands; the first critical timestamp is preserved in the report.

Citations

VibeEval. The First 60 Seconds: Time-to-First-Critical Across 1,500 AI Apps. May 2026. https://vibe-eval.com/data-studies/time-to-first-critical/

RUN IT YOURSELF

Each scenario below is live on the public benchmark. The commands are copy-paste ready. Outputs may evolve as we tune the scenarios; the bug stays.

Fastest class — secret in static bundle (median 6s)

time curl -s https://gapbench.vibe-eval.com/site/indie-saas/ | grep -oE 'sk_(live|test)_[A-Za-z0-9]{20,}' | head -1

Expected: Stripe key returned in well under 1s wall-clock; the 6s median TTFC for this class is end-to-end, including parse.

Permissive RLS — anon-key extraction + PostgREST probe (median 18s)

ANON=$(curl -s https://gapbench.vibe-eval.com/site/supabase-clone/ | grep -oE 'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+' | head -1) && curl -s "https://gapbench.vibe-eval.com/site/supabase-clone/rest/v1/users?select=*&limit=1" -H "apikey: $ANON"

Expected: 200 with a row from users — bug confirmed in two requests.

BOLA on read — two test users + cross-account fetch (median 64s)

curl -s https://gapbench.vibe-eval.com/site/multi-tenant-saas/api/projects/1 -H 'Authorization: Bearer USER_B_TOKEN'

Expected: 200 with user A's project; the two-session setup accounts for most of this class's 64s median.

Clean control — ref0 produces no critical at any time horizon

time curl -s -I https://gapbench.vibe-eval.com/site/ref0/

Expected: the scanner runs the full probe set and reports no critical.

COMMON QUESTIONS

01
Why does time-to-first-critical matter?
It is a structural measure of how shallow the failures are. A critical that takes 5 seconds to find is a critical that takes 5 seconds to find for an attacker too. Long discovery times mean the failure is in a hard-to-reach corner; short discovery times mean it is on the front page.
02
What counts as 'time-to-first-critical' in this study?
Wall-clock time from the scanner's first request against the URL to the moment the first critical-severity finding has been captured, replayed, and proven. It includes network latency, page load, bundle parsing, and the first probe that lands a confirmed critical. We exclude scan-queue time.
03
Are some criticals faster to find than others?
Yes — substantially. Token leaks in the bundle are detectable as soon as the bundle is parsed. RLS gaps are detectable as soon as the anon key is extracted and a single PostgREST request is made. BOLA findings require setting up two test users and crossing them, so they take longer.
04
What is the floor — how fast can a scanner physically go?
The floor is set by the network round-trip plus bundle parse — currently around 4-6 seconds on a fast connection for any DOM-based finding. Anything faster would have to rely on cached or pre-fetched data. The fastest critical we proved was 4.2 seconds (a Stripe sk_live_ in a 12KB inline script).
05
Does this measure the scanner or the apps?
Both, jointly. A faster scanner finds the same finding faster; a more-broken app gives the same scanner a finding faster. We report both axes because builders care about the latter (how fast does my app fail) and tool comparisons care about the former (how fast can the tool detect).
06
Where can I see TTFC measured against deliberately vulnerable scenarios?
https://gapbench.vibe-eval.com/ runs 97 deliberately vulnerable scenarios. Pick any of indie-saas, supabase-clone, multi-tenant-saas, fintech-app and run a timed scan. The TTFC numbers per class above are reproducible against these targets. ref0 is the clean control — same scan, no critical, used to confirm the scanner isn't just generating noise.
07
Why is BOLA slower than RLS even though both are 'authorization' bugs?
RLS detection needs one request: extract the anon key, query a table, see rows come back. BOLA detection needs the scanner to provision two test users, complete signup for each, capture session tokens, then make the cross-account request. The setup cost is the bulk of the latency — once both sessions are warm, the actual probe is sub-second. This is why class-level TTFC is more useful than app-level TTFC for comparing scanners.
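The two-session probe described here can be sketched against a mock backend; the token-as-user-id simplification and all names are illustrative, not the scanner's internals:

```python
# Sketch of the two-session BOLA read probe, against an in-memory mock backend.
# fetch(resource_id, token) stands in for the scanner's HTTP client.

def make_mock_backend(enforce_ownership: bool):
    owners = {"project-1": "user_a", "project-2": "user_b"}
    def fetch(resource_id: str, token: str):
        owner = owners.get(resource_id)
        if owner is None:
            return 404, None
        if enforce_ownership and owner != token:
            return 403, None  # access control holds
        return 200, {"id": resource_id, "owner": owner}
    return fetch

def bola_read_probe(fetch) -> bool:
    """True if session B can read a resource owned by user A."""
    status_own, _ = fetch("project-1", "user_a")       # sanity: owner can read
    status_cross, body = fetch("project-1", "user_b")  # cross-account read
    return status_own == 200 and status_cross == 200 and body is not None

print(bola_read_probe(make_mock_backend(enforce_ownership=False)))  # True: BOLA present
print(bola_read_probe(make_mock_backend(enforce_ownership=True)))   # False: no finding
```

The probe itself is two requests; in the real scan, almost all of the 64s median is spent getting the two warm sessions that make those requests possible.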
08
How much of the median TTFC is the scanner versus the network?
For the bundle-parse-bound classes (secret leaks, source maps, open Firebase), network round-trip is the dominant cost — usually 50-70% of total TTFC. For probe-based classes (RLS, CORS, BOLA), scanner logic dominates — the test-user provisioning step alone is ~30s on a fast connection. A scanner running closer to the target, or one that cached anonymous probes, would shift the bundle-bound numbers down meaningfully but not the probe-based ones.

SEE YOUR APP'S TIME-TO-FIRST-CRITICAL

Run VibeEval against your URL and watch the timer. Most apps fail before the page is fully loaded.

RUN TIMED SCAN