VIBEEVAL VS DEPTHFIRST
DepthFirst's pitch is precision — fewer false positives via semantic code understanding. VibeEval's pitch is the same precision, plus a public benchmark you can audit.
Where DepthFirst Wins
DepthFirst is a real product with real engineering. Their pitch around semantic understanding of code, multi-layer scanning, and PR-based remediation is more polished than the average AI-flavored scanner. For teams with enterprise budget and a security org that values sales-led onboarding and dedicated CSM relationships, DepthFirst is a credible AI-native option.
What they’ve built that we haven’t (yet):
- Multi-layer semantic scanning with explicit reasoning traces
- PR-based remediation suggestions with surrounding-code context
- Enterprise SSO, SCIM, audit log compliance certifications
- A polished web console that emphasizes triage workflows
If your team needs all of those today, DepthFirst is the safer pick. None of those is cheap to build, and we’re not going to pretend we match them on every dimension.
What’s Hard to Verify
DepthFirst’s precision claim is the part of the pitch that doesn’t quite land for us. The numbers — fewer false positives, higher signal-to-noise — are credible-sounding, but they’re measured against unlabeled real-world code. The methodology is “we scanned a corpus and the customer dismissed fewer findings than they did with the previous scanner.”
That’s a useful number for marketing. It’s not a reproducible accuracy claim, because:
- The customer’s dismissal rate isn’t equivalent to false-positive rate. Some real bugs get dismissed because they’re low-priority. Some false positives get accepted because the customer didn’t notice they were noise.
- The corpus is private. You can’t independently grade DepthFirst’s scans of it.
- There’s no clean control. They can show “this many findings on customer code” but not “this many correct findings on code with known ground truth.”
Without labeled test data, accuracy claims are unauditable. You take them on faith.
This isn’t unique to DepthFirst — every legacy and modern scanner is in roughly the same spot. We’re calling it out because their differentiator is precision, and the absence of an auditable methodology weakens the differentiator specifically.
Where VibeEval Differs: gapbench + ref0 = Public Ground Truth
We operate gapbench.vibe-eval.com — a public security benchmark.
- 104 scenarios, currently
- 97 deliberately vulnerable, 5 clean reference sites, 2 calibration targets
- Every scenario tagged with CWEs and OWASP categories
- Live HTTP surfaces, AI-codegen-shaped stacks (Supabase, Next.js, Vite, Express)
Every VibeEval detection rule is calibrated against the benchmark before shipping: a new rule must fire on the corresponding vulnerable scenario, and must not fire on ref0 or the topic-specific ref-* clean control. If it does fire on a control, the rule gets fixed or killed.
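That ship-or-kill gate can be sketched in a few lines. This is an illustrative model, not VibeEval's actual API; the scenario IDs are hypothetical, and a rule is reduced to the set of scenarios it fired on during a benchmark run.

```python
def calibration_verdict(fires_on, target, controls):
    """Decide whether a candidate rule ships.

    fires_on: scenario IDs the rule fired on during a benchmark run
    target:   the vulnerable scenario the rule is meant to catch
    controls: ref0 plus the topic-specific ref-* clean sites
    """
    if target not in fires_on:
        return "reject: missed the planted vulnerability"
    noisy = sorted(fires_on & set(controls))
    if noisy:
        return "reject: fired on clean controls: " + ", ".join(noisy)
    return "ship"

# Hypothetical scenario names, for illustration only:
print(calibration_verdict({"sqli-supabase"}, "sqli-supabase", {"ref0", "ref-sqli"}))
# → ship
print(calibration_verdict({"sqli-supabase", "ref0"}, "sqli-supabase", {"ref0", "ref-sqli"}))
# → reject: fired on clean controls: ref0
```

The point of the gate is the asymmetry: a miss on the target and a hit on a control are both hard rejections, so precision is enforced before a rule ever reaches a customer scan.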
The methodology lives at /patterns/false-positives-and-the-ref0-control/. The manifesto for the whole approach is at /patterns/why-gapbench/.
The customer-facing implication: you can run any scanner you're evaluating against the same benchmark and produce reproducible numbers.
- Per-CWE recall: how many planted CWE-X surfaces did the scanner catch?
- False-positive surface: how often did it fire on the clean controls?
- Per-scenario detection map: what does it find, and what does it miss?
You can do this for VibeEval. You can do this for DepthFirst. You can do this for Snyk, Semgrep, anyone. The numbers don’t depend on what we tell you.
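With labeled ground truth, those numbers are simple to compute. A minimal scoring sketch, assuming findings and the planted-vulnerability manifest are both keyed by scenario ID (the data shapes here are illustrative, not a published gapbench format):

```python
from collections import defaultdict

def score_run(findings, planted, clean_ids):
    """findings:  {scenario_id: set of CWE IDs the scanner flagged}
    planted:   {scenario_id: set of CWE IDs deliberately planted}  (ground truth)
    clean_ids: scenario IDs of the clean controls (ref0, ref-*)
    """
    caught = defaultdict(int)
    total = defaultdict(int)
    for sid, cwes in planted.items():
        for cwe in cwes:
            total[cwe] += 1
            if cwe in findings.get(sid, set()):
                caught[cwe] += 1
    recall = {cwe: caught[cwe] / total[cwe] for cwe in total}
    # False-positive surface: any clean control the scanner fired on at all.
    fp_surface = sorted(sid for sid in clean_ids if findings.get(sid))
    return recall, fp_surface

# Made-up example data:
planted = {"s1": {"CWE-89"}, "s2": {"CWE-79"}}
findings = {"s1": {"CWE-89"}, "ref0": {"CWE-79"}}
recall, fp = score_run(findings, planted, ["ref0"])
# recall → {"CWE-89": 1.0, "CWE-79": 0.0}; fp → ["ref0"]
```

Because the manifest is public, anyone can run this scoring against any scanner's output; the numbers don't pass through the vendor.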
The Run-Both-Against-Gapbench Challenge
Here’s the most useful thing you can do when evaluating any AI-native scanner, including ours:
- Pick five gapbench scenarios that match your stack (the why-gapbench article has the full list).
- Run VibeEval against them.
- Run the competing scanner against them.
- Run both against ref0 and the matching ref-* clean controls.
- Compare findings.
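The comparison step reduces to set arithmetic over scenario IDs. A minimal sketch with hypothetical scenario names (the IDs and data shapes are illustrative):

```python
def compare_runs(findings_a, findings_b, vulnerable, controls):
    """findings_X: scenario IDs where scanner X reported at least one finding."""
    return {
        "missed_by_a": sorted(vulnerable - findings_a),
        "missed_by_b": sorted(vulnerable - findings_b),
        "fp_a": sorted(findings_a & controls),  # fired on clean controls
        "fp_b": sorted(findings_b & controls),
    }

vulnerable = {"xss-next", "sqli-supabase"}  # hypothetical scenario IDs
controls = {"ref0", "ref-xss"}
report = compare_runs({"xss-next", "sqli-supabase", "ref0"},
                      {"xss-next"}, vulnerable, controls)
# Scanner A caught both planted bugs but fired on ref0;
# Scanner B stayed quiet on the controls but missed the SQLi scenario.
```

The report maps directly onto the three outcomes below: empty "missed" sets with differing "fp" sets is outcome one, differing "missed" sets is outcome two, and all-equal is outcome three.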
Three honest outcomes:
- The scanners agree on the vulnerable scenarios but disagree on the clean controls. The one with fewer findings on clean controls has the lower false-positive rate.
- The scanners disagree on the vulnerable scenarios. Look at which CWEs each one missed — that tells you which categories the scanner is blind to.
- The scanners agree on both. Pick on price, ergonomics, or whichever workflow your team prefers.
We will publish our gapbench numbers. If DepthFirst (or any competitor) publishes theirs, the comparison is complete and customers can decide on observable data.
Feature Comparison
| Feature | DepthFirst | VibeEval |
|---|---|---|
| Static analysis (semantic) | Yes | Yes |
| Dynamic analysis (live app) | Limited | Yes |
| AI-codegen-aware rules | Yes | Yes |
| Public ground-truth benchmark | No | gapbench |
| Clean reference sites for FP calibration | No | ref0 + ref-* |
| PR-based remediation | Yes | Limited |
| Self-serve pricing | No | Yes ($19/mo) |
| Enterprise SSO/SCIM | Yes | On request |
| Pricing transparency | Demo-only | Public |
| Setup time | Sales cycle | 60 seconds |
When to Pick DepthFirst
- Enterprise team with established AppSec org
- Budget for sales-led onboarding and demo cycle
- PR-based remediation workflow is critical
- Need certified SSO/SCIM and existing compliance frameworks
- Don’t need to verify accuracy claims independently
When to Pick VibeEval
- AI-codegen-shaped stack (Lovable, Bolt, Cursor, Replit, V0, Next.js + Supabase)
- Want auditable accuracy via a public benchmark
- Self-serve pricing matters
- Indie/team scale, founder-priced
- Want to run both scanners against gapbench and decide on the numbers
How long does migration take?
- 1-2 hours for the scanner setup itself
- Existing exception lists and security policies transfer with manual mapping (we publish a translation guide)
- CLI + CI integration is straightforward, though it may feel unfamiliar if you're used to a sales-led console
- The biggest cultural shift: VibeEval expects you to engage with the calibration story (run gapbench, see the numbers). DepthFirst doesn’t ask that of you.
Pricing Transparency
VibeEval Pro: $19/month, unlimited projects, static + dynamic scanning, 14-day trial without a credit card.
DepthFirst: pricing not public. Demo-gated. Enterprise contracts of the typical shape.
The pricing transparency itself is a small philosophical claim. We tell you what it costs because we tell you what we measure. The two go together.
Related reading
- Pattern: Why we built gapbench
- Pattern: False positives and the ref0 control
- Comparison: Snyk Alternative
- Comparison: Semgrep Alternative
LEAVE DEPTHFIRST FOR VIBEEVAL
14-day trial. No credit card. Migration takes an hour or two.