VIBE CODING SECURITY WEEKLY — APR 28-MAY 5, 2026
The week of April 28 to May 5, 2026 turned every previous thread in vibe-coding security into something with teeth. Replit’s CEO escalated the Apple standoff into legal-shaped territory. Anthropic’s Mythos preview produced the first real, dated number for AI-as-defender: 271 vulnerabilities closed in a single Firefox release. Google patched a CVSS-10 RCE in Gemini CLI plus separate Cursor flaws — agentic dev tooling itself is now the supply chain. And the U.S. House opened a national-security probe into Cursor’s parent. Six months ago “vibe coding security” was a category looking for stakes; this week it has them.
TL;DR — The week at a glance
- Replit, May 2: CEO Amjad Masad told iClarified that Apple’s justification for blocking Replit’s App Store updates — that the app downloads new code post-approval — is a “total lie,” and that he is willing to take it to court. He repositioned Replit’s full-stack architecture (isolated GCP environments, private databases) as a security advantage over peers whose generated apps connect to a public Supabase or Firebase backend.
- Anthropic, Apr 29-30: Mozilla shipped Firefox 150 with 271 vulnerabilities patched using early access to Anthropic’s restricted Mythos preview. Separately, an AI-assisted scan disclosed via Dark Reading uncovered a 9-year-old Linux bug allowing unauthorized edits of critical system files.
- Google, Apr 30: Google patched CVSS 10.0 RCE flaws in Gemini CLI exploitable through CI workflows, plus separate code-execution flaws in Cursor. Both are agent-adjacent tooling — the kind of binary developers wire into pipelines without re-reviewing.
- U.S. House, Apr 29: Two Republican-led House committees opened probes into Anysphere (Cursor’s parent) and Airbnb over the use of Chinese AI models — Cursor’s Composer 2 is built on Moonshot’s Kimi; Airbnb’s customer agent on Alibaba’s Qwen — citing national-security risk from PRC-linked data sharing.
- Lovable, Apr 28 (continued coverage): Lovable’s iOS and Android launch — preview rendered in a webview to satisfy Apple’s no-post-install-code rule — kept rippling across international tech press through early May.
Why is the Replit-Apple fight escalating now?
iClarified reported on May 2 that Replit CEO Amjad Masad is publicly disputing Apple’s reason for blocking Replit’s App Store updates. Masad’s claim: Replit’s mobile app does not download new code post-approval. The dynamic generation happens in Replit’s cloud, and the previewed app is a webview pointing at a hosted URL — the same pattern Lovable used to ship under the new rules (prior coverage).
The substantive part of Masad’s pitch is the security argument, not the legal one. From his longer interview reposted at Info-Today on May 1:
A lot of vibe-coding tools will generate a website and connect it to an external database — great products, but it makes security much harder, because the database is open to the public and you need to configure row-level security, which is especially difficult for non-technical builders. Replit being full stack, with the database built into the project and not open to the public — that makes the app inherently more secure.
That argument has been the structural explanation for most of the Lovable, Bolt, and Base44 incidents catalogued on this site. Apps generated against a public Supabase or Firebase project inherit a permissive default — RLS off, anonymous read, anonymous write — and the user is expected to fix it before launch. They almost never do. (See Is Lovable Safe? and the Lovable BOLA writeup.)
Replit’s pitch is different: the database is private to the deployment, the runtime is an isolated Google Cloud project, and there is no public connection string for an attacker to find. That removes one entire failure class — the one where the secret-laden URL ends up in a screenshot or a .env.local committed to a public repo.
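The failure class being described — a public Supabase/Firebase endpoint answering anonymous reads — is cheap to check for yourself. A minimal sketch of the classification step, assuming Supabase's PostgREST-style REST endpoint shape (`/rest/v1/<table>?select=*` with the anon key); the verdict labels and thresholds are illustrative, not any tool's actual logic:

```python
import json

def classify_anon_read(status: int, body: str) -> str:
    """Classify the result of an unauthenticated GET against a
    Supabase-style REST endpoint. A real audit would also probe writes."""
    if status in (401, 403):
        return "locked"       # RLS (or auth) is rejecting anonymous reads
    if status == 200:
        try:
            rows = json.loads(body)
        except json.JSONDecodeError:
            return "unknown"
        # 200 with an empty list can mean RLS is on but matched no rows;
        # 200 with data means anonymous read is live.
        return "exposed" if rows else "ambiguous"
    return "unknown"

print(classify_anon_read(200, '[{"id": 1, "email": "a@b.c"}]'))  # exposed
print(classify_anon_read(401, ""))                               # locked
```

The "ambiguous" branch matters: an empty 200 response is not proof that row-level security is on, only that this query returned nothing.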
The catch: if Apple wins the post-install-code argument, the question is no longer about Replit’s security model. It is whether any mobile vibe-coding workflow can put generated logic inside the host binary, regardless of where the runtime physically lives. Lovable’s webview workaround already concedes the point. Masad arguing in court would be the first contested test of where the line is.
What did Anthropic’s Mythos actually find?
The headline number from The Neuron’s April 29 digest and a follow-on at desireo.net: 271 latent vulnerabilities were identified and remediated in Firefox 150 using Anthropic’s Mythos preview, a restricted-access model for advanced software analysis. Same week, Dark Reading reported that an AI-assisted scan uncovered a 9-year-old Linux bug allowing unauthorized edits of critical system files.
Both numbers matter for a specific reason. AI-as-defender pitches have been credibility-thin for two years — vendors quoting unverifiable internal benchmarks against synthetic corpora. A patched CVE in shipping Firefox and a documented kernel-adjacent bug in mainline Linux are both publicly auditable outputs. They do not prove generative AI is now better than human auditors at finding bugs; they prove it found bugs that humans missed in shipping software, dated and named.
The asymmetry to watch: the same desireo.net writeup notes that North Korean state actors have been using AI for “vibe coding” of malware and phishing infrastructure, and the operation netted approximately $12 million over three months. Defenders got 271 closed Firefox bugs; offenders got eight figures of revenue. Both are real signals. Neither is yet a trend.
For teams building against agentic models: the practical takeaway is that “AI finds bugs humans miss” is now a defensible claim with examples, but it is also a claim that applies to attackers building tooling against your code. The CLAUDE.md attack-surface theme (prior coverage) gets a sharper edge when both sides have model-driven discovery.
What’s the deal with the Gemini CLI CVSS-10 RCE?
Easy to miss inside a weekly digest, but Hacker News flagged on April 30 that Google had patched CVSS 10.0 RCE flaws in the Gemini CLI, exploitable through continuous-integration tasks, plus separate code-execution flaws in Cursor.
Why this is more than a one-off:
1. Agent CLIs are the new privileged binary
A normal IDE plugin runs untrusted user data through a sandbox. An agent CLI runs untrusted user data — including the project’s own files, possibly attacker-controlled — through a model that decides which shell commands to invoke, then invokes them. The attack surface is larger than the IDE plugin generation it replaced, and the deployment surface (CI runners, dev laptops, Docker images) is broad.
2. CI is the worst place for this class of bug
A vulnerability in a CI-installed agent CLI compounds: the runner already has secrets, deploy keys, and write access to the source repo. An RCE there is functionally the same as commit-and-push access. The Gemini CLI patch language matters because “we fixed the RCE” is not the same thing as “your stale runner image is fixed” — pinned versions in CI configs are how that bug ends up live for months after the patch.
3. The pattern was predictable
This rhymes with the Claude Code source-map leak from last week. Agent CLIs are being shipped into security-sensitive contexts faster than the artifacts around them are being treated as security-sensitive. Source maps, debug logs, dev-mode flags, and now CLI code-execution paths are all in the same category: things that used to be harmless when the binary was a code editor and are not harmless now that the binary is an agent.
If you ship CI workflows that install Gemini CLI, Cursor, Claude Code, or any other agent CLI at runtime, the immediate to-dos:
- Pin and audit versions. Floating to “latest” inside CI removes your patch latency, but also removes your ability to defer compromised releases.
- Treat the agent CLI as a privileged binary. It runs in your runner with your secrets — give it the same scrutiny you give a deploy key.
- Allowlist outbound destinations. A compromised agent CLI exfiltrating to attacker infrastructure is the realistic failure mode, not local destruction.
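The pinning to-do is easy to automate as a CI lint. A rough sketch, assuming npm-style installs in workflow files — the package names in the list are illustrative and should be replaced with whatever agent CLIs you actually install:

```python
import re

# Illustrative package names; adjust to the agent CLIs in your pipelines.
AGENT_PACKAGES = ["@google/gemini-cli", "@anthropic-ai/claude-code", "cursor-cli"]

def find_unpinned_installs(workflow_text: str) -> list[str]:
    """Return install lines that reference an agent CLI without an exact
    version pin (no @x.y.z suffix, or an explicit @latest)."""
    findings = []
    for line in workflow_text.splitlines():
        for pkg in AGENT_PACKAGES:
            if pkg in line:
                pinned = re.search(re.escape(pkg) + r"@\d+\.\d+\.\d+", line)
                if not pinned or "@latest" in line:
                    findings.append(line.strip())
    return findings

ci = """
- run: npm install -g @google/gemini-cli@latest
- run: npm install -g @anthropic-ai/claude-code@1.2.3
"""
print(find_unpinned_installs(ci))  # flags only the @latest line
```

Run it over your workflow files in a pre-merge check; an exact pin plus a scheduled bump PR gives you both patch latency and the ability to hold back a bad release.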
Why is the U.S. House probing Cursor’s parent?
Semafor reported on April 29 that two Republican-led House committees opened probes into Anysphere (Cursor’s parent) and Airbnb over their use of Chinese AI models. Anysphere’s Composer 2 is built on Moonshot’s Kimi; Airbnb’s customer agent runs on Alibaba’s Qwen. The committee framing is national-security risk from PRC-linked data sharing.
The vibe-coding angle: Cursor’s enterprise customer base now overlaps significantly with regulated and federal-adjacent buyers. A House probe is not a regulation, but it is a procurement signal. Enterprise security review will start asking “what model is under the hood?” the same way it once asked “where is the data stored?” — and the answer “an open-weights model trained by a PRC-affiliated lab” is going to be a longer conversation than it was a quarter ago.
For teams choosing tooling: the new security-review checkbox is model provenance. Not just “what does Cursor send to the network,” but “whose weights process the prompt, and where is that operator headquartered.” It will not be a deal-breaker for most buyers. It will be a deal-breaker for some.
The week’s smaller stories
- The “9-second wipe” frame. Mickai’s Sentinel writeup on May 3 cataloged five published, dated, named-victim incidents over five months involving AI coding agents wiping production data — including a Tom’s Hardware report (April 26, 2026) about a Claude-powered Cursor agent deleting an entire company database in nine seconds, with backups zapped. The piece’s framing: this is what happens when an autonomous agent runs against a host OS with the user’s privileges and no interceptor between the model and the syscalls. Argument is vendor-coloured but the incident list is real.
- Replit Databases ship. Guvi’s writeup on May 2 frames Replit Databases as a database-first vibe-coding workflow: the agent reads and writes a real database from the first line of data code, instead of mocking and retrofitting later. The structural improvement is that the persistence layer is internal to the project and not exposed publicly — same security argument Masad made above, applied at the data layer.
- Anthropic /ultrareview. aigurux’s April 2026 roundup lists Anthropic’s /ultrareview for Claude Code as an “automatic code analysis before release that identifies security issues, bugs, and performance problems.” Worth watching for what it actually flags in practice — the gap between “detects vulnerabilities” and “detects vulnerabilities a developer would not have caught” is what makes the difference between a marketing primitive and a usable security gate.
- Stripe Link CLI for agents. Stripe shipped Link CLI, letting agents spend money via single-use credentials approved by push notification or Face ID. This is the right shape for the problem: instead of giving the agent a long-lived API key, you give it a one-shot capability for a specific transaction. Worth modelling against token-leak risks — the security model is “reject everything that is not a fresh approval,” not “trust the credential because the agent has it.”
- The $14,000 OpenAI bill. QWE’s tutorial on May 2 retells the story of a SaaS founder who built a product entirely with Cursor, shared it publicly, had attackers find his exposed API keys within days, and shut down after a $14k OpenAI bill. Old story, but a useful reminder that the highest-impact vibe-coding security failure most builders will encounter is still “key in client bundle.” Same article also surfaces a head-to-head test where Windsurf was 25 minutes faster than Claude Code on building the same task-management app, but shipped 11 bugs and 4 security issues — including hardcoded API keys in the frontend — versus zero security issues from Claude Code.
- CodeBrewTools’ “45% of AI code has critical vulns” claim. Their writeup on May 2 cites approximately 45% of AI-generated code containing critical security vulnerabilities. The number is unsourced and reads as marketing-shaped; treat as directional, not authoritative. The honest counterweight remains the /patterns/ series on this site, where the bug categories are concrete and reproducible.
- iOS 26.4.2. Apple’s emergency patch for a flaw that allowed deleted Signal messages to be recovered via push-notification handling. Tangential to vibe coding, central to anyone shipping mobile apps that handle sensitive deletion guarantees — push-notification metadata is now a known retention vector.
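The Stripe Link CLI item above generalizes into a pattern worth copying anywhere an agent spends or writes. A toy sketch of the single-use-credential shape — class and field names are invented for illustration, not Stripe's actual API:

```python
import secrets

class OneShotCredential:
    """One token per approved action; redemption consumes the token and
    must match the approved parameters exactly."""
    def __init__(self):
        self._pending = {}  # token -> (amount_cents, merchant)

    def approve(self, amount_cents: int, merchant: str) -> str:
        # In the real flow this runs only after a push/Face ID approval.
        token = secrets.token_urlsafe(16)
        self._pending[token] = (amount_cents, merchant)
        return token

    def redeem(self, token: str, amount_cents: int, merchant: str) -> bool:
        # Reject anything that is not a fresh approval for this exact spend.
        expected = self._pending.pop(token, None)
        return expected == (amount_cents, merchant)

vault = OneShotCredential()
t = vault.approve(1999, "api.example-merchant.test")
print(vault.redeem(t, 1999, "api.example-merchant.test"))  # True: fresh, exact
print(vault.redeem(t, 1999, "api.example-merchant.test"))  # False: consumed
```

The design choice to notice: a leaked token is worth at most one pre-approved transaction, versus a long-lived API key that is worth everything the key can do until someone rotates it.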
Why this week’s stories rhyme
A pattern across all of them: the artifacts surrounding agent tooling are now part of the security boundary, and the platforms that distribute them are noticing.
- Apple is enforcing that the binary it reviewed is the binary the user runs. Replit is contesting where that line falls. Lovable is engineering around it.
- Anthropic’s Mythos producing 271 Firefox patches is the first credible scaled defender outcome — and the same model class is producing $12M North Korean malware operations. The capability is symmetric.
- Google’s Gemini CLI CVSS-10 RCE is the moment agent CLIs join the regular vulnerability-disclosure cadence. It will not be the last.
- The House probe into Cursor’s parent is procurement asking “whose model is this?” with a national-security lens. Enterprise buyers will start asking the same question with a softer voice.
The structural shift: a year ago, vibe-coding security was about the apps these tools generate. This week, it is also about the tools themselves — their distribution, their CLIs, their model provenance, their interception between the agent and the operating system. The attack surface widened in every direction at once.
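"Interception between the agent and the operating system" can be made concrete with a toy command guard. This is a sketch only — the denylist patterns are illustrative, and a production interceptor would be an allowlist running the agent in a copy-on-write workspace, not a regex filter:

```python
import re
import shlex
import subprocess

# Illustrative destructive-command patterns; real guards should allowlist.
DESTRUCTIVE = [
    r"\brm\b.*\s-[a-z]*r[a-z]*f",     # rm -rf and friends
    r"\bdrop\s+(table|database)\b",   # SQL passed to a shell client
    r"\bmkfs\b",
]

def run_guarded(command: str) -> str:
    """Refuse commands matching a destructive pattern; otherwise execute."""
    for pat in DESTRUCTIVE:
        if re.search(pat, command, re.IGNORECASE):
            return f"BLOCKED: {command!r}"
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return result.stdout.strip()

print(run_guarded("rm -rf /var/lib/postgresql"))  # blocked before any syscall
print(run_guarded("echo safe"))                   # safe
```

The point is where the check sits: between the model's decision and the syscall, which is exactly the layer the nine-second-wipe incidents were missing.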
Manual checklist — 10 things to verify yourself
- Pin agent CLI versions in CI (Gemini CLI, Cursor, Claude Code, Aider, Cline, Windsurf). Floating to “latest” removes your ability to defer compromised releases.
- Audit CI runner secrets. If a compromised agent CLI ran on the runner, what could it exfiltrate today? Rotate anything that survives that thought experiment.
- Check your repo for leaked source maps and dev-mode artifacts. The Claude Code precedent applies broadly — .map, .log, and __debug__ artifacts are now in scope.
- Verify model provenance for any agent in your stack. Ask each vendor what underlying model handles your prompts and where the operator is headquartered. Document for security review.
- For Supabase/Firebase-backed apps, confirm RLS is on every table. The QWE writeup repeats the 70% Lovable-without-RLS figure. Run your own check: Supabase RLS Checker.
- Replace long-lived API keys with single-use credentials where the agent spends or writes. Stripe Link CLI is one example; the pattern applies broadly.
- Set up an interceptor or copy-on-write workspace if you let an agent run shell commands unattended. The five named-victim wipe incidents are the reason — Mickai’s framing is vendor-coloured but the threat is real.
- Confirm your mobile vibe-coding strategy survives Apple’s enforcement. If your generated app runs inside the host binary, plan a webview migration before review forces it.
- Run /ultrareview or an equivalent gate on any release that includes agent-generated code. Output quality varies; the gate’s value is that it runs at all.
- Re-check exposed secrets in client bundles. Hardcoded keys in frontend code remain the most common loss vector — once an attacker has the key, the bill arrives in days, not weeks.
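The last checklist item lends itself to a quick script. The key shapes below are approximations of well-known prefixes (OpenAI `sk-`, Google `AIza`, Stripe `sk_live_`) — tune the patterns for your own providers before relying on the output:

```python
import re

# Approximate key shapes; extend for the providers you actually use.
KEY_PATTERNS = {
    "openai": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "google": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "stripe_live": re.compile(r"sk_live_[0-9a-zA-Z]{16,}"),
}

def scan_bundle(text: str) -> list[str]:
    """Return provider names whose key shapes appear in a built JS bundle."""
    return sorted(name for name, pat in KEY_PATTERNS.items() if pat.search(text))

bundle = 'fetch("https://api.openai.com/v1/chat", '
bundle += '{headers: {Authorization: "Bearer sk-aaaaaaaaaaaaaaaaaaaaaaaa"}})'
print(scan_bundle(bundle))  # ['openai']
```

Point it at your built output (dist/, .next/static/, etc.), not your source tree — bundlers inline environment variables, and the bundle is what attackers download.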
Related coverage
- Apr 28-30 weekly: Apple blocks vibe-coding updates, Claude Code source-map leak, Lovable mobile
- Apr 24-28 weekly: Wiz Red Agent, SecureVibeBench, Red Gate DB failure patterns
- Vibe Coding Vulnerabilities — full taxonomy
- Is Replit Safe? — security audit
- Is Lovable Safe? — security audit
- Cursor Security Risks — 12 patterns in Cursor-generated code
- Your CLAUDE.md Is Attack Surface
- Apple vs Vibe Coding — earlier coverage
STOP GUESSING. SCAN YOUR APP.
Join the founders who shipped secure instead of exposed. 14-day trial, no card.