ZIP-SLIP, UNRESTRICTED UPLOAD, SVG XXE
File upload features attract attacks the way light attracts moths. Zip-slip is path traversal in archive extraction. Unrestricted upload accepts arbitrary content types. SVG XXE turns an image upload into an XML attack. AI codegen reproduces all three because the safe pattern is longer than the unsafe one.
The scenarios referenced below run on gapbench.vibe-eval.com, a public security benchmark we operate.
File I/O is dangerous, AI codegen makes it more so
The pattern across this whole family: AI generates the happy-path file-handling code. The happy path is unsafe by default in nearly every language and library. The mitigations are specific, varied, and not part of the AI’s natural output.
Six distinct surfaces, all on gapbench, all worth handling separately:
Zip-slip
const AdmZip = require('adm-zip')
const zip = new AdmZip(uploadPath)
zip.extractAllTo(extractDir, true) // true = overwrite existing files
AdmZip.extractAllTo in older adm-zip releases (the zip-slip fix landed in 0.4.9), and the equivalent call in many similar libraries, extracts entries by their internal paths. If an entry is named ../../../etc/cron.d/evil, that’s where it gets written. The traversal is in the archive, not in your code, so static scanners often miss it.
The fix: validate every entry’s resolved path stays within the extraction directory before writing.
const root = path.resolve(extractDir) // normalize first so the prefix check is reliable
const safe = path.resolve(root, entry.entryName)
if (!safe.startsWith(root + path.sep)) throw new Error('zip-slip detected')
Or use a library that does this for you (yauzl with a sanitization step, modern unzipper versions). Don’t trust the archive.
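The check above can be run over a whole archive before anything is written. The following is a minimal sketch, assuming POSIX-style paths; `findEscapingEntries` is a hypothetical helper name, not part of any library, and it covers only path traversal (symlink entries would need separate handling).

```javascript
const path = require('node:path')

// Returns the entry names whose resolved destination falls outside extractDir.
// An empty result means every entry stays inside the extraction directory.
function findEscapingEntries(extractDir, entryNames) {
  const root = path.resolve(extractDir)
  return entryNames.filter((name) => {
    const target = path.resolve(root, name)
    // Compare against root + separator so that a sibling directory such as
    // /tmp/out-evil is not mistaken for a child of /tmp/out.
    return target !== root && !target.startsWith(root + path.sep)
  })
}

// Illustrative entry names only; a real caller would read them from the archive.
const suspicious = findEscapingEntries('/tmp/out', [
  'a.txt',
  '../../etc/cron.d/evil',
  'sub/../b.txt',
  '../out-evil/x',
])
console.log(suspicious)
```

With adm-zip, the names could come from `zip.getEntries().map((e) => e.entryName)`, and extraction would be refused whenever the returned array is non-empty.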
Live: https://gapbench.vibe-eval.com/site/zip-slip/.
Unrestricted upload
app.post('/upload', upload.single('file'), (req, res) => {
  const dest = `/uploads/${req.file.originalname}`
  fs.writeFileSync(dest, req.file.buffer)
  res.json({ url: dest })
})
Three problems in one. First, originalname is attacker-controlled — originalname = '../config.json' writes outside the directory. Second, no content-type check — attacker uploads .html, .svg, .php, whatever. Third, the file is served from a path under your domain — anything that lands there runs in your origin’s security context.
The fix is layered. Sanitize the filename (or generate one server-side and ignore the client’s). Allow-list extensions and content types. Store uploads on a separate domain (or a CDN) so attacker-uploaded content can’t run as your origin. Add a Content-Disposition: attachment header where appropriate.
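The first two layers (server-generated names and an extension allow-list) might look roughly like this. It is a sketch under assumptions: `planStoredUpload` and the `ALLOWED` map are hypothetical names, and the allow-list contents are only an example. The client's filename is used for nothing except looking up an allow-listed extension; it never becomes part of the stored path.

```javascript
const path = require('node:path')
const { randomUUID } = require('node:crypto')

// Hypothetical allow-list: extension -> content type the server will declare.
const ALLOWED = new Map([
  ['.png', 'image/png'],
  ['.jpg', 'image/jpeg'],
  ['.jpeg', 'image/jpeg'],
  ['.webp', 'image/webp'],
])

// Decides how an upload would be stored, or returns null to reject it.
function planStoredUpload(originalName) {
  const ext = path.extname(String(originalName)).toLowerCase()
  const contentType = ALLOWED.get(ext)
  if (!contentType) return null // not on the allow-list: reject
  return {
    storedName: `${randomUUID()}${ext}`, // server-generated, no client path segments
    contentType, // taken from the allow-list, not from the request
    contentDisposition: 'attachment', // ask browsers to download rather than render
  }
}

console.log(planStoredUpload('../config.json')) // rejected
console.log(planStoredUpload('Holiday Photo.PNG'))
```

An extension check says nothing about the bytes inside the file, so this is one layer, not a replacement for content inspection or for serving from a separate origin.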
Live: https://gapbench.vibe-eval.com/site/file-upload/.
SVG XXE
You allow image uploads. SVG is an image format. SVG is also XML. If your processing pipeline parses the SVG with a permissive XML parser, the parser will resolve external entity references:
<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<svg xmlns="http://www.w3.org/2000/svg"><text>&xxe;</text></svg>
The XML parser fetches /etc/passwd and inlines its contents. If the server returns the rendered SVG to the user, the file contents are exposed. If the parser supports external DTDs, an attacker can trigger SSRF.
Fix: disable external entities in every XML parser. In ImageMagick, use a policy file to disable URL handlers. In libxml2-based libraries, set the no-network and no-DTD flags. Or — easier — refuse SVG uploads. Most apps don’t actually need to accept SVG.
Live: https://gapbench.vibe-eval.com/site/xxe-svg/.
Download-side traversal
app.get('/download', (req, res) => {
  res.sendFile(path.join('/storage', req.query.file))
})
Mirror image of zip-slip. ?file=../../../etc/passwd reads outside /storage. Fix is the same shape: resolve the path and verify it stays within the intended root.
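A sketch of that resolve-and-verify step for the download side, assuming POSIX-style paths. `resolveInside` is a hypothetical helper; it uses `path.relative` instead of a prefix comparison, which is an equivalent way to express the same containment check.

```javascript
const path = require('node:path')

// Returns the absolute path for `requested` if it stays inside `root`,
// or null if the request should be refused.
function resolveInside(root, requested) {
  // Query parameters can arrive as arrays or objects; accept plain strings only.
  if (typeof requested !== 'string' || requested.includes('\0')) return null
  const base = path.resolve(root)
  const target = path.resolve(base, requested)
  const rel = path.relative(base, target)
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)
  if (rel === '' || escapes) return null // the root itself, or anything outside it
  return target
}

console.log(resolveInside('/storage', 'reports/q1.pdf'))
console.log(resolveInside('/storage', '../../../etc/passwd'))
```

In the handler above, the result would be passed to `res.sendFile` only when it is non-null, with a 400 or 404 returned otherwise.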
Live: https://gapbench.vibe-eval.com/site/download-traversal/.
PDF HTML injection
PDF generation is increasingly done by rendering HTML to PDF (Puppeteer, wkhtmltopdf, similar). If the HTML is built from user input and the generator’s rendering context has access to local files or internal URLs, the attacker injects HTML that reaches them.
<iframe src="file:///etc/passwd"></iframe>
If the rendering engine respects file:// URLs, the contents end up in the PDF. Same for http://internal-service.svc/ URLs in environments where the renderer has network access.
Fix: run the renderer in a sandbox with no file system access and no internal network access. Treat user-supplied HTML as untrusted input even when the output is “just a PDF.”
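On the input side, treating user-supplied values as untrusted usually means escaping them before they are placed in the HTML template. The sketch below assumes the user input is meant to appear as text, not markup; `escapeHtml` and `renderReceiptHtml` are hypothetical names for illustration. It complements the sandbox and does not replace it.

```javascript
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
}

// Replaces the five characters that can open tags, attributes, or entities.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch])
}

// Hypothetical template that is later handed to the HTML-to-PDF renderer.
function renderReceiptHtml(customerName) {
  return `<html><body><h1>Receipt</h1><p>Customer: ${escapeHtml(customerName)}</p></body></html>`
}

// The iframe payload arrives in the PDF as literal text, not as an element.
console.log(renderReceiptHtml('<iframe src="file:///etc/passwd"></iframe>'))
```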
Live: https://gapbench.vibe-eval.com/site/pdf-html-injection/.
Markdown HTML injection
const html = marked(userInput)
res.send(html)
marked passes raw HTML in its input through to the output by default and does not sanitize it, so user input that contains <script> tags renders as JavaScript in the resulting HTML. Same shape as XSS, sourced through the Markdown renderer.
Fix: configure the Markdown renderer to escape HTML, or pipe the output through DOMPurify before serving.
Live: https://gapbench.vibe-eval.com/site/markdown-html-injection/.
A specific incident — chained file upload to RCE
Anonymized. A SaaS that processed user-uploaded design files. Uploads went to /uploads/<uuid> on the server’s local disk, served from the same Express process. The team thought they had locked down content types — multer was configured to accept only image/* MIME types.
Two issues. First, MIME type from the client is the client’s claim, not a fact. An attacker uploaded a PHP file with Content-Type: image/png. Second, the server didn’t run PHP, but it did serve .php files via a misconfigured nginx fallback that proxied to a separate (unrelated) PHP service for an old marketing page. The attacker’s “image” got served by the PHP processor and ran. RCE.
The chain was specific to that team’s nginx config but the lesson is general: file upload is dangerous because the file might run in some context you didn’t think about. The fix was layered:
- Generate filenames server-side; never use the client’s. UUID-based, no extension reflected from the upload.
- Magic-byte content-type check — open the file, read the first few bytes, verify against the claimed type. PHP source files don’t start with PNG magic bytes.
- Serve uploads from a separate domain (uploads.example.com, not example.com/uploads/). Different origin means even if something runs, it doesn’t run as your origin.
- Content-Disposition: attachment for any user-uploaded content that doesn’t need inline display.
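The magic-byte check from the list above can be sketched with nothing but a Buffer. This is an illustration, not the team's actual code: `sniffImageType` and the `SIGNATURES` table are hypothetical, and only three formats are covered. Signature matching proves how a file starts, not that the whole file is benign, so it belongs alongside the other layers.

```javascript
// Each signature is a list of [offset, bytes] parts that must all match.
const SIGNATURES = [
  { mime: 'image/png', ext: 'png', parts: [[0, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]]] },
  { mime: 'image/jpeg', ext: 'jpg', parts: [[0, [0xff, 0xd8, 0xff]]] },
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
  { mime: 'image/webp', ext: 'webp', parts: [[0, [0x52, 0x49, 0x46, 0x46]], [8, [0x57, 0x45, 0x42, 0x50]]] },
]

function matchesAt(buf, offset, bytes) {
  if (buf.length < offset + bytes.length) return false
  return bytes.every((b, i) => buf[offset + i] === b)
}

// Returns { mime, ext } for a recognized image, or null for anything else.
function sniffImageType(buf) {
  const hit = SIGNATURES.find((sig) =>
    sig.parts.every(([offset, bytes]) => matchesAt(buf, offset, bytes))
  )
  return hit ? { mime: hit.mime, ext: hit.ext } : null
}

// A PHP file sent with Content-Type: image/png does not start with PNG bytes.
console.log(sniffImageType(Buffer.from('<?php echo 1; ?>')))
```

The extension and content type used for storage would then come from the detected type, never from the request.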
The detection: we upload synthetic files with mismatched magic bytes / extensions / content types and observe what the server stores and how it serves them. Any case where the server stores under an attacker-influenced path or serves with a content-type that allows execution is a finding.
XXE in detail — the SVG case
XXE deserves more than the one paragraph above because the attack pattern repeats across every XML-accepting surface, and SVG is the one where AI codegen ships the bug most often.
<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <text x="0" y="50">&xxe;</text>
</svg>
When this SVG is parsed by a permissive XML parser, the &xxe; entity reference is resolved by reading /etc/passwd and inlining its contents. The parsed DOM contains the file contents. If the SVG is then rendered (rasterized to PNG, included in a PDF, displayed inline), the content surfaces.
The same technique works against:
- Office documents (DOCX, XLSX) — they’re zip archives of XML.
- SOAP services — XML body parsing.
- RSS / Atom feed parsers — XML body parsing.
- Any custom XML import.
The fix is per-parser:
# Python: lxml with explicitly hardened parser options
from lxml import etree
parser = etree.XMLParser(no_network=True, resolve_entities=False, dtd_validation=False)
tree = etree.parse(path, parser=parser)  # path: location of the uploaded file
# Python: defusedxml is the safer choice
from defusedxml import ElementTree as DET
tree = DET.parse(path)  # raises EntitiesForbidden on entity declarations
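// Java: configure DocumentBuilderFactory
import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Reject any document that carries a DOCTYPE declaration
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: also turn off external general and parameter entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);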
For SVG specifically, the simpler answer: don’t accept SVG uploads. Most products don’t actually need SVG; PNG, JPG, and WebP cover the vast majority of use cases.
Wrong fix vs right fix — file uploads
// WRONG: filename from client
const dest = path.join('/uploads', req.file.originalname)
fs.writeFileSync(dest, req.file.buffer)
// WRONG: client-supplied content-type
if (req.file.mimetype.startsWith('image/')) { /* accept */ }
// RIGHT: filename server-side, magic-byte check, separate origin
import { randomUUID } from 'crypto'
import { fileTypeFromBuffer } from 'file-type' // reads magic bytes
const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'image/webp'] // example allow-list
// Inside an async route handler:
const detected = await fileTypeFromBuffer(req.file.buffer)
if (!detected || !ALLOWED_TYPES.includes(detected.mime)) {
  return res.status(400).end()
}
const safeName = `${randomUUID()}.${detected.ext}` // extension from detection, not from the client
await s3.putObject({
  Bucket: 'uploads-domain-isolated', // separate from app domain
  Key: safeName,
  Body: req.file.buffer,
  ContentType: detected.mime,
  ContentDisposition: 'attachment',
})
Cross-stack notes
- Express + multer: AI codegen frequently uses multer({ dest: 'uploads/' }) with no filter and req.file.originalname for the filename. Both unsafe.
- Next.js (App Router) + formidable / busboy: Similar shape. Default options are permissive.
- Python + Flask: file.save(secure_filename(file.filename)) is the safer pattern. AI-generated Flask sometimes uses file.save(file.filename) directly.
- Django: FileField.upload_to handles destination; the filename comes from file.name. Same client-trust issue.
- Rails + ActiveStorage: Modern ActiveStorage handles most of this safely (server-generated keys, content-type sniffing). Older Rails Paperclip-based code has the bugs.
How we detect
Each surface has a corresponding probe:
- Zip-slip: upload a zip with a .. entry, observe whether files appear outside the intended directory.
- Unrestricted upload: upload files with various extensions and content types, see what’s accepted, hit the result URL to see what executes.
- SVG XXE: upload an SVG with an XXE payload referencing a known-readable file, observe whether content from that file appears in the rendered output.
- Download traversal: hit the download endpoint with traversal payloads, observe responses.
- PDF / Markdown HTML injection: submit content with HTML injection payloads, observe the rendered output.
CWE / OWASP
- CWE-22 — Improper Limitation of a Pathname to a Restricted Directory (zip-slip, download)
- CWE-434 — Unrestricted Upload of File with Dangerous Type
- CWE-611 — XML External Entity Reference (SVG XXE)
- CWE-79 — Cross-Site Scripting (Markdown HTML, PDF HTML)
- OWASP Top 10 — A01:2021 Broken Access Control, A03:2021 Injection
Reproduce it yourself
- Zip-slip: https://gapbench.vibe-eval.com/site/zip-slip/
- File upload: https://gapbench.vibe-eval.com/site/file-upload/
- SVG XXE: https://gapbench.vibe-eval.com/site/xxe-svg/
- Download traversal: https://gapbench.vibe-eval.com/site/download-traversal/
- PDF HTML injection: https://gapbench.vibe-eval.com/site/pdf-html-injection/
- Markdown HTML injection: https://gapbench.vibe-eval.com/site/markdown-html-injection/
Related reading
- Pattern: LLM-rendered HTML and Markdown
- Pattern: Prototype pollution, DOM clobbering, postMessage
- Tool: vibe-code-scanner