ZIP-SLIP, UNRESTRICTED UPLOAD, SVG XXE

File upload features attract attacks the way light attracts moths. Zip-slip is path traversal during archive extraction. Unrestricted upload accepts arbitrary content types. SVG XXE turns an image upload into an XML attack. AI codegen reproduces all three because the safe pattern is longer than the unsafe one.

The scenarios referenced below run on gapbench.vibe-eval.com, a public security benchmark we operate.

File I/O is dangerous, AI codegen makes it more so

The pattern across this whole family: AI generates the happy-path file-handling code. The happy path is unsafe by default in nearly every language and library. The mitigations are specific, varied, and not part of the AI’s natural output.

Six distinct surfaces, all on gapbench, all worth handling separately:

Zip-slip

const zip = new AdmZip(uploadPath)
zip.extractAllTo(extractDir, true)

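Extraction helpers such as AdmZip.extractAllTo write each entry to the path stored inside the archive. Where the library does not sanitize that path (older adm-zip releases were affected, tracked as CVE-2018-1002204, and many similar libraries have had the same bug), an entry named ../../../etc/cron.d/evil is written exactly there. The traversal is in the archive, not in your code, so static scanners often miss it.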

The fix: validate every entry’s resolved path stays within the extraction directory before writing.

const root = path.resolve(extractDir)  // normalize the root before comparing
const safe = path.resolve(root, entry.entryName)
if (!safe.startsWith(root + path.sep)) throw new Error('zip-slip detected')

Or use a library that does this for you (yauzl with a sanitization step, modern unzipper versions). Don’t trust the archive.
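
To apply that check to a whole archive rather than a single entry, one option is to validate every entry name first and write nothing unless all of them pass. The sketch below assumes the entry names have already been read from the archive into an array; planExtraction is a hypothetical helper, not part of any zip library.

```javascript
const path = require('node:path')

// Hypothetical helper: map every archive entry name to a target path under
// extractDir, or throw before anything is written if any entry escapes.
function planExtraction(extractDir, entryNames) {
  const root = path.resolve(extractDir)
  return entryNames.map((name) => {
    const target = path.resolve(root, name)
    // Accept only paths strictly below root. The trailing separator keeps
    // a sibling like "/srv/extract-evil" from passing as a child of "/srv/extract".
    if (!target.startsWith(root + path.sep)) {
      throw new Error(`zip-slip detected: ${name}`)
    }
    return target
  })
}
```

The check is purely lexical. Archives that contain symlink entries can still redirect later writes, so skipping symlink entries during extraction is a reasonable companion measure.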

Live: https://gapbench.vibe-eval.com/site/zip-slip/.

Unrestricted upload

app.post('/upload', upload.single('file'), (req, res) => {
  const dest = `/uploads/${req.file.originalname}`
  fs.writeFileSync(dest, req.file.buffer)
  res.json({ url: dest })
})

Three problems in one. First, originalname is attacker-controlled — originalname = '../config.json' writes outside the directory. Second, no content-type check — attacker uploads .html, .svg, .php, whatever. Third, the file is served from a path under your domain — anything that lands there runs in your origin’s security context.

The fix is layered. Sanitize the filename (or generate one server-side and ignore the client’s). Allow-list extensions and content types. Store uploads on a separate domain (or a CDN) so attacker-uploaded content can’t run as your origin. Add a Content-Disposition: attachment header where appropriate.
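
A minimal sketch of two of those layers, server-side naming and download headers, using only Node built-ins. The extension allow-list and both function names are assumptions for illustration. X-Content-Type-Options: nosniff is an addition not mentioned above; it asks browsers not to guess a different content type than the one declared.

```javascript
const path = require('node:path')
const { randomUUID } = require('node:crypto')

// Assumed allow-list: client extension (lowercased) -> canonical stored extension.
const ALLOWED_EXTENSIONS = new Map([
  ['.png', '.png'],
  ['.jpg', '.jpg'],
  ['.jpeg', '.jpg'],
  ['.webp', '.webp'],
])

// Returns a server-generated name, or null if the extension is not allow-listed.
// No part of originalName is copied into the result.
function storedNameFor(originalName) {
  const ext = ALLOWED_EXTENSIONS.get(path.extname(originalName).toLowerCase())
  return ext ? `${randomUUID()}${ext}` : null
}

// Headers for serving a stored upload as a download rather than inline content.
function downloadHeaders(storedName) {
  return {
    'Content-Disposition': `attachment; filename="${storedName}"`,
    'X-Content-Type-Options': 'nosniff',
  }
}
```

An extension check alone still accepts a file named evil.php.png, so it belongs alongside a content check, not in place of one.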

Live: https://gapbench.vibe-eval.com/site/file-upload/.

SVG XXE

You allow image uploads. SVG is an image format. SVG is also XML. If your processing pipeline parses the SVG with a permissive XML parser, the parser will resolve external entity references:

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<svg xmlns="http://www.w3.org/2000/svg"><text>&xxe;</text></svg>

The XML parser fetches /etc/passwd and inlines its contents. If the server returns the rendered SVG to the user, the file contents are exposed. If the parser supports external DTDs, an attacker can trigger SSRF.

Fix: disable external entities in every XML parser. In ImageMagick, use a policy file to disable URL handlers. In libxml2-based libraries, set the no-network and no-DTD flags. Or — easier — refuse SVG uploads. Most apps don’t actually need to accept SVG.

Live: https://gapbench.vibe-eval.com/site/xxe-svg/.

Download-side traversal

app.get('/download', (req, res) => {
  res.sendFile(path.join('/storage', req.query.file))
})

Mirror image of zip-slip. ?file=../../../etc/passwd reads outside /storage. Fix is the same shape: resolve the path and verify it stays within the intended root.
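
One way to write that check for the download side is sketched below. resolveDownload is a hypothetical helper; it uses path.relative instead of a prefix comparison, and it also rejects non-string input, since Express can hand back an array when a query parameter is repeated.

```javascript
const path = require('node:path')

// Hypothetical helper: resolve a user-supplied file name under root.
// Returns the absolute path, or null when the request escapes root.
function resolveDownload(root, requested) {
  if (typeof requested !== 'string' || requested.includes('\0')) return null
  const base = path.resolve(root)
  const target = path.resolve(base, requested)
  const rel = path.relative(base, target)
  // rel is empty when target is root itself, begins with a ".." segment when
  // target is above root, and is absolute when target is on another drive (Windows).
  const escapes =
    rel === '' ||
    rel === '..' ||
    rel.startsWith('..' + path.sep) ||
    path.isAbsolute(rel)
  return escapes ? null : target
}
```

In the handler, a null result would map to a 404 and any other result would be passed to res.sendFile.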

Live: https://gapbench.vibe-eval.com/site/download-traversal/.

PDF HTML injection

PDF generation is increasingly done by rendering HTML to PDF (Puppeteer, wkhtmltopdf, similar). If the HTML is built from user input and the generator’s rendering context has access to local files or internal URLs, the attacker injects HTML that reaches them.

<iframe src="file:///etc/passwd"></iframe>

If the rendering engine respects file:// URLs, the contents end up in the PDF. Same for http://internal-service.svc/ URLs in environments where the renderer has network access.

Fix: run the renderer in a sandbox with no file system access and no internal network access. Treat user-supplied HTML as untrusted input even when the output is “just a PDF.”
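
When users only supply text fields rather than markup, escaping those fields before they reach the template is a cheap extra layer. It complements the sandbox and does not replace it. The sketch below assumes a hypothetical invoice template; renderInvoiceHtml and its field names are illustrative.

```javascript
// Escape the five characters that let text break out of HTML text or
// quoted-attribute context.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Hypothetical invoice template: user-supplied fields are escaped before
// they reach the HTML that the PDF renderer will load.
function renderInvoiceHtml({ customerName, note }) {
  return (
    '<!doctype html><html><body>' +
    `<h1>Invoice for ${escapeHtml(customerName)}</h1>` +
    `<p>${escapeHtml(note)}</p>` +
    '</body></html>'
  )
}
```

With this in place, the iframe payload above arrives at the renderer as visible text instead of an element.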

Live: https://gapbench.vibe-eval.com/site/pdf-html-injection/.

Markdown HTML injection

const html = marked(userInput)
res.send(html)

marked does not sanitize its output: raw HTML in the Markdown source passes straight through to the result. User input that contains <script> tags therefore becomes executable JavaScript in the served page. Same shape as XSS, sourced through the Markdown renderer.

Fix: configure the Markdown renderer to escape raw HTML where it offers that option, or pipe the output through DOMPurify before serving.

Live: https://gapbench.vibe-eval.com/site/markdown-html-injection/.

A specific incident — chained file upload to RCE

Anonymized. A SaaS that processed user-uploaded design files. Uploads went to /uploads/<uuid> on the server’s local disk, served from the same Express process. The team thought they had locked down content types — multer was configured to accept only image/* MIME types.

Two issues. First, MIME type from the client is the client’s claim, not a fact. An attacker uploaded a PHP file with Content-Type: image/png. Second, the server didn’t run PHP, but it did serve .php files via a misconfigured nginx fallback that proxied to a separate (unrelated) PHP service for an old marketing page. The attacker’s “image” got served by the PHP processor and ran. RCE.

The chain was specific to that team’s nginx config but the lesson is general: file upload is dangerous because the file might run in some context you didn’t think about. The fix was layered:

  1. Generate filenames server-side; never use the client’s. UUID-based, no extension reflected from the upload.
  2. Magic-byte content-type check — open the file, read the first few bytes, verify against the claimed type. PHP source files don’t start with PNG magic bytes.
  3. Serve uploads from a separate domain (uploads.example.com, not example.com/uploads/). Different origin means even if something runs, it doesn’t run as your origin.
  4. Content-Disposition: attachment for any user-uploaded content that doesn’t need inline display.
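
Step 2 can be sketched without any dependency for a small set of formats. The signature table below covers only PNG and JPEG and is an assumption of this sketch; sniffImageType and acceptUpload are hypothetical names. A maintained library covers far more formats.

```javascript
// Signatures for the formats this sketch accepts (assumed allow-list).
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
]

// Returns the MIME type implied by the leading bytes, or null if none match.
function sniffImageType(buffer) {
  for (const { mime, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return mime
    }
  }
  return null
}

// Accept only when the content matches a known signature AND agrees with the claim.
function acceptUpload(buffer, claimedMime) {
  const detected = sniffImageType(buffer)
  return detected !== null && detected === claimedMime
}
```

A file can begin with valid magic bytes and still carry other content after them, so this check narrows what gets through without proving the file is harmless. That is why it sits alongside the other three steps.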

The detection: we upload synthetic files with mismatched magic bytes / extensions / content types and observe what the server stores and how it serves them. Any case where the server stores under an attacker-influenced path or serves with a content-type that allows execution is a finding.
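
As an illustration of what such a probe set might look like (a sketch under assumptions, not the scanner's actual code), the generator below enumerates every combination of extension, declared Content-Type, and leading bytes in which the three do not all agree. The three file kinds and the harmless stub bodies are hypothetical choices.

```javascript
// Hypothetical file kinds: extension, declared MIME type, and leading bytes.
const KINDS = [
  { ext: '.png', mime: 'image/png', head: Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]) },
  { ext: '.php', mime: 'application/x-httpd-php', head: Buffer.from('<?php ') },
  { ext: '.html', mime: 'text/html', head: Buffer.from('<!doctype html>') },
]

// Build one probe per (extension, Content-Type, body) combination,
// skipping the combinations where all three come from the same kind.
function buildMismatchProbes() {
  const probes = []
  for (const e of KINDS) {
    for (const m of KINDS) {
      for (const h of KINDS) {
        if (e === m && m === h) continue // fully consistent file: not a probe
        probes.push({ filename: `probe${e.ext}`, contentType: m.mime, body: h.head })
      }
    }
  }
  return probes
}
```

Each probe would then be uploaded and the stored path and served Content-Type compared against what was sent.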

XXE in detail — the SVG case

XXE deserves more than the one paragraph above because the attack pattern repeats across every XML-accepting surface, and SVG is the one where AI codegen ships the bug most often.

<?xml version="1.0"?>
<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <text x="0" y="50">&xxe;</text>
</svg>

When this SVG is parsed by a permissive XML parser, the &xxe; entity reference is resolved by reading /etc/passwd and inlining its contents. The parsed DOM contains the file contents. If the SVG is then rendered (rasterized to PNG, included in a PDF, displayed inline), the content surfaces.

The same technique works against:

  • Office documents (DOCX, XLSX) — they’re zip archives of XML.
  • SOAP services — XML body parsing.
  • RSS / Atom feed parsers — XML body parsing.
  • Any custom XML import.

The fix is per-parser:

# Python: lxml with explicit safe settings
from lxml import etree
parser = etree.XMLParser(no_network=True, resolve_entities=False, dtd_validation=False)
tree = etree.parse(path, parser=parser)

# Python: defusedxml is the safer choice
from defusedxml import ElementTree as DET
tree = DET.parse(path)  # rejects entity declarations by default

// Java: configure DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

For SVG specifically, the simpler answer: don’t accept SVG uploads. Most products don’t actually need SVG; PNG, JPG, WebP cover 99% of use cases.

Wrong fix vs right fix — file uploads

// WRONG: filename from client
const dest = path.join('/uploads', req.file.originalname)
fs.writeFileSync(dest, req.file.buffer)

// WRONG: client-supplied content-type
if (req.file.mimetype.startsWith('image/')) { /* accept */ }

// RIGHT: filename server-side, magic-byte check, separate origin
import { randomUUID } from 'crypto'
import { fileTypeFromBuffer } from 'file-type'  // reads magic bytes

const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'image/webp']  // example allow-list

// Inside an async route handler:
const detected = await fileTypeFromBuffer(req.file.buffer)
if (!detected || !ALLOWED_TYPES.includes(detected.mime)) {
  return res.status(400).end()
}
const safeName = `${randomUUID()}.${detected.ext}`
await s3.putObject({
  Bucket: 'uploads-domain-isolated',  // separate from app domain
  Key: safeName,
  Body: req.file.buffer,
  ContentType: detected.mime,
  ContentDisposition: 'attachment',
})

Cross-stack notes

  • Express + multer: AI codegen frequently uses multer({ dest: 'uploads/' }) with no filter and req.file.originalname for the filename. Both unsafe.
  • Next.js (App Router) + formidable / busboy: Similar shape. Default options are permissive.
  • Python + Flask: file.save(secure_filename(file.filename)) is the safer pattern. AI-generated Flask sometimes uses file.save(file.filename) directly.
  • Django: FileField.upload_to handles destination; the filename comes from file.name. Same client-trust issue.
  • Rails + ActiveStorage: Modern ActiveStorage handles most of this safely (server-generated keys, content-type sniffing). Older Rails Paperclip-based code has the bugs.

How we detect

Each surface has a corresponding probe:

  • Zip-slip: upload a zip with a .. entry, observe whether files appear outside the intended directory.
  • Unrestricted upload: upload files with various extensions and content types, see what’s accepted, hit the result URL to see what executes.
  • SVG XXE: upload an SVG with an XXE payload referencing a known-readable file, observe whether content from that file appears in the rendered output.
  • Download traversal: hit the download endpoint with traversal payloads, observe responses.
  • PDF / Markdown HTML injection: submit content with HTML injection payloads, observe the rendered output.

CWE / OWASP

  • CWE-22 — Improper Limitation of a Pathname to a Restricted Directory (zip-slip, download)
  • CWE-434 — Unrestricted Upload of File with Dangerous Type
  • CWE-611 — XML External Entity Reference (SVG XXE)
  • CWE-79 — Cross-Site Scripting (Markdown HTML, PDF HTML)
  • OWASP Top 10 — A01:2021 Broken Access Control, A03:2021 Injection

COMMON QUESTIONS

01. What is zip-slip?
Zip-slip is path traversal in archive extraction. An attacker uploads a zip whose entries include filenames like ../../../etc/passwd. If the extraction code doesn't normalize and check paths, the file gets written outside the intended directory. Same bug applies to tar, rar, 7z, and any archive format with relative paths in entries.

02. What is unrestricted file upload?
Your upload endpoint accepts any content type and any extension, and stores the file at a predictable path under a directory the web server serves. An attacker uploads malicious.html or malicious.svg or, worse, malicious.php and triggers the server to execute it. Even without execution, attacker-controlled HTML served from your domain runs inside your origin: the same-origin policy no longer separates it from your app, so it runs as if it were your code.

03. What is SVG XXE?
SVG is XML. XML parsers can be tricked with external entity references (XXE) that read local files or trigger SSRF. If your image-upload feature accepts SVG and your image processing library invokes a permissive XML parser on it, an attacker uploads an SVG with an XXE payload and reads /etc/passwd or hits internal services. The fix is to disable external entities in your XML parser, or to refuse SVG uploads entirely.

04. What about download-side traversal?
Your app has /download?file=invoice-42.pdf. The server reads from a directory. The attacker sends ?file=../../../etc/passwd. If the server doesn't normalize, they read the file. This is path traversal on the download side, mirroring zip-slip on the extract side. The fix is the same: normalize and verify the resolved path stays within the intended directory.

05. What is HTML injection in PDFs and Markdown?
Two emerging variants. PDF generators that render HTML server-side (wkhtmltopdf, Puppeteer-based PDF generation) can be tricked with attacker-controlled HTML to read local files or hit internal URLs from the rendering process. Markdown renderers that allow raw HTML let attackers inject <script> tags via Markdown content. Both are common in AI-generated apps that produce reports or documents.

06. Where can I see this on a real URL?
https://gapbench.vibe-eval.com/site/zip-slip/, https://gapbench.vibe-eval.com/site/file-upload/, https://gapbench.vibe-eval.com/site/xxe-svg/, https://gapbench.vibe-eval.com/site/download-traversal/, https://gapbench.vibe-eval.com/site/pdf-html-injection/, https://gapbench.vibe-eval.com/site/markdown-html-injection/.

07. What CWE does this map to?
CWE-22 (Path Traversal), CWE-434 (Unrestricted Upload of File with Dangerous Type), CWE-611 (XML External Entity Reference), CWE-79 (XSS for the HTML-injection variants). OWASP A01:2021 (Broken Access Control), A03:2021 (Injection).
