Test Automation

Microsoft RAMPART Is the pytest for AI Agent Safety — And QA Teams Need to Know About It

Why it matters for testing

Microsoft just open-sourced RAMPART, a pytest-native safety and security testing framework for AI agents — meaning QA teams can now gate agentic AI on safety the same way they gate regular code on unit tests. This is the missing piece that turns AI agent deployments from "hope it behaves" into "prove it behaves."


Intro

You've adopted an AI coding agent. It writes code, calls APIs, browses the web, and executes tools autonomously. And now comes the uncomfortable question every QA engineer eventually asks: how do I actually test this thing for safety?

For traditional software, pytest handles it. For AI agents, until very recently, the answer was a mix of manual red-teaming, vibes, and prayer.

That just changed. On May 20, 2026, Microsoft's AI Red Team open-sourced RAMPART (Risk Assessment and Measurement Platform for Agentic Red Teaming) alongside a companion tool called Clarity — and together they represent the most developer-friendly approach to AI agent safety testing we've seen yet.


The AI development / news

RAMPART is a pytest-native framework designed to write and run safety and security tests for AI agents. The key word is native: developers write RAMPART tests the same way they write any other pytest test, describe an adversarial scenario (e.g., a prompt injection attack), and the framework runs them automatically on every code change via CI.

Each test connects to the agent through a thin adapter, orchestrates an interaction, and evaluates observable outcomes. The result is a clear pass/fail signal — just like any integration test.

RAMPART is built on top of PyRIT, Microsoft's existing open-source red-teaming library, and has already been battle-tested internally. Microsoft's AI incident response team used it to take a reported real-world vulnerability, generate 100 variants of that vulnerability, and test the potency of each variant.

The companion tool, Clarity, is a structured design-review tool that documents an agent's intent, risks, and expected behavior before code is written — a shift-left approach to agent safety that produces structured documentation reviewable by both humans and automated tools.


Current testing landscape

Right now, most teams testing AI agents rely on:

  • Manual red-teaming: skilled security researchers probing the agent by hand. Expensive, time-consuming, non-repeatable.
  • Behavioral smoke tests: a handful of prompts verified by eyeballing the output. Not scalable.
  • LLM-as-judge evaluations: asking a separate LLM to score outputs. Better, but still probabilistic and hard to gate in CI.
  • PyRIT (Microsoft's prior tool): powerful but requiring more orchestration overhead to wire into a test suite.

None of these integrate cleanly into a standard CI/CD pipeline. None of them give you the green/red signal developers are used to seeing in a test run. RAMPART fixes this.


The impact

RAMPART changes the economics of AI agent safety testing in three ways:

1. Safety tests become first-class citizens. By living in pytest, safety tests sit in the same repo as unit tests, run in the same CI pipeline, and block deployments the same way a failing functional test does. Safety stops being a one-time audit and becomes a continuous gate.

2. Adversarial coverage becomes systematic. Teams can build a library of adversarial scenarios — prompt injections, jailbreak attempts, data exfiltration attacks, benign edge cases that cause harmful outputs — and run them on every PR. RAMPART's seed-driven architecture lets you generate variants of each scenario automatically.

3. Shift-left safety becomes real. Clarity encourages documenting agent intent and risks before building, so potential attack surfaces are identified in design rather than discovered in production.


Practical applications

Here's how a QA team can start using RAMPART today:

Start with your highest-risk actions. Does your agent have tools that write files, call external APIs, or execute shell commands? Write RAMPART tests for those first. A test might simulate a prompt injection that tries to get the agent to exfiltrate environment variables.

Wire it into your PR checks. Add RAMPART to your GitHub Actions or CI pipeline alongside your existing test suite. Block merges when safety tests fail, just as you would for unit tests.

Build a scenario library over time. Every real incident or near-miss is a new test case. RAMPART's architecture makes it easy to capture a reported vulnerability and generate dozens of variants automatically.

Use Clarity in sprint planning. Before building a new agent capability, fill out a Clarity document: what is this agent trying to do, what tools does it have, what are the worst-case misuse scenarios? This becomes the spec for your RAMPART test suite.


Tools / frameworks to watch

  • RAMPART (GitHub) — The framework itself. MIT licensed, pytest-native, built on PyRIT.
  • Clarity (Microsoft) — The design-review companion. Generates structured documentation for agent safety reviews.
  • PyRIT — Microsoft's underlying red-teaming library that RAMPART builds on.
  • Playwright MCP — For agents that interact with web UIs, Playwright's MCP server provides production-grade automation primitives useful in combined functional + safety test suites.

Conclusion

The maturation of AI agents in software development has created a new class of software that existing test tooling wasn't designed for. RAMPART is the first tool that genuinely meets QA engineers where they are — in their test suite, in their CI pipeline, in pytest — and gives them a systematic way to verify that AI agents don't just work, but work safely.

The teams that treat agent safety as a first-class testing concern, starting now, will be far better positioned as agents become more autonomous and more deeply integrated into production systems. RAMPART makes it practical. The only remaining question is whether QA teams treat it as a curiosity or put it in the pipeline.


References

Latest from the blog

See all →