Why it matters for testing
AI agents are being deployed into production pipelines at scale, but most teams have no systematic way to test their safety and security properties — Microsoft's RAMPART fills that gap with a framework QA engineers already know. This directly changes how QA teams must think about agent validation: safety testing is now a first-class citizen alongside functional testing, written in the same tool, run in the same CI/CD pipeline.
Intro
For most of software testing history, security and safety testing was someone else's job — red teamers, pentesters, or a specialist team that arrived after the product was built. But as AI agents become core parts of production software — reading emails, executing code, browsing the web, calling APIs — that separation is no longer sustainable. Microsoft just made it a lot easier to close that gap.
On May 20, 2026, Microsoft open-sourced RAMPART (Risk Assessment and Measurement Platform for Agentic Red Teaming) alongside a companion design tool called Clarity. If you've ever written a pytest test, you can write a RAMPART test. And that changes everything for QA teams working with agentic systems.
The AI development/news
RAMPART is a pytest-native safety and security testing framework for AI agents, released by Microsoft on May 20, 2026. It's built on top of PyRIT (Microsoft's existing Python Risk Identification Toolkit for generative AI red teaming), but with a key philosophical shift: PyRIT is for security researchers doing black-box discovery after a system is built; RAMPART is for engineers doing safety validation while building.
The framework lets teams write test cases that probe or attack their AI agents — checking for cross-prompt injection vulnerabilities (where untrusted data from an email, file, or web page reaches the AI indirectly), unintended behavioral regressions, data exfiltration risks, and other harm categories. RAMPART orchestrates the agent interaction, evaluates the outcome against your assertions, and reports like any other pytest suite.
The companion tool, Clarity, is a structured design-review document that forces you to articulate your agent's intent, risks, and expected behavior before writing code — essentially a pre-mortem for AI systems.
Both tools are available now as open-source projects on GitHub.
Current testing landscape
Today, testing an AI agent typically means one of three things:
- Ad-hoc prompting — developers manually try inputs that might trip the agent up, with no systematic coverage or repeatability.
- LLM-as-judge eval frameworks — tools like LangSmith or custom harnesses that pipe outputs through a scoring model, useful for quality but not designed around security threat models.
- External red teaming — bringing in a security team or vendor after the agent is mostly built, often too late to change the architecture.
None of these integrate naturally into the CI/CD pipeline as a blocking quality gate. Security and safety remain disconnected from the test suite that runs on every pull request.
The impact
RAMPART fundamentally changes the economics of AI agent safety testing by making it:
Continuous — because it's pytest, it runs in CI. You can gate a deploy on your safety tests passing, the same way you gate on unit tests.
Repeatable — RAMPART uses mocking and controlled interaction so tests aren't non-deterministic. You get stable pass/fail signals you can track over time.
Developer-owned — QA engineers and developers can write safety tests without needing a security specialist. The threat model is embedded in the test itself, not siloed in a separate report.
Coverage-oriented — the framework includes probes for specific harm categories, so teams can track safety coverage the way they track code coverage.
The broader implication: for teams building agentic AI into their products, RAMPART raises the bar for what "done" means. A feature isn't shipped until its safety tests pass.
Practical applications
Here's how QA teams can start using RAMPART today:
Prompt injection regression tests — write tests that embed adversarial instructions in simulated data sources (mock emails, web pages, documents) and assert the agent doesn't comply with them. Run these on every agent code change.
Behavioral boundary tests — define the agent's intended scope and write tests that verify it refuses to act outside that scope. Useful for agents that have access to external APIs or file systems.
Data exfiltration probes — test that the agent doesn't leak sensitive data from its context window when prompted to summarize or reformat content from untrusted sources.
Regression suite for new model versions — when upgrading the underlying LLM (e.g., from Claude Opus 4.7 to 4.8), run your full RAMPART suite against the new model before switching production traffic.
Clarity-first design — before building a new agent capability, draft a Clarity document. Use it to identify which RAMPART test categories apply, then write the tests before writing the feature code. Shift-left safety.
Tools/frameworks to watch
- RAMPART — the framework itself, open source on GitHub
- Clarity — the companion design-review tool
- PyRIT — the underlying red-teaming library RAMPART extends, useful for deeper adversarial research
- pytest — still the backbone; if your team knows it, there's minimal ramp-up to adopting RAMPART
- OpenTelemetry — increasingly used to trace agent execution, pairs well with RAMPART's test assertions for complex multi-step agent behavior
Conclusion
The arrival of RAMPART signals something important: safety testing for AI agents is maturing from a specialist practice into a standard engineering discipline. Just as unit testing went from "something academics do" to "table stakes for any pull request," safety testing for agents is on the same trajectory.
For QA teams, this is both an opportunity and an expectation. The teams that build RAMPART into their CI pipelines now — and develop the fluency to write meaningful safety probes — will be ahead of the organizations scrambling to retrofit safety validation after their agents have already caused an incident.
The framework is open source, built on tools QA engineers already use, and available today. There's no reason to wait.