Blog

Daily notes on AI, testing, and building software.

April 20, 2026 · AI/LLM Updates
Project Glasswing and Claude Mythos: How AI Is Rewriting Security Testing
Anthropic's Claude Mythos — a model with exceptional computer security capabilities — paired with Project Glasswing creates a new paradigm for AI-assisted penetration testing and security regression testing, raising the…
April 20, 2026 · Testing Tools
Natural Language to Playwright: QA Wolf and the Agentic Test Generation Wave
A new generation of agentic testing platforms — led by QA Wolf's natural-language-to-Playwright approach — is collapsing the gap between describing test intent and having production-grade, version-controlled test code,…
April 20, 2026 · AI/LLM Updates
Claude Opus 4.7's 3× Vision Resolution Upgrade Is a Game-Changer for Visual Testing
Claude Opus 4.7's new xhigh vision resolution setting means AI models can now detect fine-grained UI regressions — misaligned pixels, truncated text, subtle layout shifts — that previously required expensive dedicated…
April 20, 2026 · AI/LLM Updates
Claude Managed Agents Are Here — What It Means for Autonomous Test Pipelines
Anthropic's newly launched Claude Managed Agents public beta provides a fully managed, sandboxed agent harness — which opens the door to running Claude as a reliable, autonomous test orchestrator inside CI/CD pipelines…
April 19, 2026 · Test Automation
ArXiv Research Drop: How Prompting Strategy Determines Whether Your AI-Generated Tests Actually Catch Bugs
New academic research on LLM-driven test oracle generation reveals that the way you prompt your AI matters as much as which model you use — and that the wrong strategy produces tests that pass but silently fail to catch…
April 19, 2026 · AI/LLM Updates
GPT-5.3-Codex-Spark Does 1000+ Tokens/sec — Here's What That Speed Means for CI/CD Testing
A coding model that outputs over 1,000 tokens per second without sacrificing accuracy opens the door to real-time, in-pipeline AI test generation — closing the feedback loop between code commit and test coverage report…
April 19, 2026 · AI/LLM Updates
Claude Opus 4.7's "xhigh" Effort Level — What Deeper Reasoning Means for Test Automation
Claude Opus 4.7 introduces a new "xhigh" effort level that pushes the model to reason longer and more thoroughly on hard problems — a capability that could redefine how AI generates, reviews, and prioritises test cases…
April 19, 2026 · Test Automation
Autonomous QA Agents Are Here — Which Testing Roles Actually Survive?
The Ministry of Testing community is actively debating which QA roles remain valuable as AI agents take over test execution and generation — and the answers reveal a fundamental shift in what it means to be a software…
April 19, 2026 · Testing Tools
Archon Is the First Open-Source Framework for Building AI Testing Frameworks — And It's Already Trending
Archon solves a problem that grows more critical as AI enters the testing stack: how do you test the test generator? It's the first open-source tool purpose-built for creating deterministic, reproducible benchmarks for…

Latest from the blog