Daily notes on AI, testing, and building software.
Anthropic's Claude Mythos — a model with exceptional computer security capabilities — paired with Project Glasswing creates a new paradigm for AI-assisted penetration testing and security regression testing, raising the…
A new generation of agentic testing platforms — led by QA Wolf's natural-language-to-Playwright approach — is collapsing the gap between describing test intent and having production-grade, version-controlled test code,…
Claude Opus 4.7's new xhigh vision resolution setting means the model can now detect fine-grained UI regressions (elements misaligned by a few pixels, truncated text, subtle layout shifts) that previously required expensive dedicated…
Anthropic's newly launched Claude Managed Agents public beta provides a fully managed, sandboxed agent harness — which opens the door to running Claude as a reliable, autonomous test orchestrator inside CI/CD pipelines…
New academic research on LLM-driven test oracle generation reveals that the way you prompt your AI matters as much as which model you use — and that the wrong strategy produces tests that pass but silently fail to catch…
A coding model that outputs over 1,000 tokens per second without sacrificing accuracy opens the door to real-time, in-pipeline AI test generation — closing the feedback loop between code commit and test coverage report…
Claude Opus 4.7 introduces a new "xhigh" effort level that pushes the model to reason longer and more thoroughly on hard problems — a capability that could redefine how AI generates, reviews, and prioritises test cases…
The Ministry of Testing community is actively debating which QA roles remain valuable as AI agents take over test execution and generation — and the answers reveal a fundamental shift in what it means to be a software…
Archon solves a problem that grows more critical as AI enters the testing stack: how do you test the test generator? It's the first open-source tool purpose-built for creating deterministic, reproducible benchmarks for…