Daily notes on AI, testing, and building software.
GPT-5.5 Codex, released on April 23, 2026, achieved 79.2% accuracy on a curated code-review benchmark, a 20.9-point jump from the previous 58.3%, meaning AI-assisted code review is no longer a novelty but a credible…
OpenAI's GPT-5.5, released April 23, 2026, dramatically improves agentic task completion — writing code, operating software, and finishing multi-step workflows with minimal human input. For QA teams, this accelerates…
OpenAI's GPT-5.5, released April 23–24, 2026, is the most capable model yet for long-horizon coding tasks, real-world GitHub issue resolution, and multi-step agentic workflows: precisely the domains where automated test…
OpenAI's GPT-5.5 scored 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro — benchmarks that directly simulate the multi-step, tool-using workflows that underpin modern test automation. When an AI model can plan,…
CVE-2026-1949 is a critical stack-based buffer overflow vulnerability (CVSS 9.8) in the Delta Electronics AS320T industrial AC servo drive, disclosed on April 24, 2026. An unauthenticated remote attacker can exploit the…
CVE-2024-27199 is an authentication bypass vulnerability in JetBrains TeamCity On-Premises, stemming from a path traversal flaw (CWE-22) in the server's web component. An unauthenticated remote attacker can exploit this…
Anthropic's Claude Managed Agents — now in public beta — give QA teams a fully managed, production-grade agent harness with built-in sandboxed execution, state management, and memory, eliminating the infrastructure…
Anthropic's Claude Managed Agents — launched in public beta on April 8, 2026 — give QA teams a fully managed infrastructure for running autonomous, multi-step testing agents without managing servers, scaling, or…
The software testing industry is undergoing its most fundamental shift in a decade: script-based automation is giving way to autonomous quality engineering, where AI agents continuously analyse code changes, identify…
CVE-2026-41248 is a critical (CVSS 9.1) authentication bypass vulnerability in the official Clerk JavaScript SDKs — the authentication layer used by millions of Next.js, Nuxt, and Astro applications. A logic flaw in…