Daily notes on AI, testing, and building software.
Hallucination has been the Achilles' heel of AI-generated test cases — a test that asserts the wrong thing is worse than no test at all. GPT-5.5's 52.5% reduction in hallucinated claims on high-stakes prompts directly…
CVE-2026-0300 is a critical, unauthenticated buffer overflow (CWE-787 out-of-bounds write) in the User-ID™ Authentication Portal (Captive Portal) service of Palo Alto Networks PAN-OS. By sending specially crafted…
Anthropic's Claude Opus 4.7 brings "substantially improved" software engineering performance with better vision and the same price point as 4.6 — and it arrives exactly when QA teams are moving from "AI writes tests" to…
With GPT-5.5 and Claude Opus 4.7 now powering enterprise products at scale, the AI itself has become the system under test — and traditional QA methods were never designed for probabilistic, non-deterministic outputs.…
Progress Software has disclosed two vulnerabilities in MOVEit Automation — CVE-2026-4670 (CVSS 9.8, Critical) and CVE-2026-5174 (CVSS 7.7, High) — that together enable unauthenticated remote attackers to gain full…
Anthropic's Claude Managed Agents, now in public beta, give QA teams a production-grade platform for deploying autonomous agents that can plan, execute, and iterate on test workflows — without manually scripting every…
CVE-2026-31431, publicly nicknamed "Copy Fail", is a high-severity local privilege escalation (LPE) vulnerability in the Linux kernel's userspace cryptographic API subsystem (algifaead), disclosed on April 29, 2026. Any…
OpenAI Codex has evolved from a simple code-completion tool into a full autonomous software engineering agent that can write, run, and iterate on tests without human intervention — fundamentally shifting what "automated…
Shipping a feature powered by an LLM means shipping non-deterministic behavior — and most engineering teams in 2026 are still testing their LLM features less rigorously than their login forms. A structured LLM…
OpenAI's GPT-5.3-Codex — the most capable agentic coding model yet — is accelerating code output by ~25% while shipping with built-in computer use and hosted shell execution, meaning AI is no longer just suggesting code…