Blog

Insights on AI implementation, performance measurement, and technical case studies

Latest Articles

Agents In Production

Agent Evaluation is a Distributed Systems Problem

The nondeterminism gets most of the attention, but the actual difficulty is shared mutable state, environment isolation, and statistical confidence — the same things that make distributed systems hard to test.

Mar 29, 2026

Computer Use Agents

Evaluating Visual Grounding Models on Accounting Software

Cross-Platform Benchmark Study: AP Automation

Jan 28, 2026

Computer Use Agents

Milestone 2: Building Intelligence on Top of Automation

How bounded reasoning actually works, why format mismatches killed 70% accuracy, and what HITL approval really means in production.

Jan 22, 2026

Computer Use Agents

Milestone 1: Multi-Modal Perception for Computer-Use Agents

I'm building a computer-use agent against real enterprise UIs. Not an API wrapper—something that has to perceive interfaces, identify real elements, and act in a way a human can inspect and understand.

Jan 13, 2026

Mechanistic Interpretability

Debugging Hallucinations: A Mechanistic Investigation into Model Confidence

An engineering investigation into confidence formation in transformer models

Jan 8, 2026

AI Implementation

Production-Ready Agentic AI: A Pragmatic Guide for Engineering Leaders and Teams

Master agentic AI implementation with proven architectural patterns, benchmarking strategies, and production deployment techniques for software engineers and ML teams.

May 29, 2025
