Shipping agentic AI that actually saves time

Every team can stand up an agent demo in an afternoon now. The hard part starts the day after: making the thing reliable enough that a human stops double-checking it.
After deploying a handful of agent and multimodal RAG workflows into real internal operations, the lessons that mattered had little to do with the model and everything to do with the engineering discipline around it.
Retrieval is the product
When a RAG system gives a wrong answer, the model is rarely the culprit — retrieval is. Garbage context in, confident nonsense out. Most of the quality wins came from the boring layer:
- Chunking on semantic boundaries, not fixed token counts.
- Storing metadata (source, date, owner) alongside every chunk.
- Re-ranking results before they ever reach the model.
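The first two points can be sketched in a few lines. This is a minimal illustration rather than a production chunker: it treats blank-line paragraph breaks as the semantic boundary, and `StoredChunk`, `chunk_by_paragraph`, and `max_chars` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredChunk:
    # Hypothetical record: the text plus the metadata named in the list above.
    text: str
    source: str
    updated: date
    owner: str


def chunk_by_paragraph(text: str, source: str, updated: date, owner: str,
                       max_chars: int = 1200) -> list[StoredChunk]:
    """Pack whole paragraphs into chunks; never cut inside a paragraph."""
    chunks: list[StoredChunk] = []
    buffer: list[str] = []
    size = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Flush before the buffer would overflow, so every split
        # lands on a paragraph boundary.
        if buffer and size + len(para) > max_chars:
            chunks.append(StoredChunk("\n\n".join(buffer), source, updated, owner))
            buffer, size = [], 0
        buffer.append(para)
        size += len(para)
    if buffer:
        chunks.append(StoredChunk("\n\n".join(buffer), source, updated, owner))
    return chunks
```

A single paragraph longer than `max_chars` becomes its own oversized chunk here; a real pipeline would probably fall back to sentence boundaries in that case.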
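```python
# vector_store, reranker, and Chunk are provided by the surrounding application.
def retrieve(query: str, k: int = 8) -> list[Chunk]:
    # Over-fetch 4x so the re-ranker has room to reorder.
    candidates = vector_store.search(query, k=k * 4)
    # Assumes score() returns the candidates sorted best-first.
    ranked = reranker.score(query, candidates)
    # Keep only high-confidence context; an empty result beats a wrong one.
    return [c for c in ranked if c.score > 0.45][:k]
```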
That last line is the whole philosophy: returning nothing is a valid answer. An agent that says "I don't have that" is worth ten that hallucinate.
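One way to carry that idea through to the answer step is to skip the model call whenever retrieval comes back empty. The sketch below assumes the retriever and the model are passed in as plain callables; `answer`, `generate`, and `NO_ANSWER` are hypothetical names, not part of any particular framework.

```python
from typing import Callable, Sequence

NO_ANSWER = "I don't have that."


def answer(query: str,
           retrieve: Callable[[str], Sequence[str]],
           generate: Callable[[str, Sequence[str]], str]) -> str:
    """Answer from retrieved context only; abstain when there is none."""
    context = retrieve(query)
    if not context:
        # No high-confidence context: do not call the model at all.
        return NO_ANSWER
    return generate(query, context)
```

Because the abstention happens before generation, the model never gets the chance to improvise an answer from nothing.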
Give agents a narrow job and a paper trail
Open-ended autonomy is a great way to generate a great-looking incident report. The agents that earned their keep had:
- A single, well-scoped task.
- Tools with hard guardrails, not just prompts asking nicely.
- A full trace of every step, logged like any other production system.
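A guardrail that lives in code rather than in the prompt might look like the sketch below. Everything here is illustrative: the `restart_service` tool, the `ALLOWED_SERVICES` allowlist, and the `call_tool` dispatcher are assumptions made for the example. The point is only that the check runs no matter what the model asks for.

```python
from dataclasses import dataclass
from typing import Any, Callable


class GuardrailError(Exception):
    """Raised when a tool call falls outside what the agent may do."""


@dataclass(frozen=True)
class Tool:
    func: Callable[..., Any]
    validate: Callable[[dict[str, Any]], None]  # raises GuardrailError on bad args


ALLOWED_SERVICES = {"report-worker", "search-indexer"}  # hypothetical allowlist


def check_restart(args: dict[str, Any]) -> None:
    service = args.get("service")
    # Reject extra arguments, wrong types, and anything off the allowlist.
    if set(args) != {"service"} or not isinstance(service, str) \
            or service not in ALLOWED_SERVICES:
        raise GuardrailError(f"restart not permitted for args: {args!r}")


def restart_service(service: str) -> str:
    return f"restarted {service}"  # stand-in for the real side effect


TOOLS = {"restart_service": Tool(restart_service, check_restart)}


def call_tool(name: str, args: dict[str, Any]) -> Any:
    """Single entry point for every tool call the agent makes."""
    tool = TOOLS.get(name)
    if tool is None:
        raise GuardrailError(f"unknown tool: {name!r}")
    tool.validate(args)  # the hard check, independent of the prompt
    return tool.func(**args)
```

Routing every call through one dispatcher also gives a natural place to attach logging, which is where the paper trail comes from.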
Observability is non-negotiable
You cannot debug what you cannot see. Treat the agent like a distributed system: trace inputs, tool calls, and outputs end to end. When something goes wrong — and it will — the trace is the difference between a five-minute fix and a shrug.
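As a rough sketch of what end-to-end tracing can mean in practice, the helper below writes one JSON line per step, all sharing a run-level `trace_id`. The `Trace` class and its `call` method are hypothetical; a real deployment would more likely plug into whatever tracing stack the team already runs.

```python
import json
import time
import uuid
from typing import Any, Callable


class Trace:
    """Emits one JSON line per step, all sharing a trace_id."""

    def __init__(self, sink: Callable[[str], None] = print) -> None:
        self.trace_id = uuid.uuid4().hex
        self.sink = sink  # where log lines go; print is only a placeholder

    def call(self, step: str, func: Callable[..., Any],
             *args: Any, **kwargs: Any) -> Any:
        record: dict[str, Any] = {"trace_id": self.trace_id, "step": step,
                                  "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every step leaves a line.
            record["ms"] = round((time.perf_counter() - start) * 1000, 2)
            # default=repr keeps logging from failing on non-JSON values.
            self.sink(json.dumps(record, default=repr))
```

Wrapping each retrieval, model, and tool call as `trace.call("retrieve", retrieve, query)` means a failed run can be replayed step by step by filtering the logs on a single `trace_id`.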
The unglamorous conclusion
The teams winning with agentic AI in 2026 are not the ones with the cleverest prompts. They're the ones treating agents as software: scoped, observable, guard-railed, and held to the same bar as anything else that touches production.