AI · AGENTS · RAG · LLM · MLOPS · DEVOPS

Shipping agentic AI that actually saves time

Jun 10, 2026 · 1 min read

Every team can stand up an agent demo in an afternoon now. The hard part starts the day after: making the thing reliable enough that a human stops double-checking it.

After deploying a handful of agent and multimodal RAG workflows into real internal operations, I found that the lessons that mattered had little to do with the model and everything to do with the engineering discipline around it.

Retrieval is the product

When a RAG system gives a wrong answer, the model is rarely the culprit — retrieval is. Garbage context in, confident nonsense out. Most of the quality wins came from the boring layer:

Chunking on semantic boundaries, not fixed token counts.
Storing metadata (source, date, owner) alongside every chunk.
Re-ranking results before they ever reach the model.
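
As a rough illustration of the first two points, here is a minimal sketch that treats blank-line paragraph breaks as the semantic boundary and stamps every chunk with its metadata. The `chunk_by_paragraph` helper, the `max_chars` budget, and the field names are assumptions made for this example, not a description of any particular library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str    # where the text came from
    updated: date  # when the source was last changed
    owner: str     # who to ask when the content looks wrong


def chunk_by_paragraph(doc: str, source: str, updated: date, owner: str,
                       max_chars: int = 800) -> list[Chunk]:
    """Split on blank lines, then pack whole paragraphs up to max_chars."""
    paragraphs = [p.strip() for p in doc.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    buffer = ""
    for para in paragraphs:
        # Start a new chunk rather than cutting a paragraph in half.
        if buffer and len(buffer) + len(para) + 2 > max_chars:
            chunks.append(Chunk(buffer, source, updated, owner))
            buffer = ""
        buffer = f"{buffer}\n\n{para}" if buffer else para
    if buffer:
        chunks.append(Chunk(buffer, source, updated, owner))
    return chunks
```

A single paragraph longer than `max_chars` still ends up as one oversized chunk here; a real pipeline would probably fall back to sentence boundaries in that case.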

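MIN_SCORE = 0.45  # below this, context is treated as noise rather than evidence

def retrieve(query: str, k: int = 8) -> list[Chunk]:
    # Over-fetch so the re-ranker has room to reorder before we cut to k.
    candidates = vector_store.search(query, k=k * 4)
    # Assumes reranker.score returns the chunks sorted best-first.
    ranked = reranker.score(query, candidates)
    # Keep only high-confidence context; an empty result beats a wrong one.
    return [c for c in ranked if c.score > MIN_SCORE][:k]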

That last line is the whole philosophy: returning nothing is a valid answer. An agent that says "I don't have that" is worth ten that hallucinate.
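
One way to carry that rule through to the answer itself is to skip the model call entirely when retrieval comes back empty. The sketch below assumes that `retrieve` and `generate` are passed in as plain callables; the `answer` function and the `NO_ANSWER` wording are illustrative names, not part of the workflow described above.

```python
from dataclasses import dataclass
from typing import Callable

NO_ANSWER = "I don't have that."


@dataclass
class Chunk:
    text: str
    score: float


def answer(query: str,
           retrieve: Callable[[str], list[Chunk]],
           generate: Callable[[str, list[Chunk]], str]) -> str:
    """Call the model only when retrieval returned usable context."""
    context = retrieve(query)
    if not context:
        # No trusted context: abstain instead of letting the model guess.
        return NO_ANSWER
    return generate(query, context)
```

Keeping the abstention in code rather than in the prompt means the model never gets the chance to improvise when there is nothing to ground it.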

Give agents a narrow job and a paper trail

Open-ended autonomy is a reliable way to generate an impressive incident report. The agents that earned their keep had:

A single, well-scoped task.
Tools with hard guardrails, not just prompts asking nicely.
A full trace of every step, logged like any other production system.
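
A hard guardrail, as opposed to a polite prompt, can be as simple as a check that runs in code before the tool does. The following is a minimal sketch under that assumption; `GuardedTools`, `ToolDenied`, the `refund` tool, and its limit of 50 are all hypothetical names and values chosen for illustration.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("agent.tools")


class ToolDenied(Exception):
    """Raised when a tool call falls outside the agent's allowed scope."""


class GuardedTools:
    def __init__(self) -> None:
        self._tools: dict[str, tuple[Callable[..., Any], Callable[[dict], bool]]] = {}

    def register(self, name: str, fn: Callable[..., Any],
                 allow: Callable[[dict], bool]) -> None:
        """Pair each tool with a predicate that must accept its arguments."""
        self._tools[name] = (fn, allow)

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise ToolDenied(f"unknown tool: {name}")
        fn, allow = self._tools[name]
        if not allow(kwargs):
            # The denial is logged too: refused calls belong in the paper trail.
            logger.warning("denied tool=%s args=%s", name, kwargs)
            raise ToolDenied(f"arguments rejected for {name}")
        logger.info("calling tool=%s args=%s", name, kwargs)
        return fn(**kwargs)


# Hypothetical tool: small refunds only, whatever the prompt says.
tools = GuardedTools()
tools.register(
    "refund",
    lambda order_id, amount: f"refunded {amount} on {order_id}",
    allow=lambda args: 0 < args.get("amount", 0) <= 50,
)
```

With this in place, `tools.call("refund", order_id="A1", amount=20)` goes through, while an amount of 500 raises `ToolDenied` regardless of how the model was instructed.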

Observability is non-negotiable

You cannot debug what you cannot see. Treat the agent like a distributed system: trace inputs, tool calls, and outputs end to end. When something goes wrong — and it will — the trace is the difference between a five-minute fix and a shrug.
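
To make that concrete, here is a small sketch of a per-run trace that records inputs, outputs, errors, and timing for each step under one shared trace ID. The `Trace` class and its field names are assumptions for this example; a production setup would more likely export the same records to whatever tracing backend the team already runs.

```python
import json
import time
import uuid
from contextlib import contextmanager
from typing import Any, Iterator


class Trace:
    """Collects one record per step so a whole run can be inspected later."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.steps: list[dict[str, Any]] = []

    @contextmanager
    def step(self, kind: str, **inputs: Any) -> Iterator[dict[str, Any]]:
        record: dict[str, Any] = {
            "trace_id": self.trace_id,
            "kind": kind,
            "inputs": inputs,
            "output": None,
            "error": None,
        }
        start = time.perf_counter()
        try:
            yield record  # the caller fills in record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Recorded on success and failure alike.
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
            self.steps.append(record)

    def dump(self) -> str:
        """One JSON object per line, ready for a log pipeline."""
        return "\n".join(json.dumps(s, default=str) for s in self.steps)
```

A tool call wrapped as `with trace.step("tool_call", tool="search", query=q) as rec:` sets `rec["output"]` on success; if the body raises, the error is captured and the step is still appended before the exception propagates.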

The unglamorous conclusion

The teams winning with agentic AI in 2026 are not the ones with the cleverest prompts. They're the ones treating agents as software: scoped, observable, guardrailed, and held to the same bar as anything else that touches production.

Related reading

An LLM eval harness your team will actually trust

Kubernetes migrations without downtime: a project manager's runbook

Compliance as Code: turning ISO 27001 controls into CI checks