A separate project, Agent Evals, was announced to help teams ship agents reliably. The project grew out of internal experience: agents were found to be non-deterministic, creating a strong need for reliability and confidence. Agent Evals provides tooling to benchmark agents by leveraging open standards such as OpenTelemetry. It collects real-time metrics and traces as the agent runs, scores performance and inference quality, and produces a report that helps users understand their agent’s reliability. That assessment is crucial for determining the level of human intervention required: fully autonomous, human-in-the-loop, or with a human in the outer loop. Agent Evals works in conjunction with other observability tools that support OpenTelemetry standards.
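Agent Evals' actual interfaces aren't shown in the article, but the core idea, running a non-deterministic agent repeatedly, collecting trace-style records, and scoring reliability, can be sketched in plain Python. The `evaluate_agent` helper, the `SpanRecord` stand-in, and the report fields below are illustrative assumptions, not the project's API:

```python
import statistics
import time
from dataclasses import dataclass

@dataclass
class SpanRecord:
    """A minimal stand-in for an OpenTelemetry span."""
    name: str
    duration_ms: float
    attributes: dict

def evaluate_agent(agent, task, expected, runs=5):
    """Run the agent several times, record one span per run,
    and score reliability as the fraction of matching answers."""
    spans = []
    passes = 0
    for i in range(runs):
        start = time.perf_counter()
        answer = agent(task)
        duration_ms = (time.perf_counter() - start) * 1000
        ok = answer == expected
        passes += ok
        spans.append(SpanRecord("agent.run", duration_ms,
                                {"run": i, "pass": ok}))
    return {
        "pass_rate": passes / runs,
        "p50_latency_ms": statistics.median(s.duration_ms for s in spans),
        "spans": spans,
    }

# A deliberately flaky "agent" that answers correctly on 3 of 5 runs.
replies = iter(["42", "oops", "42", "oops", "42"])
report = evaluate_agent(lambda task: next(replies), "q", "42", runs=5)
print(report["pass_rate"])  # 0.6
```

A pass rate like this is one way a report could inform the human-intervention decision: a score near 1.0 suggests autonomy is viable, while a low score argues for keeping a human in the loop.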
Moving beyond individual developer laptops into full production requires robust security and governance. Solo is addressing this by solving problems such as securing agent communication with LLMs and MCP tools. The Agent Gateway provides a critical piece of the solution, offering centralized policy enforcement, security, and observability for traffic. This includes “context layer enforcement,” which can be configured to put guardrails on responses, for instance, stripping out sensitive data such as credit card or bank account numbers as traffic travels through the gateway. Furthermore, Agent Gateway is being integrated into Istio as an experimental data plane option in Istio Ambient mode, helping mediate agent traffic without requiring changes to the agents or MCP tools themselves.
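Agent Gateway's guardrail configuration itself isn't shown in the article; the sketch below only illustrates the underlying idea of a response filter that redacts card numbers in transit. The `redact_cards` helper and its regex are hypothetical, not Agent Gateway's API; a Luhn checksum is added to cut down on false positives:

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated
# by spaces or dashes. The pattern ends on a digit so trailing
# whitespace is never swallowed into the match.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum, used to reduce false positives."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_cards(text: str) -> str:
    """Replace Luhn-valid card numbers with a placeholder."""
    def repl(match):
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED]" if luhn_valid(digits) else match.group()
    return CARD_RE.sub(repl, text)

print(redact_cards("Your card 4111 1111 1111 1111 is on file."))
# Your card [REDACTED] is on file.
```

In a real gateway this kind of filter would run on the response path, so neither the agent nor the MCP tool needs to change for the guardrail to apply.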
Collectively, these tools, Agent Registry for governance, Agent Evals for reliability, and Agent Gateway for security, are filling in the pieces of the puzzle needed to run agentic AI in production with confidence. However, for critical work, human involvement remains a necessary component, as the philosophy suggests viewing the agent like a growing co-worker that still benefits from supervision and peer review.
“I’m always thinking about the agent as like a person,” Lin told SD Times. “Even with your coworker, you don’t always trust their work. You need a peer review of the work, to iterate and make it better. So, at this stage of the agent, maybe it’s more like from toddler to kindergarten. It’s growing, right? But even when the agent becomes an adult, like my son just turned 18, you still need to kind of supervise a little bit of providing some insights.”




