[P] If you're building AI agents, logs aren't enough. You need evidence.
I've built a programmable governance layer for AI agents, and I'm considering open-sourcing it completely. Looking for feedback.
Agent demos are easy.
Production agents are where things get ugly:
- an agent calls the wrong tool
- sensitive data gets passed into a model
- a high-risk action gets approved when it shouldn’t
- a customer asks, “what exactly happened in this run?”
- your team needs to replay the chain later and prove it wasn’t tampered with
That's the problem I am trying to solve with the AI Governance SDK.
The SDK comes in Python and TypeScript, and it gives engineers a programmable way to add:
- audit trails for agent runs and tool calls
- deterministic risk decisions for runtime actions
- compliance proof generation and verification
- replay + drift diagnostics for historical runs
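To make the "audit trails" and "prove it wasn't tampered with" points concrete, here's a minimal sketch of a tamper-evident audit trail using a hash chain. This is my own illustration, not the SDK's actual API; the class and field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class AuditTrail:
    """Append-only log where each entry commits to the hash of the
    previous entry, so any edit to history breaks the chain.
    (Hypothetical sketch, not the SDK's real interface.)"""
    entries: list = field(default_factory=list)

    def record(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        # Canonical JSON (sorted keys) so the hash is deterministic
        payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
        entry_hash = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # Re-derive every hash from the chain start; any tampering fails
        prev_hash = "genesis"
        for entry in self.entries:
            payload = json.dumps({"event": entry["event"], "prev": prev_hash},
                                 sort_keys=True)
            if hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

trail = AuditTrail()
trail.record({"agent": "support-bot", "tool": "refund", "amount": 40})
trail.record({"agent": "support-bot", "tool": "email", "to": "customer"})
assert trail.verify()

trail.entries[0]["event"]["amount"] = 4000  # tamper with history
assert not trail.verify()                   # chain no longer validates
```

The point is that "evidence" here means more than a log line: anyone holding the trail can re-verify it independently.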
The core idea is simple:
If an agent can reason, call tools, and take actions, you need more than logs. You need a system that can answer:
- what did the agent do?
- why was that action allowed?
- what policy/risk inputs were involved?
- can we replay the run later?
- can we generate evidence for security, compliance, or enterprise review?
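One way to answer "why was that action allowed?" and "can we replay the run later?" is to make the risk decision a pure function of its inputs and record those inputs alongside the verdict. The sketch below is my own illustration under assumed policy parameters (`RISK_THRESHOLD`, `high_risk_tools`, `amount_limit` are all hypothetical), not the SDK's real interface.

```python
RISK_THRESHOLD = 0.7  # assumed policy parameter, purely illustrative

def decide(action: dict, policy: dict) -> dict:
    """Pure function: the verdict depends only on its arguments, so a
    historical run can be replayed later to reproduce the same decision."""
    score = 0.0
    reasons = []
    if action["tool"] in policy["high_risk_tools"]:
        score += 0.5
        reasons.append("high-risk tool")
    if action.get("amount", 0) > policy["amount_limit"]:
        score += 0.4
        reasons.append("amount over limit")
    return {
        "allowed": score < RISK_THRESHOLD,
        "score": score,
        "reasons": reasons,
        # Record the exact inputs so the decision is auditable later
        "inputs": {"action": action, "policy": policy},
    }

policy = {"high_risk_tools": ["wire_transfer"], "amount_limit": 1000}
verdict = decide({"tool": "wire_transfer", "amount": 5000}, policy)
# score 0.9 >= threshold, so the action is blocked with reasons attached
```

Because the decision is deterministic and carries its inputs, replay and drift diagnostics reduce to re-running `decide` and diffing the outputs.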
What I wanted as an engineer was not another “AI governance dashboard.”
I wanted infrastructure.
Something I could wire into agent loops, tool invocations, and runtime controls the same way I wire in auth, queues, or observability.
If you’re working on agents, copilots, or autonomous workflows, I’d like honest feedback on this:
What would make you fully trust an AI agent in production?