Agentic AI systems are doing more and more work. Now humans need to figure out how to verify it all | Fortune
From hallucinations to rogue agents, there are some very clear risks that come with using AI.
And yet, most businesses cannot afford to sit out the AI revolution. Managing this thorny reality is a fundamental challenge for business leaders today, and executives at several leading companies came together to share their insights and experience at Fortune Brainstorm Tech in Aspen, Colorado.
At the top of the priority list is accountability. That is, being able to follow—and if necessary re-trace—all the steps that an AI or agentic AI system took in performing a particular task.
“A key thing that we worry about is how do you build a system that is as right as often as you can possibly make it,” said Edwin Olson, the founder and CEO of autonomous driving technology firm May Mobility. “But also, critically, because you know it’s going to eventually make mistakes, how do you create the transparency and introspectability, so you can understand why it made a mistake and then talk to regulators about how you know that you fixed that issue moving forward.”
Caitlin Halferty, the chief data officer at Thomson Reuters, echoed the sentiment, stressing the importance of transparent output from AI: “I do this with my teams, myself, I encourage this with my clients, making sure there’s a way in which you can validate the output of any model that you’re using.”
With a portfolio of AI-enabled services aimed at professionals in fields like legal and tax compliance, Thomson Reuters has had to focus on AI accountability from the start. Transparency is one of four key pillars of what the company calls “fiduciary grade” products, Halferty said, alongside data privacy and security, subject matter experts, and reliable content.
Another important technique cited by several panelists is designing systems that can effectively check one another. At May Mobility, Olson said, that means installing systems in autonomous cars that can simulate and assess multiple scenarios simultaneously and then choose the best option.
But such systems can also be used in corporate settings and day-to-day workflows. Elena Kvochko, the founder and CEO of Trustguard AI, calls it the “LLM as a judge” technique and uses the analogy of a newsroom to explain how it works.
“You have one person or agent whose job is to be the writer, and then the other person or agent whose job is to be the editor: its sole purpose is to find mistakes, or any inaccuracy that the writer could have potentially missed. So basically this is how you want your LLM systems to also be designed, so that they are self-improving.”
But, Kvochko adds, the key is that verification has to be handled by a separate AI system. “You don’t want AI to grade its own work,” she said.
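To make the writer-and-editor arrangement concrete, here is a minimal sketch in Python. It assumes the writer and editor are two independent callables (in practice, two separately configured models); the names `write_then_review`, `Review`, `toy_writer` and `toy_editor` are hypothetical illustrations, not Trustguard AI's implementation or any real library's API. The one rule it enforces is the one Kvochko stresses: the same system may not serve as both writer and judge.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: a writer drafts text given a task and prior feedback;
# an editor returns a list of problems it found (empty list means "approved").
Writer = Callable[[str, List[str]], str]
Editor = Callable[[str, str], List[str]]


@dataclass
class Review:
    draft: str
    issues: List[str]
    approved: bool
    rounds: int


def write_then_review(task: str, writer: Writer, editor: Editor,
                      max_rounds: int = 3) -> Review:
    """Draft with one agent, check with a *different* agent, revise on feedback."""
    if writer is editor:
        # "You don't want AI to grade its own work."
        raise ValueError("writer and editor must be separate systems")
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    feedback: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = writer(task, feedback)     # writer produces or revises a draft
        issues = editor(task, draft)       # independent editor looks only for mistakes
        if not issues:
            return Review(draft, [], True, round_no)
        feedback = issues                  # pass the editor's findings back to the writer
    # Out of rounds: surface the unresolved issues for a human to audit.
    return Review(draft, feedback, False, max_rounds)


# Toy stand-ins for two separate models (hypothetical).
def toy_writer(task: str, feedback: List[str]) -> str:
    return "Brainstorm Tech in Aspen" if feedback else "Brainstorm Tech in Apsen"


def toy_editor(task: str, draft: str) -> List[str]:
    return ["misspelled 'Aspen'"] if "Apsen" in draft else []


if __name__ == "__main__":
    print(write_then_review("summarize the event", toy_writer, toy_editor))
```

Returning unresolved issues instead of silently accepting the last draft keeps a trail a human can follow, which is the accountability point the panelists raised.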
Having a smart structure for AI verification is going to become increasingly critical as the technology performs more and more tasks, outpacing the ability of humans to verify all the work.
“You end up in this space where you’ve got so much work that’s been done, so much work to audit, that you can’t truly be accountable,” said SentinelOne Chief AI Officer Gregor Stewart.
He pointed to computer coding, which he said is about one year ahead of other industries. Rather than have a human verify ten thousand lines of AI-written code, teams are figuring out ways to have agents emulate some of the processes developed decades ago for humans in safety-critical industries.
“I think we’re going to see a resurgence of a bunch of techniques we developed for safety-critical technologies imported into just average practice,” said Stewart.
