AI + a16z

Can AI Agents Finally Fix Customer Support?

Jesse Zhang, Kimberly Tan, and Derrick Harris

Posted December 18, 2024

In this episode of the AI + a16z podcast, Decagon cofounder and CEO Jesse Zhang and a16z partner Kimberly Tan discuss how LLMs are reshaping customer support, the strong market demand for AI agents, and how AI agents give startups a new pricing model to help disrupt incumbents.

Decagon is a startup supplying businesses with AI agents to assist in customer support. These are neither chatbots nor single-API-call LLM wrappers, but rather advanced, tunable agents personalized to a company’s specific needs and able to handle complex workflows.

In addition to explaining why he started Decagon and how it’s architected to handle different LLMs and customer environments, Jesse also touches on the benefits of a per-conversation business model and how AI agents will change the required skill sets of the people in charge of customer support.

For more on how automation is changing business processes, read Kimberly’s post: RIP to RPA: The Rise of Intelligent Automation.

Transcript

Jesse: We’re building AI agents for customer service. When we first got started, we wanted to build something that was, like, very relatable for ourselves.

And so, of course, no one needs to be taught what AI agents for customer service can do, right? We’ve all been on the phone on hold with airlines or hotels or whatever. So that’s kind of where the idea originated, and we just talked to a bunch of customers to see specifically what we should build. For us in particular, the thing that stood out as we learned more about AI agents was that we started to really think about what the future would look like when there are a lot of them, because I think everyone believes there are gonna be a lot of AI agents.

And so for us, an interesting thing would be what would the humans that work around the AI agents do? Like, what tooling would they have? What sort of control or visibility would they have into the agents that they’re working with or managing? And so that’s really what we built the company around. I think that’s the thing that’s made us special so far is that we have all this tooling around these AI agents for the people that we work with to build them and configure them and just make it not really a black box. So, that’s kind of where we’ve created our brand.

Derrick: What inspired you? Because your last company was a consumer-based video company, correct?

Jesse: Yeah.

Derrick: What was the move to get into enterprise software?

Jesse: Great question. When founders think about topics, it’s generally pretty topic agnostic, because when you approach a new space you’re actually pretty naive, and there’s some advantage in having a fresh perspective on things. So when we were ideating, pretty much no topics were off limits. I think a very common pattern among more quantitative people, myself included, is that after you’ve tried a consumer product, you gravitate a lot more towards enterprise software, because the problems are a lot more concrete. You have actual customers with actual needs and budgets and stuff like that that you can optimize for and solve problems for. Consumer is also very exciting, but it’s a lot more intuition-based and about running experiments. And I think for myself personally, enterprise is a better fit.

Kimberly: And maybe just to start out, what are the most common categories of support that Decagon deals with today? And talk a little bit more about how you actually are leveraging LLMs to solve that problem and what’s possible now that maybe it wasn’t before.

Jesse: Sure. So if you think about automation before, you would have maybe decision trees. You could do some simple NLP to figure out which path to go down in the decision tree. But we’ve all used chatbots, and that’s a pretty frustrating experience. You usually don’t have a question that can be fully solved by a decision tree, so you end up getting shoved down a path that’s sort of related to what you’re asking, but not really.

Nowadays you have LLMs. And so the magical part of LLMs, as we’ve all used ChatGPT, is that they’re very flexible, and they can adapt to a lot of different situations, and they just have a baseline intelligence around them. And so when you apply that to support or support inquiries or questions that customers have, you’re able to just be a lot more personalized. So that’s number one, right? The personalization factor goes way up, and that unlocks higher stats across the board. You’re able to resolve more things, people are happier, the customer satisfaction is higher.

And so then the natural next step is like, okay, well, if you have this intelligence, then you should be able to do more of the things that a human can do. And what a human can do is pull data for you in real time, take actions, and reason through multiple steps. If you’re coming in with a pretty complicated question that’s like, okay, I wanna do this and that, and maybe the AI is only prepared for the first thing, the LLM is smart enough to recognize that there are two questions here: first let me resolve the first question, and then I’ll help you with the second one. That was basically impossible before LLMs came along. And so that’s why we’re seeing a step function today in terms of what the technology can do.

Kimberly: How are you defining AI agent in this context? Because people use the term agent quite broadly, and I’m curious in the context of Decagon, like, what does it actually mean?

Jesse: So I’ll say an agent is more or less a system of LLMs that are working together, right? With one LLM call, you basically send a prompt through and get a response back. With an agent, you want to be able to chain multiple of those together, maybe even recursively, where you have one LLM call that decides what to do with the message, and that leads to other calls that pull in more data, take actions, iterate on what the user has said, and maybe ask follow-up questions, right? So an agent, for us, you can think of almost as a web of LLM calls, API calls, and other logic that all works together to produce a better experience.
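To make that concrete, here is a minimal sketch in Python of an agent as a web of LLM calls: one call routes the message, a follow-up call drafts a grounded reply, and anything unrecognized falls back to a human. All names are hypothetical and the model call is stubbed so the sketch runs as-is; this illustrates the pattern, not Decagon’s actual implementation.

```python
# Minimal "web of LLM calls" sketch (hypothetical names, stubbed model).

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; canned replies keep the sketch runnable.
    if prompt.startswith("Classify"):
        return "order_status"
    return "Your order 1234 has shipped and should arrive Friday."

def fetch_order_status(user_id: str) -> str:
    # Stand-in for a deterministic API call into the business's own systems.
    return f"order 1234 for {user_id}: shipped, ETA Friday"

def run_agent(user_id: str, message: str) -> str:
    # Call 1: decide what to do with the message.
    intent = call_llm(f"Classify as one of [order_status, refund, other]: {message}")
    if intent == "order_status":
        data = fetch_order_status(user_id)  # pull data in real time
        # Call 2: draft a reply grounded only in the fetched data.
        return call_llm(f"Using only this data: {data}\nAnswer: {message}")
    return "Let me connect you with a human agent."  # fallback for everything else

print(run_agent("u_42", "Where is my order?"))
```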

Kimberly: On that note, maybe we can talk a little bit more about the actual agent infrastructure you’ve built. One thing that is really interesting is that there are a lot of demos out there for AI agents all the time, but very few that I think are truly working in production. It’s very hard to know just from the outside what is real and what’s not. So, in your opinion, what are AI agents today very good at doing, and where are technical breakthroughs still needed to make them robust and reliable?

Jesse: So my take on that is actually slightly different, in that the differentiator between an AI agent that’s just a demo and one that actually works is not so much the tech stack, because most people are probably gonna be using roughly the same techniques. Once you’re further along in your company journey, as we’ve been around for over a year, you have created things that are very specific to your use case, but at the end of the day, people have access to the same models and the same techniques. I think the biggest differentiator for something working or not is actually the shape of the use case. It’s hard to know this when you’re first starting, but looking back, you can reflect.

There are two properties I would say are very important for something to evolve past the demo. The first is that the ROI of the use case you’re solving has to be very quantifiable. That’s super important, because if it’s not, then it’s very hard to convince people to actually use you and spend money on you. In our case, the quantifiable metric is: what percentage of support inquiries are you resolving? Because there’s a hard number there, people can justify it: okay, if you’re resolving more, let me map that to what I’m currently spending and the time this currently takes. The other metric for us is customer satisfaction. So, because it’s really easy to quantify the ROI, people actually adopt it.

The second piece is that the use case has to be incremental. If you basically need an agent to be superhuman and solve near 100% of the use case off the bat, that’s very difficult because, as we all know, LLMs are non-deterministic. You have to have some sort of fallback. And luckily, support has this nice property that you can always escalate to a human agent. Even if you’re only solving half the things, that’s hugely valuable for people. So I think the support use case just has that property that makes it nice for an AI agent.

I think there are a lot of other fields where people can create an impressive demo, and you don’t even have to squint that hard to see why AI agents would be useful. But maybe it has to be perfect off the bat, and if that’s the case, no one’s really willing to try it or even use it, because the ramifications of it not being perfect are kind of serious. Security is like that, right? People run SIEMs, and it’s a pretty classic idea: oh, it’d be cool if LLMs could read this. But it’s hard for me to imagine anyone just saying, okay, AI agent, go do that, and I’ll trust you to do it, because if it makes one mistake, you’re kind of screwed.

Derrick: How clear is it, I guess, that I’m interacting with an AI agent versus a human? Is there an attempt to make it seem natural, or is it pretty clear you’re interacting with an LLM and you should proceed accordingly?

Jesse: That’s generally up to our customers to decide, actually. And we see a pretty high variance. On one end of the spectrum, you have people that really try to personify their agents: there’s a human avatar, there’s a human name, and it just responds naturally. On the other end of the spectrum, it calls itself an AI and makes that really clear. Different companies we work with have different stances on this; oftentimes, if you’re in a regulated industry, you have to make it clear. What’s really cool now is that you’re starting to see a behavior shift in the customers, because a lot of our customers get a ton of social media posts like, holy crap, this is the first chat experience I’ve ever tried that actually feels real, or, this is magical. That’s great for them, because now their customers are learning that an AI experience can actually be better than a human one. In the past, that was not the case, because in the past, probably all of us have been on the phone going, all right: agent, agent, agent, right?

Kimberly: You mentioned a couple of times this idea of personalization: everyone uses the same technical infrastructure under the hood, but it’s about personalizing for support, and some of your customers want different types of personalization. Can you talk more about that, and about what exactly it is that you do such that you’re able to get the personalization that causes people online to say, oh shit, this is the best support experience I’ve ever had?

Jesse: For us, the personalization comes from molding to the user. So, one, you need to have context on the user itself, right? That’s additional context you need. And then, two, you need to have the context of the business logic of our customer. If you combine those two together, you have a pretty good experience. It sounds pretty easy, but it’s pretty hard to actually get all the context you need. So that’s mostly what we build: how can you build the right primitives so that when someone deploys us, they can pretty easily specify, okay, this is the business logic we want. Like, first you need to do these four steps, and if the third step fails, you have to go to a fifth step. You want to be able to really easily teach the AI that, while at the same time giving it access to things like, okay, here are the account details of the user, and if you need to fetch more things, you can hit these APIs. That’s the layer that sits on top of the models, kind of like an orchestration layer, I guess, that makes the agent real.
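Here is one way that orchestration-layer idea could look as a minimal sketch: business logic expressed as named steps with explicit failure jumps (“if the third step fails, go to a fifth step”), with user context available to every step. The step names and flow are hypothetical illustrations, not Decagon’s actual primitives.

```python
# Minimal orchestration sketch: steps as an explicit state machine.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    run: Callable[[dict], bool]       # returns True on success
    on_success: Optional[str]         # next step name, or None to finish
    on_failure: Optional[str]         # step to jump to on failure, or None to escalate

def execute(steps: list[Step], context: dict, start: str) -> dict:
    by_name = {s.name: s for s in steps}
    current: Optional[str] = start
    while current is not None:
        step = by_name[current]
        if step.run(context):
            current = step.on_success
        elif step.on_failure is not None:
            current = step.on_failure     # e.g., step 3 fails -> jump to step 5
        else:
            context["escalate"] = True    # no fallback defined: hand to a human
            break
    return context

# Usage: context holds the account details the agent fetched via APIs.
flow = [
    Step("verify_identity", lambda c: "user_id" in c, "check_eligibility", None),
    Step("check_eligibility", lambda c: c.get("days_since_delivery", 99) <= 30,
         "issue_refund", "offer_store_credit"),
    Step("issue_refund", lambda c: True, None, None),
    Step("offer_store_credit", lambda c: True, None, None),
]
print(execute(flow, {"user_id": "u_42", "days_since_delivery": 45}, "verify_identity"))
```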

Kimberly: It sounds like in that case you need a lot of access to business systems, you need a lot of information about the user, and you probably need a lot of information about how the customer actually likes to interact with their users as well. And I imagine it’s pretty sensitive data. So can you talk more about what enterprise customers typically need assurances around when it comes to actually deploying AI agents? And how have you thought about the best way to handle that, knowing that your solution provides a better experience but is also new for a lot of people experiencing agents for the first time?

Jesse: Yeah. So, this kind of comes down to guardrails. Over time, because we’ve done a lot of these implementations, it has become clear what types of guardrails people care about. For example, the simplest kind is rules that you always have to follow. If you’re working with a financial services company, you can’t really give financial advice, because that’s regulated. So you have to build that into the agent and make sure it never does that. And oftentimes what you can do is have a supervisor model, or some sort of system set up, that runs these checks before the results go out.

Another type of guardrail is for when someone comes in and just tries to mess with you: they see that this is a generative system and try to get it to do things like, okay, what’s my balance? Okay, multiply that by 10. You want to be able to check for that as well. There are a lot of types of these that we’ve found over the, I guess, months to a year that we’ve been deploying, and for each one you can classify which type of guardrail it needs. As you build more and more, the system becomes more and more solidified.
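A minimal sketch of those two guardrail types, assuming simple regex rules purely for illustration; in practice the supervisor pass is often itself a model call rather than pattern matching:

```python
# Guardrail sketch: hard rules checked before a reply goes out, plus a crude
# screen for users trying to manipulate the generative system.

import re

HARD_RULES = [
    # e.g., financial services: the agent must never give investment advice.
    (re.compile(r"you should (buy|sell|invest)", re.I), "no financial advice"),
]

def supervisor_check(draft_reply: str) -> str | None:
    """Return the violated rule, or None if the draft is safe to send."""
    for pattern, rule in HARD_RULES:
        if pattern.search(draft_reply):
            return rule
    return None

def looks_like_manipulation(user_message: str) -> bool:
    """Screen for 'multiply my balance by 10'-style attempts."""
    return bool(re.search(r"ignore (previous|your) instructions|multiply .+ by",
                          user_message, re.I))

print(supervisor_check("Based on rates, you should buy bonds now."))   # no financial advice
print(looks_like_manipulation("What's my balance? Multiply that by 10."))  # True
```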

Kimberly: And how unique is each guardrail to each customer or each industry? And how do you think about building that at scale as you bring on more and more customers across a wide variety of use cases?

Jesse: This kind of comes back to our core thesis, which is that in a few years, agents are going to be pervasive. So the thing that really matters is giving people the tools, and empowering almost the next generation of jobs, I guess, such as agent supervisors: giving them the tools to build the agents and add their own guardrails, because we’re not going to be the ones that define the guardrails for them. Every customer understands their guardrails and their business logic the best. So our job is really to be the best at building the tooling and the infrastructure for them to build agents. And that’s why we keep talking a lot about, hey, your agents shouldn’t be a black box. You should have control over how to construct the guardrails and the rules and the logic that you want.

And so I think that’s probably the one thing that’s set us apart so far: we’ve invested a lot into this tooling, and we’ve come up with a lot of creative ways for people who probably don’t have super technical backgrounds, or the deepest understanding of how AI models work, to still download what’s in their brain, and what they want the AI to do, into the agent. I think that’s going to become more and more important over the next couple of years. And if people are evaluating tools like this, I think that should be one of the top criteria, no matter which type of agent you’re evaluating, because you want to feel like, as time goes on, you have the ability to make it better and better.

Derrick: Are there things that customers or businesses can do to prepare their systems or practices for any sort of automation, but this sort of agent in particular? How should they design their data systems, software architecture, and business logic to enable this? Because I feel like with a lot of AI things, we come at it like it’s all new, but then once you get into existing legacy systems, like all things, you’re dealing with a lot of spaghetti and duct tape.

Jesse: If someone is building from scratch right now, there are a lot of best practices that will make your life easier, right? Take the way you construct your knowledge base: we’ve written about this, and there are some things you can do to make it really easy for AI to ingest it and increase its accuracy. Part of that comes down to having really modular chunks in your knowledge base, rather than big articles that have a bunch of answers in them. That’s one tactical thing people can do. When you’re setting up your APIs, you can make them agent-friendly: set up the permissions and the outputs in a way that makes it easy for the agent to ingest them without having to do much computation afterwards to find the answers. So there’s stuff like that, but I wouldn’t say there’s anything you have to do in order to use agents.
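As a rough illustration of the modular-chunk idea, here is a sketch where the knowledge base is a list of small, self-describing chunks rather than big multi-answer articles. The field names and keyword matcher are stand-ins; a real system would typically use embedding search.

```python
# Modular knowledge-base sketch: one small chunk per answer.

knowledge_base = [
    {"id": "refund-window", "title": "Refund window",
     "body": "Orders can be refunded within 30 days of delivery.",
     "tags": ["refund", "policy"]},
    {"id": "refund-howto", "title": "How to request a refund",
     "body": "Go to Orders, select the order, then choose Request refund.",
     "tags": ["refund", "how-to"]},
]

def retrieve(query: str, kb: list[dict]) -> list[dict]:
    # Naive keyword overlap keeps the sketch runnable; swap in embeddings for real use.
    words = set(query.lower().split())
    return [chunk for chunk in kb if words & set(chunk["tags"])]

print([c["id"] for c in retrieve("how do I get a refund", knowledge_base)])
```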

Derrick: That sounds like better documentation. Always a good thing. And then, so like information organization, basically.

Kimberly: It sounds like if you’re trying to teach people to prompt your agent to act with the most fidelity to their customers specifically, or their use case specifically, there’s a lot of experimentation, or I would say new ground to be broken, on just the UI and UX of how someone does that. It’s so different from traditional software. I’m curious, how have you guys thought about what the UI and UX look like in an agent-first world? And how do you think it actually changes over the next couple of years?

Jesse: Yeah, I will not claim that we’ve solved this. I think we’ve found maybe a local optimum that works pretty well for our current customers, but this is an ongoing field of research, both for us and for a bunch of other people. The core problem comes down to, similar to what we’ve been saying: you have an agent. How can you, number one, see exactly what it’s doing and how it’s making decisions? And then, two, use that to decide what updates to make to it and what the feedback to the AI should be? Those are where the UI elements actually come together, especially the second piece, right: how do you actually build an agent? Our view is that over time it’ll become more and more natural-language-based, because that’s basically what LLMs are trained on. And in the limit, if you had a fully super-intelligent agent, it would basically be like a human: you can show it stuff, explain things to it, give it feedback, and it just updates in its mind.

If you just think about having a really competent human on your team: they arrive, you teach them some stuff, they start doing work, and then you give them feedback and show them new things, new documentation, new charts, whatever. So I think in the limit it moves towards that, where things are a lot more conversational and natural-language-based, and people aren’t using these stop-gaps of building gigantic, complex decision trees that sort of capture what you want but can break apart pretty easily. We had to do that in the past because that’s all we had; we didn’t have LLMs. But now, as the agents get better and better, the UX and the UI are going to become more conversational.

Kimberly: A little over a year ago, which is about when Decagon got started, it was very common for people to say that a lot of the use cases that were very good and very practical for LLMs were just going to be what people call GPT wrappers, meaning companies could make one API call to a foundation model and solve their support challenge immediately. But clearly we’re seeing companies opt to use something like Decagon instead, so that hasn’t turned out to be the case. I was wondering if you could explain why that is. What was it about building this in-house that was actually more complicated than people expected? What did people get wrong about this whole notion?

Jesse: There’s nothing wrong with being a GPT wrapper. You could basically say that Macie is an AWS wrapper, or stuff like that, right? I guess when people say the term, it’s usually meant in a derogatory way. My view would be: if you’re building an agent, by definition you’re going to be leveraging LLMs as tools, right? You’re building on top of things, just like you would normally build on top of AWS or GCP. Where you really run into trouble is when the software you’re building on top of the LLM is just not thick enough, or not complex enough, for someone to feel like there’s actually differentiation there.

But for us, looking back, the thing we’re selling is mostly the software. We’re basically just a normal software company, using LLMs as one of the components, one of the tools, of the software. When people pay for a product like this, they mostly want the software, right? They want tools for monitoring and reporting on the AI. They want to be able to deep-dive into every conversation the AI is having. They want to be able to give it feedback and build on it and stuff like that, right? That’s where a lot of the software comes from.

And even with the agent itself, what people run into is that it’s pretty easy to make a cool demo. But if you’re trying to make this production-ready and actually customer-facing, you have to squash the super long tail: protecting against hallucinations, protecting against bad actors who come in trying to mess with you, really nailing the latency and the tone and stuff like that, right? We’ve talked to many teams that have done some experiments themselves and built the initial version, and then they say, okay, it’s pretty clear we don’t want to be the ones that build this long tail, and we also don’t want to be the ones constantly building new logic for the CX team, the customer experience team. So it makes sense to go with someone else.

Kimberly: You mentioned there’s a long tail of different things there: you have to quash bad actors, etc. I’m sure a lot of folks listening who think about using AI agents are worried that when you start introducing LLMs into the picture, there are new vectors for security attacks, and when you introduce agents, there may be new security risks. How do you guys think about that, and about general best practices, when it comes to dealing with agents and ensuring that you still have top-tier enterprise security?

Jesse: There are some obvious things you can do on the security side, and those are some of the things I mentioned: you just want protections in place. At the core, what people are scared about with LLMs is that they’re not deterministic. But the nice thing is that you can actually put most of the sensitive and complex stuff behind a deterministic wall, where the computation happens when the system calls out to an API. You’re not leaving that to the LLM, and that squashes a lot of the core issues. But you still have situations where bad actors come in, or people try to get it to hallucinate, and things like that. And what we’ve seen is that at all the big customers we work with, their security teams will come in and essentially red-team our product: they spend several weeks just hammering it with all the different things they can think of to try to break it.
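A minimal sketch of that deterministic wall, with hypothetical operations: the LLM’s output is treated as an untrusted selector over a whitelist, and the computation itself happens in ordinary code, so a prompt like “multiply my balance by 10” cannot change the result.

```python
# Deterministic-wall sketch: the model picks the operation, never does the math.

def get_balance(user_id: str) -> str:
    return f"Balance for {user_id}: $42.00"   # a real system would hit a ledger API

def get_order_status(user_id: str) -> str:
    return f"Order 1234 for {user_id}: shipped"

ALLOWED_OPS = {"get_balance": get_balance, "get_order_status": get_order_status}

def execute_op(op_chosen_by_llm: str, user_id: str) -> str:
    # The model's output is an untrusted selector, never code or arithmetic.
    handler = ALLOWED_OPS.get(op_chosen_by_llm.strip())
    if handler is None:
        return "escalate: unrecognized operation"
    return handler(user_id)

print(execute_op("get_balance", "u_42"))
print(execute_op("get_balance * 10", "u_42"))  # manipulated output -> escalate
```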

And we’re probably gonna see that more and more as AI agents become more pervasive, because it’s one of the best ways to actually gain confidence in whether this works or not: you just red-team it and throw a ton of stuff at it. There are startups now trying to build red-teaming tools, or the ability for people to do this themselves, and I think that’s a cool thing we’ve seen so far. A lot of the companies we work with, usually during the late stage of the sales cycle, will have their own security team, or an external team they contract with, power-test the product. And for us to become their partner, we have to do well on that. So that’s what it comes down to.

Derrick: Is that something you encourage from your customers? Because when we talk about AI policy, one of the big things we talk about is the application layer, and putting the onus on the user of the LLM and the person running the application, as opposed to treating the model itself as this dangerous thing. It’s like: yes, red-team it, figure out what the use cases, attacks, and vulnerabilities are, and specifically protect against those, versus just relying on whatever OpenAI or whoever put in place.

Jesse: For sure. I also think there will probably be new certifications that come up, like how everyone asks for SOC 2 and HIPAA and stuff like that in different industries. And most of the time when you sell normal SaaS, people will ask for pentests; we always have to provide ours. There’s going to be something similar for AI agents, probably some new thing that someone will coin a name for, but it’s basically a test for: is the agent robust?

Kimberly: One thing that is interesting is that people are very excited, obviously, about all the new model and tech breakthroughs coming out of the large labs. And as an applied AI company, you’re not doing the research yourself; you’re leveraging the research and building a lot of software around it to deliver it to an end customer. But you are building on top of very quickly shifting sands. I’m curious: as an applied AI company, how do you manage both predicting your own product roadmap and building for what users want, while also staying abreast of all the new tech changes and how they affect your company? And more broadly, what do you think is the right strategy for an applied AI company facing similar situations?

Jesse: Well, you have different parts of the stack, right? If you just think about the application layer, the LLMs are at the bottom. You might have tooling in between that helps you manage LLMs or do your evals or whatever. And then the thing at the top is mostly what we build, which, again, is kind of like standard SaaS. So most of the work we do is actually not too different from normal software, except that we have this extra research component: LLMs are changing so fast. What can we use them for? What are they good at? Which model should we use for this task? That’s a big one, with OpenAI pushing new things, Anthropic pushing new things, and Gemini getting better now. So you have to have your own evals for what each model is good at, so you can use the right model in the right situation. Sometimes you want to fine-tune, and then it’s a question of when you fine-tune, and when it’s worth it.

So those are probably the researchy questions, mostly related to the LLMs, that you have to work on. But at least so far, it hasn’t felt like the sands are shifting that quickly, since we’re not that reliant on the middle layer right now. It’s mostly the LLMs changing, and they’re not changing that frequently, and even when they do change, it’s mostly an upgrade. Claude 3.5 Sonnet had an update a couple of months ago at this point, and it’s like, okay, should we just swap it in and use it instead of the old one? You run a bunch of evals, and once you do swap it in, you stop thinking about it, because now you’re on the new model. Then o1 came out, and it’s a similar situation: what do you use it for? In our case, it’s a little bit slow for most of our customer-facing use cases, so we can use it on some more back-end things. But that’s more or less what it comes down to for us: we just have to have good systems in place to do the research around the models.
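The swap decision could look something like this sketch: run the same eval suite against the current model and the candidate, and only switch if nothing regresses. The model call is stubbed and the cases are illustrative, not Decagon’s actual eval suite.

```python
# Eval-gated model swap sketch (stubbed model call, illustrative cases).

def answer(model: str, prompt: str) -> str:
    # Placeholder for a real model call; canned output keeps the sketch runnable.
    return {"old-model": "A", "new-model": "A"}.get(model, "A")

def pass_rate(model: str, cases: list[dict]) -> float:
    passed = sum(1 for c in cases if answer(model, c["input"]) == c["expected"])
    return passed / len(cases)

def should_swap(current: str, candidate: str, cases: list[dict]) -> bool:
    # A newer model can be smarter overall yet worse on one workflow edge case,
    # so the gate is the full suite, not headline benchmarks.
    return pass_rate(candidate, cases) >= pass_rate(current, cases)

cases = [{"input": "Choose A or B for a duplicate-charge dispute", "expected": "A"}]
print(should_swap("old-model", "new-model", cases))  # True
```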

Kimberly: How often are you evaluating new models and swapping them out?

Jesse: We’ll evaluate them pretty much any time a new one comes out. You just have to make sure that, even if it’s a more intelligent model, it doesn’t somehow break the things your use case is built around. And that can happen: the model can be more intelligent overall, but in some edge case it’s bad at choosing A or B in one of your workflows. That’s what the evals are for. Overall, the type of intelligence we care most about I would describe as instruction following: we want the models to get better and better at following instructions, and if that’s the case, it strictly benefits us, so that’s great. It seems like a lot of the research recently has been around more reasoning-type intelligence, getting better at coding, getting better at math, stuff like that. That’s helpful for us too, but not as helpful as the first type.

Kimberly: One really interesting thing you’ve brought up a couple of times, which I also think is pretty unique to Decagon, is that you’ve built a lot of eval infrastructure internally to make sure you know exactly how each model performs against the set of tests you provide it. Can you talk more about that? How core is that internal eval infrastructure, and how exactly does it give both you and your customers confidence, since some of it is also customer-facing, that the agents are performing the way you would like?

Jesse: I think it’s very important, because otherwise it’s very difficult for us to iterate quickly. If you feel like every change you make has a big chance of ruining something, then you’re just not going to make changes quickly. But if you have evals set up, then: all right, we have this big change, we have this model change, we have this new thing that’s been created, let’s just run it against all the evals. If they’re good, you can feel like, okay, we improved things, or, we can ship this without being too concerned. The interesting thing in our space is that the evals need input from the customers, because our customers are the ones that decide whether something is correct or not. There are obviously high-level things we can check for, but oftentimes it’s them coming in with a specific use case: this is the right answer, or it has to do this, it has to have this tone, it has to say this. That’s what the evals are based on. And so we have to make sure we have a robust system.
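Customer-supplied eval cases like those might be represented along these lines. The structure and checks are illustrative assumptions; in practice a tone check might itself be an LLM judge rather than a string comparison.

```python
# Customer-defined eval case sketch: input, required content, required tone.

eval_cases = [
    {"input": "I was double-charged this month.",
     "must_include": "refund",            # the customer's definition of correct
     "must_not_include": "unable to help",
     "tone": "apologetic"},
]

def passes(case: dict, reply: str, detected_tone: str) -> bool:
    text = reply.lower()
    return (case["must_include"] in text
            and case["must_not_include"] not in text
            and detected_tone == case["tone"])

reply = "So sorry about that! I've issued a refund for the duplicate charge."
print(passes(eval_cases[0], reply, "apologetic"))  # True
```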

We just built this ourselves at the start, and it hasn’t really been that hard to maintain. We know there are eval companies out there, and we’ve explored a few of them; maybe at some point we’ll see if it makes sense to adopt one. But the eval system isn’t a huge pain point for us anymore.

Kimberly: You know, one popular topic today is multimodality and the idea that AI agents should be able to interact across all forms that humans do today, whether it’s text, video, speech, etc. And I know that Decagon primarily started out as being text-based. So, I’m curious from your perspective, like, how important is multimodality for AI agents, and what’s the time horizon on which you think it becomes fully mainstream or even expected?

Jesse: It’s important in the sense that, if you’re thinking about it from a company perspective, it’s not that much harder to add a new modality. It’s not trivial, but at the core, if you’ve solved the other things I mentioned, like the tooling to actually build the AI, monitor it, and have the logic there, then adding a new modality isn’t the hardest thing. So it makes a lot of sense for us to support all the modalities, and it expands our market. We’re basically modality agnostic; we have our own agents for every modality. The limiting factor, in general, is, one, whether customers are ready to adopt a new modality. Starting with text makes a lot of sense because that’s what people are adopting most aggressively, it’s lower risk for them, and it’s easier for them to monitor and rationalize. The other big one is voice. Obviously, I think there’s still room for the market to get more comfortable with voice, but we’re now seeing early movers actually adopt voice agents, which is exciting.

And then the other piece is obviously on the tech side. I think most people would agree that the bar is just higher for voice, right? If you’re on a phone call with someone, the latency has to be super crisp, and if you interrupt them, they have to respond really naturally. Because the latency budget is so much tighter, you have to be more clever about the way you do computation. If you’re on a chat and it takes five or eight seconds to respond, you barely even notice it; it feels very natural. If it takes five to eight seconds to reply on a phone call, that feels a bit odd. So there are more technical challenges in voice, I would say. As those technical challenges get solved, and as the market becomes more interested in adopting voice, that’s what’s going to unlock a new modality like that.

Kimberly: Before we move on, because I want to talk a little bit more about what the business model of AI agents looks like: are there any last things that took you by surprise, either when you were building AI agents for the first time, or when you were chatting with customers about the systems they were using, the data they were handling, or the concerns they had? What are any non-intuitive or surprising things that Decagon had to do in order to best serve enterprise customers?

Jesse: I think the big surprising thing, when we were first starting, was how willing people were to chat with us, given we were just two people. We had both started companies before, so we knew a lot more people, but still, for anyone who has started a company, this is very relatable, right? You’re trying to get intro conversations, and if what you’re talking about is not that interesting to people, it’s a pretty lukewarm conversation. When we started talking about this use case, it was pretty surprising how excited people were to talk about it, because it’s such an obvious idea. You would think that because it’s an obvious idea, there would be people doing it, or solutions, or that people would have thought of some solution already. But I think the timing was good. It was just a big use case; people really care about this.

And for the reasons I mentioned before, the use case is very well suited for adopting AI agents and pushing them into production, because you can do it incrementally and you can track the ROI. I think that was pleasantly surprising. But obviously there’s still a lot of work after that: you have to work with the customers, you have to build the product, you have to figure out what direction to take. But in the early days, that was a bit surprising.

Derrick: Kimberly, I would be remiss not to mention that you wrote the RIP to RPA blog post, which gets into a lot of automation-type tasks and startups. Is that something you see across these automation tasks: the existing solutions just haven’t been great, so people are always on the lookout for a better way to do it?

Kimberly: Yeah, I definitely think so. I would say a couple of things about this. The first is that if an idea seems obvious to people, and there’s no clear company solving it that everyone points to and says, oh, you should just use that, then the problem actually hasn’t been solved, and it is in some sense a wide-open opportunity for companies to go build it. We’ve been investors in Decagon since the beginning, and we saw them go through the idea maze. When they landed on support and started chatting with customers, it was very clear that customers were desperate for some sort of AI-native support solution. And, as I mentioned a little bit before, it was very common for people to believe this was just going to be a GPT wrapper. The level of interest Decagon got from customers in the very early days led us to believe quite early on that a lot of these problems are much more complicated than people expect. So I think we do see this across industries, whether it’s customer service or more niche automations in specific vertical markets.

I think one thing that is underrated is what Jesse said earlier: knowing that there’s clear ROI for the automation task you’re doing. If you’re going to ask somebody to adopt an AI agent, they are in some sense taking a leap of faith, because this is very unfamiliar territory for a lot of people. It’s much easier to get an AI agent adopted if you’re automating a very specific flow that is clearly revenue generating, or was previously a bottleneck to new demand in the business, or was a major cost center that scaled linearly with customer growth or revenue growth. Being able to take a problem like that and make it much more productized, so it can scale the way traditional software scales, is very compelling. Maybe one last question on this topic before we move on. Jesse, one thing you and I talked about in the past was that we always thought hallucinations would be the biggest challenge enterprises faced when adopting AI agents, or the biggest thing they were worried about, and you told me that actually tends not to be the case. I’m curious if you could elaborate on that: what is misunderstood about hallucinations, and what do people actually care more about?

Jesse: I think people do care about hallucinations, but they care a lot more about the value that can be provided. Pretty much every enterprise we work with cares about the same things, literally the same things: what percentage of conversations can you resolve? How happy are my customers? And then hallucinations get lumped into the third category, which is accuracy. Generally, when you’re evaluated, the first two matter most. Let’s say, hypothetically, you’re talking to a new enterprise and you completely knock it out of the park on the first two. There’s going to be so much buy-in from the leadership and from everyone in the company: holy crap, this will not only transform things for our customer base, the whole customer experience is different. Every customer now has their own personal concierge in their pocket. They can ping us any time, we’re giving them good answers, they’re actually happy, in any language, 24/7. So that’s one piece, and on top of that you’re saving a ton of money. So there’s a ton of buy-in, and a lot of tailwinds for getting something done.

Hallucinations obviously have to be solved, but they’re not really the top thing on people’s minds, right? The way you address hallucinations is the things I mentioned before. People will test you: there will probably be a proof-of-concept period where you’re running real conversations, with agents on their team monitoring everything and checking for accuracy. If that goes well, then generally you’re in the clear. And, as I mentioned before, there are a bunch of hard protections you can put around the sensitive stuff; you don’t have to make the sensitive stuff generative. So it’s a talking point in most deals, and you’ll go through that process, but it’s never really the focus of the conversation.

Kimberly: And now switching over to the business model of AI agents. One big topic of conversation today, as I’m sure you know, is how to actually price them. Historically, a lot of SaaS software was sold per seat, since it was workflow software sold to individual workers to increase their productivity. But AI agents are not tied to individual worker productivity, so a lot of people think, probably rightfully so, that seat-based pricing doesn’t make as much sense going forward. I’m curious how you thought about that dilemma in the early days, how you decided to price Decagon, and where you think the future of software pricing is headed more broadly as AI agents become more commonplace.

Jesse: Our view is that in the past, software was priced per seat because it roughly scaled with the number of people who could take advantage of it. With most AI agents, the value you provide doesn’t scale with the number of people maintaining it; it scales with the amount of work output. This goes in line with what I was saying before: if the ROI is very measurable, then it’s very clear what level of work output you’re seeing. So our view is: okay, per seat definitely doesn’t make sense; you’re probably going to price based on the work output. The pricing model has to be one where the more work you do, the more you get paid. For us, there are two obvious ways to do that: you can pay per conversation, or you can pay per resolution, meaning a conversation the AI actually resolves.

I think one fun learning for us has been that most people have opted into the per-conversation model. With per resolution, the main benefit is that you’re paying for exactly what the AI does, but the immediate question that follows is: what counts as a resolution? First of all, no one wants to get into that, because then it’s like, okay, if someone came in really upset and you sent them away, why are we paying you for that? That’s a weird situation. It also makes the incentives a bit odd for the AI vendor: if we get paid per resolution, why don’t we just try to resolve as many as possible and deflect people away, even in the many cases that are a toss-up where the better experience would have been to escalate? Customers don’t like that. So the per-conversation model just creates a lot more simplicity and predictability.

Kimberly: And how durable do you think that pricing will be going forward? Because right now, when you say ROI, you’re often getting comped against some kind of labor spend that was historically used for this. As agents get more and more common, do you think you’ll be compared to labor long-term, and that that’s the appropriate benchmark? And if not, how do you think about long-term pricing to the value beyond the labor cost?

Jesse: I think it’ll probably stay mostly anchored in labor costs, because that’s what’s exciting about agents, right? All this spend that used to go towards services is probably 10 to 100x the software spend, and a lot of it is going to move towards software. When it does, the natural benchmark is, of course, the labor. And for our customers, the ROI, again, is very clear: if you’re saving X million in labor costs, it makes sense to adopt a solution like this. But it’ll probably land somewhere in the middle, because there will be other agents that come out, even if they’re not as good, that set prices, and then it’s the classic SaaS situation where you’re competing for business.

Kimberly: What do you think the future of current SaaS incumbents is in the world of AI, given that their products are maybe not architected to be AI-native, or that they price per seat and therefore aren’t really adjusted to an outcomes-first pricing model?

Jesse: Yeah, it’s a little bit tricky for incumbents trying to launch agents, because they just can’t cannibalize their seat-based model, right? If you don’t need that many agents anymore, it’s tricky when the new thing you’re pushing eats up your current revenue. So that’s one thing with incumbents. But it’s also hard to say, because incumbents always have the power of distribution, right? The product doesn’t have to be as good. But people don’t want to go through the effort of adopting a new vendor if it’s only 80% as good. So, number one, if you’re a company like us, you have to make sure you’re 3x as good as the incumbent offerings. And two, it’s the classic incumbent-versus-startup thing: incumbents naturally have less risk tolerance because they have a ton of customers, and if they iterate quickly and something goes wrong, that’s a big loss for them. Whereas younger companies can always iterate a lot faster, and the iteration process inherently leads to a better product. That’s the cycle. For us, we pride ourselves on shipping speed, product quality, and just how hardcore our team is about delivering, and that’s how we’ve been winning our current deals.

Kimberly: I’d love for you to make any predictions on the future of AI in the workplace, either, like, how it’ll change staffing needs or capabilities or how human employees and AI agents will have to interact, or different types of, like, best practices or norms that you think will become commonplace in the workforce as AI agents become more prevalent.

Jesse: Yeah. The number one thing is, we have pretty high conviction that the amount of time people spend in the workplace on building and managing agents, kind of like an AI supervisor type role, is going to shoot through the roof. Even if your title is not officially AI supervisor, a lot of the time you used to spend on other work is now going to go towards managing the agents, because the agents give you so much leverage. We’ve seen that with many of our deployments as well: the people who were leading the team are spending a lot of their time monitoring the AI, checking whether anything needs to be improved, making changes, and watching how it’s going. What are the overall stats looking like? Is there a specific area we need to focus on? Is there a gap in the knowledge base that could help the AI be better, and can the AI fill that in for me? There’s all this stuff that comes with working with agents, and the share of people’s work hours that goes towards it is going to go straight up. That’s our core thesis for the company, as I mentioned; that’s why our whole product is built around giving people tooling, visibility, explainability, and control over the AI agents. And in a year, I think this is going to be a huge thing.

Kimberly: Makes sense. What do you think are the capabilities that an AI supervisor needs going forward? What is that skill set?

Jesse: There are two sides to it, right? There’s the observability and explainability piece: can you very quickly grok what the AI is doing and how it’s making decisions? And the second side is the building part: how do you give it feedback? How do you build new logic? I think those are the two sides of the coin.

Kimberly: And is there any type of work that you think AI agents, in the medium to long term, will not be able to handle, where it’s still incumbent upon humans to manage it and do it properly?

Jesse: I think it will mostly come down to the point I was making earlier about how perfect something needs to be. There are a lot of jobs where the tolerance for error is super low, and what will usually happen in those cases is that any AI tooling ends up being more of a copilot than a full agent. In the more sensitive industries, like healthcare or security, where you have to be almost perfect, the agents are going to be less autonomous, which is not to say they won’t be useful; the style is just going to be a bit different. Whereas in a space like ours, you’re deploying these agents to be autonomous and to complete the whole job.

More About This Podcast

Artificial intelligence is changing everything from art to enterprise IT, and a16z is watching all of it with a close eye. This podcast features discussions with leading AI engineers, founders, and experts, as well as our general partners, about where the technology and industry are heading.
