The agent hype is here, and I’ve bought in. While Matthew McConaughey Agentforce ads garner well-deserved chuckles, you don’t need to search far to find genuinely transformational agentic products. Replit Agent, v0, and Operator are increasingly splitting the world into people who have seen the future of software and people who haven’t.
As an infrastructure investor, what excites me most is that agents are a fundamentally new software workload. For the first time in more than a decade, tens of millions of developers have an urgent need for new infrastructure solutions to enable their progress.
Over the last two years, the primary inhibitor to agentic progress and adoption has been the quality of the LLMs. I no longer believe that’s true. The LLMs are pretty damn great, and the inhibitor is becoming the infrastructure required for developers to build and scale agents, and for enterprises to adopt them securely and effectively.
By the end of 2025, I expect the toolset available to agent developers to be far more robust. I’m especially excited about a few new ideas.
The combination of a powerful pre-trained LLM and a reasoning engine on top of it appears to be a winner for complex tasks. That architecture, however, is not easily scaled down and post-trained for specific domains. I’ve heard and started to believe the argument that coding is the first complex task LLMs got great at because it’s both verifiable and proximate to the labs themselves. OpenAI, Anthropic*, and Google were able to “crack” coding because they know the domain well enough to find advanced, high-quality training data for complex coding tasks. The open-source ecosystem also provides a tremendous amount of content (in this case, code) on which to train, further enhancing model accuracy.
If developers proximate to other complex domains can generate similar “golden datasets” of high-quality samples, how would they go about fine-tuning a reasoning model to improve performance? Today, even with high-quality training data and verifiable outputs, that’s nearly impossible. High-quality reasoning engines are either closed source or, as with DeepSeek, impractically hard to fine-tune. Companies like Mercor are improving access to expert training data. The next step is lowering the barrier to entry for companies with data access to run stable, scaled reinforcement learning.
OpenAI is already moving in this direction with their Reinforcement Fine-Tuning research program. A Llama version of o1 (let alone o3) would accelerate progress even more. Combined with a platform to curate data and define step-by-step reward mechanisms for models, this could dramatically accelerate agents’ capabilities and compound the advantage of companies with access to proprietary data. It would help vertical agents excel at high-value, verifiable tasks like automating data engineering work, penetration testing, or text-to-SQL. Over time, it should help agents conquer more subjective work like planning product sprints or designing websites.
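To make “verifiable” concrete, here is a minimal sketch of the kind of reward function such a platform might let a team define, using text-to-SQL as the example: a candidate query earns reward only if it returns the same rows as a reference query. The schema, data, and queries below are illustrative stand-ins, not drawn from any real benchmark or vendor API.

```python
import sqlite3

def sql_reward(candidate_sql: str, reference_sql: str) -> float:
    """Return 1.0 if the candidate query produces the same rows as the reference."""
    conn = sqlite3.connect(":memory:")
    try:
        # Tiny illustrative schema; a real setup would load the team's own data.
        conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, "emea", 40.0), (2, "amer", 75.0), (3, "emea", 10.0)],
        )
        try:
            got = sorted(conn.execute(candidate_sql).fetchall())
        except sqlite3.Error:
            return 0.0  # unparseable or invalid SQL earns no reward
        want = sorted(conn.execute(reference_sql).fetchall())
        return 1.0 if got == want else 0.0
    finally:
        conn.close()

# An RL fine-tuning loop would call this on every sampled completion.
print(sql_reward(
    "SELECT region, SUM(total) FROM orders GROUP BY region",
    "SELECT region, SUM(total) AS t FROM orders GROUP BY region ORDER BY region",
))  # 1.0 -- a differently written but equivalent query still scores
```

The point is that the reward is checked by execution, not by judging text, which is what makes tasks like text-to-SQL or data engineering tractable targets for reinforcement fine-tuning.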
Much like the move to the cloud brought about cloud-native infrastructure like containers, Terraform, and Kubernetes, agents will require new serverless infrastructure. Agentic infrastructure will need to have a few key properties:
We’re already seeing serverless databases like Neon find incredible product-market fit with companies building fleets of agents. Today, agents create 4x more databases on Neon than humans do. As developers build larger agentic systems with higher compute peaks, they’ll feel an increasingly acute need for serverless infrastructure.
Many agents today are running client-side, leveraging a user’s own browser to execute actions. That’s a convenient way to start—it allows the agent to use a user’s existing apps and compute—but may not be enough. We increasingly see agentic architectures that require either more compute or increased security, stretching the limits of what can be done client-side.
Even for consumer agent use cases, more compute can mean better outcomes. A travel agent that’s tasked with building the best weeklong itinerary, when running on a user’s browser, might take a few minutes to generate a satisfactory output. That same agent, when given access to unlimited browsers through a platform like Browserbase, could pull together 20 different itineraries, compare them against the user’s known preferences, and generate a fantastic output, all in less time.
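A rough sketch of that fan-out pattern is below. The browser work is a simulated placeholder rather than the Browserbase SDK; the structure is what matters: draft many itineraries concurrently, score each against the user’s known preferences, and keep the best.

```python
import asyncio
import random

async def draft_itinerary(browser_session: int) -> dict:
    # Placeholder for driving one remote headless browser through flight,
    # hotel, and activity research; here we just simulate the work.
    await asyncio.sleep(random.uniform(0.1, 0.3))
    return {"session": browser_session, "preference_score": random.random()}

async def best_of(n_browsers: int) -> dict:
    # Fan out: each draft gets its own remote browser and runs concurrently.
    drafts = await asyncio.gather(*(draft_itinerary(i) for i in range(n_browsers)))
    # Score every draft against the user's known preferences, keep the winner.
    return max(drafts, key=lambda d: d["preference_score"])

if __name__ == "__main__":
    winner = asyncio.run(best_of(20))
    print(f"Best itinerary came from browser session {winner['session']}")
```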
The opportunity in the enterprise is even greater. An agent tasked with optimizing node location in a Kubernetes cluster might want to ingest the user’s AWS logs, write code to test 10+ different location configurations, evaluate the results, and then apply the best run. Doing all of this in a user’s production AWS environment can be very risky, but running it in cloud sandboxes, like those offered by e2b and Modal, allows the agent to take greater risks in order to find the optimal path.
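Here is a simplified sketch of that “test in a sandbox, then apply” loop. The `Sandbox` interface is hypothetical, standing in for whatever a provider’s SDK actually exposes; the candidate configurations and cost math are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Sandbox:
    """Hypothetical ephemeral environment; a real one would be provisioned
    via a provider SDK and torn down after each run."""

    def evaluate(self, config: dict) -> float:
        # Stand-in for replaying ingested logs against a candidate node layout
        # and returning a simulated cost; real logic would run inside the sandbox.
        return config["node_count"] * config["cost_per_node"]

def pick_best(candidates: list[dict]) -> dict:
    # Try every risky configuration in isolation -- never in production.
    scored = [(Sandbox().evaluate(c), c) for c in candidates]
    return min(scored, key=lambda pair: pair[0])[1]

candidates = [
    {"node_count": 4, "cost_per_node": 1.00},
    {"node_count": 3, "cost_per_node": 1.20},
    {"node_count": 5, "cost_per_node": 0.65},
]
best = pick_best(candidates)
print("Apply after human review:", best)
```

The sandbox absorbs the blast radius of the risky experiments; only the winning configuration ever touches production, and only after a human signs off.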
The internet’s client-server model has continually evolved, driven by changes both in how the web is used and in the hardware available to web apps. I expect the agentic client-server model to evolve in the same way, with headless browser clouds and sandboxed environments emerging as critical ways for developers to extend beyond the compute and latitude available on users’ machines.
Even the most capable agents face two critical obstacles within enterprises: secure integration and trust. When enterprises hear the word “non-deterministic,” they immediately imagine the worst: if I give this agent access to my software, what will it screw up?
If agents need human-like permissions to be effective but can’t be trusted like humans, we’ll need authentication and authorization infrastructure to enable the kind of transformation boards (and investors) are expecting. Part of the solution is for apps themselves to provide more fine-grained authorization for agents: something more expansive than an API or simple function-calling, but less permissive than what a human would get. Companies like Descope* are beginning to offer this, even allowing developers to build human-in-the-loop authorizations for agents.
But the enterprise integration platform of the future would dictate not only what apps an agent can access and to what extent, but also where the agent can actually run actions. In some cases an agent may be able to authenticate into a service, pull some but not all available data based on its role and history, and run an action but only in a dynamically-provisioned sandboxed environment, limiting the downside of that action until a human can approve it. In orchestrating this, an “Okta for agents” could simultaneously enable agent usage and limit downside.
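As an illustration of that scoping-plus-escalation model, here is a hypothetical sketch (not Descope’s or any vendor’s actual API): an agent gets a grant narrower than a human’s, and high-risk actions within that grant still require a human approval.

```python
from dataclasses import dataclass, field

@dataclass
class AgentGrant:
    app: str
    allowed_actions: set[str]
    needs_human_approval: set[str] = field(default_factory=set)

def authorize(grant: AgentGrant, action: str, human_approved: bool = False) -> bool:
    # Deny anything outside the grant's scope.
    if action not in grant.allowed_actions:
        return False
    # Gate high-risk actions on an explicit human-in-the-loop approval.
    if action in grant.needs_human_approval and not human_approved:
        return False
    return True

crm_grant = AgentGrant(
    app="crm",
    allowed_actions={"read_contacts", "draft_email", "send_email"},
    needs_human_approval={"send_email"},
)
print(authorize(crm_grant, "read_contacts"))                    # True
print(authorize(crm_grant, "send_email"))                       # False: pending approval
print(authorize(crm_grant, "send_email", human_approved=True))  # True
print(authorize(crm_grant, "delete_contacts"))                  # False: out of scope
```

An “Okta for agents” would layer the same idea across every app in the enterprise, plus decide where approved actions actually execute (for example, in a dynamically provisioned sandbox).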
Today’s agents can execute specialized tasks with relative ease and more complex tasks effectively but inconsistently. They’re impressive enough to have already impacted the way developers and salespeople (among others) approach their jobs. But to cross the chasm to general usefulness, I believe agents will need to improve across a few vectors:
Better infrastructure can help across all of these. Solving each of them is likely a multi-billion-dollar opportunity.
Our team at Notable Capital has been backing exceptional founders building the future of software infrastructure for over two decades. If you’re building something new in agentic infrastructure—no matter how early—we’d love to be in touch.
Thank you to Bob McGrew, Jeremy Berman, Nikita Shamgunov, and my colleagues Glenn Solomon and Laura Hamilton for their guidance and feedback on this post.
*Represents a Notable Capital portfolio company.