Filtering by Tag: #AI

TrueFoundry’s Nikunj Bajaj on How to Get $100M Returns on AI Agent Deployments

Added on by Jon Krohn.

Imagine being able to deploy an AI agent and getting a return of over $100m from that single deployment. My guest today, Nikunj Bajaj, has facilitated that multiple times! Lots to learn from him, enjoy!

Nikunj:
• CEO and co-founder of TrueFoundry, a Bay Area-based startup that has raised over $20m to solve the thorniest problems that enterprises face when deploying agents.
• His clients include demanding organizations like NVIDIA and Siemens.
• Was previously ML tech lead at Facebook.
• Holds a master's in computer science from University of California, Berkeley.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Tokenmaxxing vs AI Hardware Bottlenecks

Added on by Jon Krohn.

Humans (like Reinforcement Learning algos) can "reward hack": "Tokenmaxxing" being a perfect example, after employers started using "number of tokens" consumed as a proxy for developers' productivity.

Even if humans weren't engaging in this pointless time-, money- and energy-consuming behavior, however, demand for A.I. compute is so vast that everyone's scrambling to to make more available. Alas, four tricky hardware bottlenecks face us:

1. GPUs:
• NVIDIA data-center GPU lead times now run 36–52 weeks, with Blackwell chips sold out through mid-2026.
• The real choke point isn't fabrication: It's TSMC's "CoWoS" advanced packaging, which is sold out through 2026. Nvidia alone has locked up ~60% of CoWoS capacity through 2027.

2. High-Bandwidth Memory (HBM):
• Demand has quintupled since 2023, and only three companies (SK hynix, Samsung and Micron) make it.
• All three are sold out well into 2026 and new HBM factories take 18–24 months to come online.

3. CPUs:
• As workloads shift toward agentic AI, the CPU:GPU ratio jumps from ~1:12 (for GenAI-only chatbots) to 1:1.
• Intel's CFO says the server-CPU shortfall "starts with a B" — billions in unmet demand so server CPU prices are up 10–20% in just the past couple of months.

4. Electricity: Hyperscaler build-outs are now gated by grid interconnect (18–36 months) and transformer lead times.

THE BIG MISMATCH
• The top 5 hyperscalers alone (Alphabet, Amazon, Meta, Microsoft and Oracle) are on track for ~$725B in combined 2026 capex.
• That's roughly 6x the hyperscalers' 2022 spend, with ~75% going to A.I. infrastructure.
• Hardware suppliers, however, have grown capex by only ~50%.... a 6x increase in demand met by only a 50% increase in supply is a big mismatch!

REASONS FOR OPTIMISM
Demand will continue to be high but I'm optimistic we'll continue to squeeze more juice from every lemon because, e.g.:
• Algorithmic efficiency keeps improving — Google's TurboQuant recently briefly tanked memory stocks by promising to materially cut inference memory needs.
• LLM efficiency gains via mixture-of-experts and smarter inference scheduling continue to compound.
• The tokenmaxxing trend is a corporate farce that will fade.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.