
Friday Oct 03, 2025
Solving the Cold Start Problem in AI Inference
In this episode of Inference Time Tactics, Rob, Calvin, and Byron sit down with Prashanth Velidandi, co-founder of InferX, to explore how serverless inference is tackling the AI “cold start problem.” They dig into why 90% of the model lifecycle happens at inference, not training, and how cold starts and idle GPUs cripple efficiency. Prashanth explains InferX’s snapshot technology, what it takes to deliver sub-second cold starts, and why inference infrastructure, not just models, will define the next era of AI.
We talked about:
- Why inference represents 90% of the model lifecycle, in contrast to the industry’s prevailing focus on training.
- How cold starts and idle GPUs create massive inefficiencies in AI infrastructure.
- InferX’s snapshot technology that enables sub-second model loading and higher GPU utilization.
- The challenges of explaining and selling deeply technical infrastructure to the market.
- Why enterprises care about inference efficiency, cost, and reliability more than model size.
- How serverless inference abstracts away infrastructure complexity for developers.
- The coming explosion of multi-agent systems and billions of specialized models.
- Why sustainable innovation in AI will come from inference infrastructure.
Connect with InferX:
Prashanth Velidandi
https://www.linkedin.com/in/prashanth-velidandi-98629b115
Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
Bluesky: https://bsky.app/profile/neurometric.bsky.social
Rob May
https://www.linkedin.com/in/robmay
Calvin Cooper
https://www.linkedin.com/in/coopernyc
Byron Galbraith