Inference Time Tactics

A podcast exploring the emerging field of inference-time compute, the next frontier in AI performance. Hosted by the Neurometric team, the show unpacks how models reason, make decisions, and perform at runtime. It is made for developers, researchers, and operators building AI infrastructure.

Listen on:

  • Apple Podcasts
  • YouTube
  • Podbean App
  • Spotify

Episodes

Monday Aug 18, 2025

In this episode of Inference Time Tactics, Rob, Cooper, and CTO Byron unpack Apple’s “Illusion of Thinking” paper—why it split the AI community, what it reveals about reasoning model limits, and how hidden thinking traces shape performance. They share insights from building an open-source tool to reproduce the study, explain why models loop, overthink, or stall, and outline what it will take to build more reliable reasoning systems for real-world use.
We talked about:
 
  • Why Apple’s Illusion of Thinking paper sparked heated debate in the AI community.
  • How reasoning models work, including hidden “thinking” phases and token budget limits.
  • Key findings on when reasoning improves results, when it degrades them, and where it stalls.
  • Reasons models loop, overthink, or abandon tasks.
  • Building an open-source tool to replicate the study and test local reasoning models.
  • What real-time reasoning traces reveal about model behavior and limits.
  • Challenges in scoring reasoning quality and treating “I don’t know” as a valid output.
  • Why reasoning models must be matched carefully to specific tasks.
  • The ongoing debate over scaling vs. new architectures for advancing reasoning.
  • Developing a benchmarking platform to help enterprises choose models for IP-sensitive applications.
Resources Mentioned:
Illusion of Thinking Paper
https://machinelearning.apple.com/research/illusion-of-thinking 
Neurometric Illusion of Thinking Tool
https://github.com/NeurometricAI/illusion-of-thinking 
 
Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/ 
X: https://x.com/neurometric/ 
Bluesky: https://bsky.app/profile/neurometric.bsky.social
 
Hosts:
Rob May
https://x.com/robmay 
https://www.linkedin.com/in/robmay
 
Calvin Cooper
https://x.com/cooper_nyc_ 
https://www.linkedin.com/in/coopernyc
 
Guest:
Byron Galbraith
https://x.com/bgalbraith 
https://www.linkedin.com/in/byrongalbraith

Tuesday Aug 12, 2025

In this episode of Inference Time Tactics, Rob and Cooper dig into the strategic trade-offs driving a major shift in AI: why some enterprises start with closed models like OpenAI or Anthropic, then move to open-source stacks. The team breaks down the challenges of switching and how inference-time compute is becoming a competitive differentiator. They also unpack why pricing is shifting, how governance will evolve for this new layer, and what Rob learned from reviewing 250 research papers on reasoning algorithms. 
We talked about: 
 
  • Insights from reviewing 250 research papers on reasoning algorithms.
  • Why enterprises start with closed models like OpenAI or Anthropic before moving to open-source stacks.
  • Challenges of switching stacks, including model fragmentation, capability gaps, and hardware choices.
  • Cost-performance trade-offs when choosing inference architectures.
  • How inference-time configuration can become a competitive differentiator.
  • The role of pricing shifts and vendor lock-in in AI adoption.
  • Emerging governance considerations for inference workflows.
  • The growing variety and complexity of inference-time techniques.
  • Benchmarking challenges for multi-step and reasoning tasks.
  • Why the lack of best practices makes inference optimization harder to operationalize.
  
Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/  
X: https://x.com/neurometric/  
Bluesky: https://bsky.app/profile/neurometric.bsky.social  
 
Hosts: 
Rob May 
https://x.com/robmay  
https://www.linkedin.com/in/robmay 
 
Calvin Cooper 
https://x.com/cooper_nyc_  
https://www.linkedin.com/in/coopernyc

Friday Aug 01, 2025

Welcome to the very first episode of Inference Time Tactics — the podcast for builders, researchers, and engineers pushing the limits of AI performance.
In this kickoff conversation, hosts Rob May and Cooper (co-founders of Neurometric AI) break down why inference time compute is emerging as the third scaling law of AI — and why it matters more than ever.
They unpack:
  • What “inference-time compute” really means (and how it differs from training and fine-tuning)
  • Why reasoning algorithms like best-of-N, chain of thought, and beam search are reshaping performance
  • How recent research, and OpenAI’s 2024 reasoning model, sparked an explosion of interest
  • The challenge of reliability (“three nines” and beyond) in multi-step agent workflows
  • Why open-source models may win big, and where inference fits at the edge
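Why reliability gets harder as agent workflows get longer can be sketched with simple arithmetic. The snippet below is a hypothetical illustration, not Neurometric's method: it assumes each step of a workflow succeeds independently with the same probability p, so an n-step run succeeds with probability p^n. The function name `workflow_success_rate` is made up for this example.

```python
# Hypothetical illustration: assumes every step succeeds independently
# with the same probability, so the whole run succeeds with p ** n.
def workflow_success_rate(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Compare two, three, and four nines of per-step reliability
# across workflows of increasing length.
for p in (0.99, 0.999, 0.9999):
    for n in (10, 50, 100):
        print(f"per-step {p}: {n} steps -> {workflow_success_rate(p, n):.3f}")
```

Under that independence assumption, 99.9% per-step reliability still leaves a 100-step run succeeding only about 90% of the time, while 99% per step drops it to roughly 37%. Real workflows may have correlated failures or retries, so treat this as a rough intuition rather than a model.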
This is a technical, tactical deep-dive — but without the heavy math. If you’re building the next generation of AI systems, or just want to understand where the field is really headed, this episode is your starting point.
🔗 Learn more at neurometric.ai

Copyright 2025 All rights reserved.

Podcast Powered By Podbean
