Inference Time Tactics

A podcast exploring the emerging field of inference-time compute—the next frontier in AI performance. Hosted by the Neurometric team, we unpack how models reason, make decisions, and perform at runtime. For developers, researchers, and operators building AI infrastructure.

Episodes

Monday Aug 18, 2025

When AI Overthinks: Lessons from the Illusion of Thinking Paper

In this episode of Inference Time Tactics, Rob, Cooper, and CTO Byron unpack Apple’s “Illusion of Thinking” paper—why it split the AI community, what it reveals about reasoning model limits, and how hidden thinking traces shape performance. They share insights from building an open-source tool to reproduce the study, explain why models loop, overthink, or stall, and outline what it will take to build more reliable reasoning systems for real-world use.
We talked about:

Why Apple’s Illusion of Thinking paper sparked heated debate in the AI community.
How reasoning models work, including hidden “thinking” phases and token budget limits.
Key findings on when reasoning improves results, when it degrades them, and where it stalls.
Reasons models loop, overthink, or abandon tasks.
Building an open-source tool to replicate the study and test local reasoning models.
What real-time reasoning traces reveal about model behavior and limits.
Challenges in scoring reasoning quality and treating “I don’t know” as a valid output.
Why reasoning models must be matched carefully to specific tasks.
The ongoing debate over scaling vs. new architectures for advancing reasoning.
Developing a benchmarking platform to help enterprises choose models for IP-sensitive applications.
Resources Mentioned:
Illusion of Thinking Paper
https://machinelearning.apple.com/research/illusion-of-thinking
Neurometric Illusion of Thinking Tool
https://github.com/NeurometricAI/illusion-of-thinking

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Rob May
https://x.com/robmay
https://www.linkedin.com/in/robmay

Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Guest:
Byron Galbraith
https://x.com/bgalbraith
https://www.linkedin.com/in/byrongalbraith

Tuesday Aug 12, 2025

The Strategic Trade Offs Behind Inference Time Compute Decisions

In this episode of Inference Time Tactics, Rob and Cooper dig into the strategic trade-offs driving a major shift in AI: why some enterprises start with closed models like OpenAI or Anthropic, then move to open-source stacks. The team breaks down the challenges of switching and how inference-time compute is becoming a competitive differentiator. They also unpack why pricing is shifting, how governance will evolve for this new layer, and what Rob learned from reviewing 250 research papers on reasoning algorithms.
We talked about:

Insights from reviewing 250 research papers on reasoning algorithms.
Why enterprises start with closed models like OpenAI or Anthropic before moving to open-source stacks.
Challenges of switching stacks, including model fragmentation, capability gaps, and hardware choices.
Cost-performance trade-offs when choosing inference architectures.
How inference-time configuration can become a competitive differentiator.
The role of pricing shifts and vendor lock-in in AI adoption.
Emerging governance considerations for inference workflows.
The growing variety and complexity of inference-time techniques.
Benchmarking challenges for multi-step and reasoning tasks.
Why the lack of best practices makes inference optimization harder to operationalize.

Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
X: https://x.com/neurometric/
Bluesky: https://bsky.app/profile/neurometric.bsky.social

Hosts:
Rob May
https://x.com/robmay
https://www.linkedin.com/in/robmay

Calvin Cooper
https://x.com/cooper_nyc_
https://www.linkedin.com/in/coopernyc

Friday Aug 01, 2025

Why Inference Time Compute Is the Future of AI

Welcome to the very first episode of Inference Time Tactics — the podcast for builders, researchers, and engineers pushing the limits of AI performance.
In this kickoff conversation, hosts Rob May and Cooper (co-founders of Neurometric AI) break down why inference-time compute is emerging as the third scaling law of AI, and why it matters more than ever.
They unpack:
What “inference-time compute” really means (and how it differs from training and fine-tuning)
Why reasoning algorithms like best-of-N, chain of thought, and beam search are reshaping performance
How recent research — and OpenAI’s 2024 reasoning model — sparked an explosion of interest
The challenge of reliability (“three nines” and beyond) in multi-step agent workflows
Why open-source models may win big, and where inference fits at the edge
This is a technical, tactical deep-dive — but without the heavy math. If you’re building the next generation of AI systems, or just want to understand where the field is really headed, this episode is your starting point.
🔗 Learn more at neurometric.ai
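
A note on the reliability point above: the "three nines" challenge in multi-step agent workflows can be made concrete with a little arithmetic. The sketch below is not from the episode. It assumes, as a simplification, that each step succeeds independently with the same probability, and the function names are made up for illustration.

```python
def workflow_success_rate(per_step_success: float, num_steps: int) -> float:
    """Chance that every step succeeds.

    Assumption (ours, not the episode's): steps fail independently and
    all share the same success probability.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return per_step_success ** num_steps


def required_per_step_success(target: float, num_steps: int) -> float:
    """Per-step success rate needed for the whole workflow to hit `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return target ** (1.0 / num_steps)


if __name__ == "__main__":
    # How end-to-end reliability decays as a workflow gets longer.
    for per_step in (0.99, 0.999):
        for steps in (1, 5, 10, 20):
            overall = workflow_success_rate(per_step, steps)
            print(f"per-step {per_step:.3f}, {steps:2d} steps -> overall {overall:.4f}")

    # What each step must achieve for a 10-step workflow to reach three nines.
    needed = required_per_step_success(0.999, 10)
    print(f"10 steps at 99.9% overall needs {needed:.5f} per step")
```

Under these assumptions, a ten-step workflow whose steps are each 99% reliable finishes cleanly only about 90% of the time, so reaching three nines end to end requires each step to be considerably more reliable than the target itself.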