
Monday Sep 22, 2025
Drag, Drop, and Deploy: Rethinking How We Build AI Systems
In this episode of Inference Time Tactics, Rob, Cooper, Byron, and Dave share product updates for Neurometric’s Inference Time Compute Studio and what they reveal about the shift from single models to full AI systems. They discuss why wiring models together at scale is so challenging, how a drag-and-drop interface can make experimenting with inference strategies easier, and why open source, benchmarking, and community feedback are key to building the next generation of composable AI systems.
We talked about:
- Why AI is shifting from single models to full systems and what that means for builders.
- The challenges of wiring multiple models together at scale and running them in production.
- How Neurometric’s drag-and-drop interface simplifies testing inference strategies without code.
- Why open-source models are becoming increasingly competitive with commercial solutions.
- The lack of standardization in AI stacks and why the industry still feels like the “early web” era.
- How inference-time compute can balance performance, cost, and latency across different tasks.
- Why benchmarks alone are insufficient and how domain-specific evaluations can fill the gap.
- The role of community feedback in shaping priorities for benchmarks and new primitives.
Connect with Neurometric:
Website: https://www.neurometric.ai/
Substack: https://neurometric.substack.com/
Bluesky: https://bsky.app/profile/neurometric.bsky.social
Hosts:
Rob May
https://www.linkedin.com/in/robmay
Calvin Cooper
https://www.linkedin.com/in/coopernyc
Guests:
Byron Galbraith
https://www.linkedin.com/in/byrongalbraith
Dave Rauchwerk