
Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations


One inference stack across NVIDIA and AMD GPUs

Production performance without compromising model quality or clinical accuracy

Hardware flexibility that scales with demand


Problem

Hippocratic AI builds safety-focused AI health agents that converse with patients, helping to close the global shortfall of 15 million healthcare workers. Its Polaris system orchestrates dozens of specialized models in parallel to ensure every interaction is clinically safe, with error rates lower than those of human clinicians. Hippocratic AI's systems scale to tens of thousands of patient contacts daily and demonstrate that AI products can be trusted in highly regulated industries.

For real-time voice to work, latency can't exceed 800ms per conversational turn. Each patient session runs 40–80 turns, with the first turn processing ~10,000 tokens of context, all while safety models analyze the same conversation in parallel. As Hippocratic AI scaled to more concurrent patients per GPU node, existing inference frameworks couldn't hold tail latencies tight enough to keep conversations feeling natural.
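The gap between average and tail latency is easy to miss, so here is a minimal Python sketch that makes the 800 ms per-turn budget concrete. The latency samples are synthetic numbers chosen for illustration, not measurements from Hippocratic AI or Modular. The helper names (`percentile`, `within_budget`, `required_prefill_rate`) are also hypothetical. The sketch uses nearest-rank percentiles to show how a run can have a mean well under budget while its p99 still exceeds it. It also estimates the minimum prefill throughput needed if the entire budget were spent on ~10,000 tokens of first-turn context. That figure is only a lower bound, because a real turn also spends time on speech processing, decoding, and networking.

```python
import math

TURN_BUDGET_MS = 800  # per-turn latency ceiling stated in the text


def percentile(samples, p):
    """Nearest-rank percentile of samples, for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def within_budget(samples, p=99, budget_ms=TURN_BUDGET_MS):
    """True if the p-th percentile latency fits inside the budget."""
    return percentile(samples, p) <= budget_ms


def required_prefill_rate(context_tokens, budget_ms):
    """Tokens/s needed if the whole budget went to prefill (a lower bound)."""
    return context_tokens * 1000 / budget_ms


# Synthetic per-turn latencies in ms (illustrative only, not real data).
latencies = [420, 450, 480, 510, 530, 560, 600, 640, 700, 950]

mean_ms = sum(latencies) / len(latencies)
print(f"mean={mean_ms:.0f} ms, p50={percentile(latencies, 50)} ms, "
      f"p99={percentile(latencies, 99)} ms, p99 within budget: {within_budget(latencies)}")
print(f"min prefill rate: {required_prefill_rate(10_000, TURN_BUDGET_MS):.0f} tokens/s")
```

With these made-up samples, the mean is 584 ms and sits comfortably inside the budget, but p99 is 950 ms. That single slow turn is exactly the kind of pause a patient would notice. Filling a 10,000-token context within 800 ms implies at least 12,500 tokens/s of prefill throughput per session.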

As Hippocratic AI scales globally, they face a challenge familiar to every team running large models in production: GPU supply is fragmented across vendors and clouds, costs vary widely, and being locked to a single accelerator limits both supply resilience and unit economics.

Solution

Our partnership with Hippocratic AI is a joint engineering effort: the two teams integrated Modular's MAX framework into Hippocratic AI's inference pipelines on both NVIDIA and AMD GPUs, running Llama 3.1 405B for the Polaris system.

Modular has rebuilt the AI infrastructure stack from the ground up, from highly optimized, portable kernels written in Mojo, to model serving with MAX, to cloud orchestration that can be deployed in Modular's cloud or yours. This vertically integrated approach, built over years of deep infrastructure investment, is what enables Modular to extract performance that frameworks built on top of existing components can't match.

MAX delivered across every dimension that matters:

  • Keep every conversation instant. MAX delivers sub-second mean time to first token (TTFT). Patients get responsive, natural interactions with no perceptible delay.

  • Performance without quality trade-offs. Many inference frameworks reach for aggressive optimizations that quietly degrade model accuracy. MAX takes a different approach: its kernels and serving stack are designed to preserve the numerical behavior of the underlying model, so the extensive clinical safety evaluations Hippocratic AI runs on Polaris carry over directly into production.

  • A faster path to new hardware. As new accelerators come to market, MAX's portable kernel architecture means Hippocratic AI doesn't have to wait for vendor-specific framework support to evaluate them.

Results

By standardizing on MAX, Hippocratic AI can run a single inference stack across NVIDIA and AMD GPUs, unlocking a heterogeneous deployment strategy that was previously out of reach.

Modular's collaboration with Hippocratic AI is just getting started. Because MAX's portability comes from its optimized kernel library and scheduling architecture rather than vendor-specific glue, these same benefits generalize to the large reasoning models that are becoming the backbone of production AI deployments. MAX is ready to deliver flexible, hardware-agnostic deployment for the most advanced frontier LLMs used in production.

About Hippocratic AI

Hippocratic AI has developed the safest generative AI agents for healthcare. The company believes that generative AI has the ability to bring healthcare abundance to every person in the world. The company focuses on building non-diagnostic, patient-facing clinical AI agents and does not allow its agents to be used to prescribe or diagnose. Hippocratic AI has received a total of $404 million in funding and is backed by leading investors, including Andreessen Horowitz, General Catalyst, Kleiner Perkins, Avenir, NVIDIA’s NVentures, Premji Invest, SV Angel, Google’s CapitalG, and numerous health systems. Learn more at https://hippocraticai.com/.

Request a demo of this use case

If you're deploying large language models for inference, request a demo today. Excited to chat!

Case Studies

2x cost savings with the fastest text-to-speech model ever

We made state-of-the-art speech synthesis scalable, achieving a remarkable improvement in both latency and throughput.

Modular partners with AWS to democratize AI Infrastructure

Modular partnered with AWS to bring MAX to AWS Marketplace, offering SOTA performance for GenAI workloads across GPU types.

Modular partners with NVIDIA to accelerate AI compute everywhere

Modular’s Platform provides state-of-the-art support for NVIDIA Blackwell, Hopper, Ampere, Ada Lovelace and NVIDIA Grace Superchips.

Unleashing AI performance on AMD GPUs with Modular's Platform

Modular partners with AMD to bring the AI ecosystem more choice with state-of-the-art performance on AMD Instinct GPUs.

Revolutionizing your own research to production

Modular enables Qwerky AI to conduct advanced AI research, write optimized code, and deploy across NVIDIA, AMD, and other types of silicon.

Unlocking fast AMD compute for all

AI inference has a cost problem. Hardware alone isn't enough: customers need software that can extract every ounce of performance from these chips. TensorWave and Modular team up to shatter the cost-performance ceiling for AI inference.

Scales for enterprises

  • Dedicated enterprise support

    We are a team of the world's best AI infrastructure leaders who are reinventing and rebuilding accelerated compute for everyone.

  • Infinitely scalable to reduce your TCO

    Optimize costs and performance with multi-node inference at massive scale across cloud or on-prem environments.

  • Enterprise grade SLA

    Our performance is backed with an enterprise grade SLA, ensuring reliability, accountability, and peace of mind.

Build the future of AI with Modular

  • Sign up today

    Sign up for our Cloud Platform today to get started easily.
  • Browse open models

    Browse our model catalog, or deploy your own custom model.