February 10, 2026

BentoML Joins Modular

Chris Lattner

Chaoyu Yang

Tim Davis

Company

Today, BentoML is joining Modular.

Our goal is simple: make it dramatically easier to serve high-performance inference in production. Fast, portable, and without hardware lock-in.

This acquisition accelerates our end-to-end platform vision. By unifying hardware-aware optimization with production deployment, teams can take models from development to reliable, scalable serving with fewer moving parts.

After months of collaborating on customer deployments, we’re thrilled to officially become one team.

Why BentoML and Modular Make Sense Together

When we started Modular, our vision was clear: unify the AI software stack from the bottom up. We built Mojo to give developers performance without complexity. We built MAX to optimize AI models across hardware. Now, with BentoML, we’re completing that stack with proven managed service infrastructure.

BentoML has built exactly what production AI teams need. Simply put, BentoML is where production AI teams live:

  • Battle-tested open source (Apache 2.0), trusted in real-world infrastructure
  • Used by 10,000+ organizations (including 50+ Fortune 500) to ship models
  • Deep experience turning “it runs” into “it runs reliably”
Modular and BentoML share the same DNA: open source foundations and a commitment to making AI infrastructure accessible. We’re not just aligned on technology; we’re aligned on values. Together, we’re building a single workflow that spans optimization, serving, and operation, so teams can ship faster with fewer moving parts.

As part of that mission, we are supercharging our cloud team. If incredible at-scale AI inference gets you excited, come join us!

What This Enables

Together, we’re bringing performance optimization across the entire AI workflow into one cohesive stack.

  • Portability without rewrites: Deploy across NVIDIA, AMD, and future accelerators without rebuilding your entire serving stack. BentoML + MAX means fewer one-off “special case” deployments.
  • Better performance end-to-end: Speed isn’t just a kernel trick. When you control optimization + runtime + serving, you can improve throughput/latency while keeping the deployment ergonomics teams need.
  • Enterprise-ready BYOC: Bring your own cloud, VPC, or on-prem. Keep your security posture and infra choices, while still getting the benefits of an optimization layer built for modern accelerators.

If you use BentoML today: nothing breaks. The open source project continues under Apache 2.0 with the same docs, community, and contribution process, and commitments to BentoML customers continue without disruption.

What's Next

BentoML remains an active open source project. The community, documentation, and contribution process are unchanged, and development continues as usual. Our priority is stability for existing users as we begin integrating our platforms.

In the near term, we’ll focus on tight, practical integrations that make it easier to go from optimized models to production deployments, without disrupting existing BentoML workflows.

We’ll start sharing concrete progress soon. For now:

For the BentoML community: we’re committed to supporting the production workflows you rely on and the open source project you’ve built together. Join us on Tuesday, February 17th, from 9:30am to 11:30am PT for an Ask Me Anything with Modular CEO Chris Lattner and BentoML CEO Chaoyu Yang in the Modular Forum, where we’ll answer questions and outline our integration roadmap.

For Modular developers: this acquisition reinforces our commitment to open source and production-ready infrastructure. BentoML brings proven deployment expertise, helping turn performance gains into real-world production wins.

This is just the beginning. We’ll share concrete integrations, benchmarks, and results as we continue to ship.


About BentoML

BentoML is the open source platform for deploying AI models at scale. Used by more than 10,000 organizations worldwide, BentoML makes it simple to package models, manage dependencies, and deploy to production with confidence. Learn more at bentoml.com.

About Modular

Modular is building unified AI infrastructure. Our Mojo programming language and MAX inference engine enable developers to build high-performance AI on any hardware. Learn more at modular.com.
