
FLUX.2 Dev

Sub-second image generation. 4x faster than PyTorch torch.compile.

Generate and edit high-quality images from text prompts — in under a second. Modular compiles the full FLUX.2 pipeline into a single optimized graph. Optimized on NVIDIA B200s and AMD MI355X.

  • <1s

    Generation time

  • 4.1x

    vs Torch Compile

  • $0.001

    per image

  • GPUs

    NVIDIA or AMD

Modular Performance ⚡️ vs. PyTorch Diffusers

If you're already running the Modular Platform in production, adding image generation requires no code changes: just swap the endpoint. You'll see roughly a 4x latency speedup on the FLUX.2 family of models compared to PyTorch Diffusers with torch.compile. Results vary by image resolution:


Image resolution    Speedup vs. PyTorch Diffusers (torch.compile)
1024x1024           4.1x
1360x768            3.4x
768x1360            4.0x
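If your stack already speaks an OpenAI-style API, the endpoint swap might look like the sketch below. This is a minimal illustration, not the official client: the route, model name, and request fields are assumptions modeled on OpenAI-style image APIs, so check the Modular docs for the exact parameters of your deployment.

```python
import json
import urllib.request

# Assumed local address of a running MAX serving endpoint (hypothetical).
ENDPOINT = "http://localhost:8000/v1/images/generations"


def build_request(prompt: str, size: str = "1024x1024") -> dict:
    """Assemble an image-generation request body (field names are assumptions)."""
    return {
        "model": "black-forest-labs/FLUX.2-dev",
        "prompt": prompt,
        "size": size,
        "n": 1,
    }


def generate(prompt: str) -> bytes:
    """POST the request and return the raw response (requires a running server)."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call
        return resp.read()
```

Because the request shape mirrors what most inference clients already send, pointing an existing image-generation integration at the new endpoint is typically the only change.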

No visible image quality loss ✅

The figure below shows results from torch.compile (left) and Modular's MAX Framework (right). Image quality is virtually identical, while MAX runs about 4x faster than torch.compile. The image-quality tolerance is configurable, and we get high-quality images in under a second. This unlocks a wide range of workflows that were previously blocked by the lack of near-real-time image generation.


99% Cheaper than Nano Banana 🍌


Modular on MI355X delivers FLUX.2 images for a fraction of a cent each - 99% cheaper than Google's Nano Banana Pro ($0.134/image) and 82% cheaper than running torch.compile on B200:


Provider                                Cost / image (1024x1024)    Savings with Modular
Nano Banana Pro (Gemini 3 Pro Image)    $0.134                      99.0%
torch.compile on B200                   $0.00778                    82.1%
MAX on B200                             $0.00194                    28.4%
MAX on MI355X                           $0.00139                    Reference
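The savings column follows directly from the per-image prices in the table: each entry is the discount relative to the MI355X reference price. A quick check of the arithmetic:

```python
# Reproduce the "Savings with Modular" column from the cost table above.
# Prices are the $/image (1024x1024) figures quoted in the table.
reference = 0.00139  # MAX on MI355X, the reference price

providers = {
    "Nano Banana Pro (Gemini 3 Pro Image)": 0.134,
    "torch.compile on B200": 0.00778,
    "MAX on B200": 0.00194,
}

for name, cost in providers.items():
    # Savings = how much cheaper the reference is than this provider.
    savings = (1 - reference / cost) * 100
    print(f"{name}: {savings:.1f}% savings")
# Prints 99.0%, 82.1%, and 28.4%, matching the table.
```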

Why Modular outperforms

  • Advanced Compiler

    Kernel fusion and dynamic batching optimized for image generation patterns

  • Efficient Runtime

    90% smaller containers enable faster scaling and lower infrastructure overhead

  • Intelligent Batching

    Adapts to real-world traffic spikes during business hours

  • Hardware arbitrage

    Execute workloads on the right hardware for the task at hand.

  • Granular metrics and dashboards

    Fine-grained visibility into performance, usage, and more, making issues easy to spot.

  • Forward deployed engineering support

    Engineers work directly with your team to deploy, tune, and operate systems.

Deploy Anywhere. Run Optimally.

We’ll handle the autoscaling of your traffic across hardware. Our AI infrastructure runs across NVIDIA and AMD without code changes, so future flexibility is also baked in.

Supported hardware:

  • Full production support for the following NVIDIA GPUs

    • B200

    • H200

    • H100

    • A100

  • Full production support for the following AMD GPUs

    • MI355X

    • MI300X

    • MI250X

    • MI210

    Achieve 30-60% lower costs with Modular on AMD hardware - Read More

Coming soon:

Custom accelerators - let us know what you want!

Hardware Independence = Business Resilience

Why Portability Matters to Your Business:

  • Choice & Flexibility

    No lock-in to a single GPU vendor. Drive 30-60% cost savings. Better supply availability. Flexibility of deployment.

  • Risk Mitigation

    No single point of failure. Multi-cloud without complexity. Platform vendor independence.

Deployment Flexibility:

  • Our Cloud or Yours

    Deploy on our cloud or in your own environment, with the same capabilities and performance.

    See Deployment Options

Why teams are switching to Modular

“~70% faster compared to vanilla vLLM”

"Our collaboration with Modular is a glimpse into the future of accessible AI infrastructure. Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation, at just 200ms for 2 second chunks. This allowed us to serve more QPS with lower latency and eventually offer the API at a ~60% lower price than would have been possible without using Modular’s stack."

Igor Poletaev

Chief Science Officer - Inworld

Latest customer case studies:

Modular partners with AWS to democratize AI Infrastructure

Modular partnered with AWS to bring MAX to AWS Marketplace, offering SOTA performance for GenAI workloads across GPU types.

Unleashing AI performance on AMD GPUs with Modular's Platform

Modular partners with AMD to bring the AI ecosystem more choice with state-of-the-art performance on AMD Instinct GPUs.

Revolutionizing your own research to production

Modular enables Qwerky AI to conduct advanced AI research, write optimized code, and deploy across NVIDIA, AMD, and other silicon.

2x cost savings with the fastest text-to-speech model ever

We made state-of-the-art speech synthesis scalable, and achieved a truly remarkable improvement both for the latency and throughput.

Go Deeper

  • Frontier-scale MoE Serving at Modular: Modular Tech Talk

    52:06

  • Mammoth Serving: Modular Tech Talk

    28:49

Start building!

  • Get Sandbox Access

    Evaluate real performance and reliability in a live environment before committing to a deployment path.

    • Pre-configured DeepSeek V3 environment

    • 100M free inference tokens

    • 14-day full-featured trial

  • Talk to us!

    Get expert guidance on architecture, performance tradeoffs, and migration paths tailored to your system.

    • Architecture review

    • Performance validation

    • Migration planning


Custom demo of FLUX.2

We'll show you Modular's benchmarks on workloads similar to yours.


Developer Approved


works across the stack

scrumtuous

“Mojo can replace the C programs too. It works across the stack. It’s not glue code. It’s the whole ecosystem.”

impressed

justin_76273

“The more I benchmark, the more impressed I am with the MAX Engine.”

potential to take over

svpino

“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”

one language all the way through

fnands

“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the 'two-language' problem. Having Mojo as one language all the way through would be awesome.”

easy to optimize

dorjeduck

“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”

pure iteration power

Jayesh

"This is about unlocking freedom for devs like me, no more vendor traps or rewrites, just pure iteration power. As someone working on challenging ML problems, this is a big thing."

high performance code

jeremyphoward

"Mojo is Python++. It will be, when complete, a strict superset of the Python language. But it also has additional functionality so we can write high performance code that takes advantage of modern accelerators."

huge increase in performance

Aydyn

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

The future is bright!

mytechnotalent

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

actually flies on the GPU

Sanika

"after wrestling with CUDA drivers for years, it felt surprisingly… smooth. No, really: for once I wasn’t battling obscure libstdc++ errors at midnight or re-compiling kernels to coax out speed. Instead, I got a peek at writing almost-Pythonic code that compiles down to something that actually flies on the GPU."

feeling of superpowers

Aydyn

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

surest bet for longterm

pagilgukey

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”

impressive speed

Adalseno

"It worked like a charm, with impressive speed. Now my version is about twice as fast as Julia's (7 ms vs. 12 ms for a 10 million vector; 7 ms on the playground. I guess on my computer, it might be even faster). Amazing."

completely different ballgame

scrumtuous

“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”

was a breeze!

NL

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

amazing achievements

Eprahim

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

performance is insane

drdude81

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

very excited

strangemonad

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

Community is incredible

benny.n

“The Community is incredible and so supportive. It’s awesome to be part of.”


Build the future of AI with Modular

View Editions
  • Sign up today

    Sign up for our Cloud Platform today to get started easily.

    Sign Up
  • Browse open models

    Browse our model catalog, or deploy your own custom model.

    Browse models
