Blog

Engineering

Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days

October 17, 2025

Tracy Sharpe

Anand Pratap Singh

Prince Jain

Abdul Dakkak

Engineering

Matrix Multiplication on Blackwell: Part 4 - Breaking SOTA

In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. By the end of the post, we’ll have improved performance by another 15% to reach 1772 TFLOPS, exceeding the current SOTA.

September 19, 2025

Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak

Engineering

Matrix Multiplication on Blackwell: Part 3 - The Optimizations Behind 85% of SOTA Performance

In this post, we continue the journey and discuss how to combine the 2SM technique with pipelining to increase performance by about 5x and reach 85% of state-of-the-art (SOTA) performance.

September 12, 2025

Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak

Engineering

Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul

In this post, we continue our journey and improve performance by more than 50x over our initial kernel benchmark. Along the way, we explain more GPU programming concepts and leverage novel Blackwell features.

September 5, 2025

Ali Taha

Jiexiang Liu

Hengjie Wang

Abdul Dakkak

Engineering

Matrix Multiplication on Blackwell: Part 1 - Introduction

This series of blog posts will show how to: 1. Write a high-performance GPU kernel on Blackwell that is competitive with NVIDIA's cuBLAS implementation. 2. Leverage Mojo's special features to make the kernel as simple as possible.

August 28, 2025

Ali Taha

Jiexiang Liu

Hengjie Wang

Engineering

Agentic Building Blocks: Creating AI Agents with MAX Serve and OpenAI Function Calling

January 30, 2025

Ehsan M. Kermani

Engineering

Use MAX with Open WebUI for RAG and Web Search

Learn how quickly MAX and Open WebUI get you up and running with RAG, web search, and Llama 3.1 on GPU.

January 23, 2025

Bill Welense

Engineering

Hands-on with Mojo 24.6

Mojo 24.6 introduces key improvements in argument conventions, memory management, and reference tracking, enhancing code clarity and safety with features like 'mut' for mutable arguments, 'origins' for references, and new collection types.

January 21, 2025

Ehsan M. Kermani

Engineering

Evaluating Llama Guard with MAX 24.6 and Hugging Face

Imagine unlocking a world of open innovation while ensuring secure, reliable, and enterprise-ready Gen AI deployments—MAX 24.6 enables enterprise AI teams to seamlessly run a vast range of cutting-edge AI models from Hugging Face on NVIDIA GPUs.

December 19, 2024

Bill Welense

Engineering

MAX GPU: State of the Art Throughput on a New GenAI platform

Measuring state-of-the-art GPU performance on Modular's MAX 24.6 compared to vLLM.

December 17, 2024

Max Hutchinson

Tyler Kenney

