
May 13, 2026

Translating to Mojo via AI Agents

Brad Larson

Modular Team

Product

At Modular, we’re always experimenting with the latest agentic programming tools, integrating the best ones into our workflows, and learning quite a few lessons along the way. One thing we realized is that the Mojo language is ideally suited to the needs of modern AI coding agents.

Mojo has a familiar syntax with minimal boilerplate, so it’s token-efficient for agents to read and write. Its type system and constraint model catch many common errors at compile time. Rather than having an agent chew through tons of tokens to build something that may or may not work, then spend hours debugging it when it doesn’t, Mojo catches problems early and gives the agent clear error messages. This tighter feedback loop is one reason typed languages are increasingly favored for agentic workflows.
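
As a toy illustration of that early feedback (this snippet isn’t from our codebase), consider how a type mismatch surfaces at compile time rather than after a long run:

mojo
fn scale(x: Float32, factor: Float32) -> Float32:
    return x * factor

def main():
    print(scale(2.0, 3.0))  # fine: prints 6.0
    # print(scale("two", 3.0))
    # ^ rejected at compile time: a String doesn't convert to Float32,
    #   so the agent sees a clear error instead of a runtime surprise.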

Mojo also doesn't trade ergonomics for performance. The same code that reads cleanly can target the full range of hardware Mojo supports, including NVIDIA, AMD, and Apple silicon GPUs.

Helping AI agents work with Mojo

The only challenge is that Mojo is still a young language, and LLMs haven’t been trained on lots of Mojo code. That’s why we’ve created tools to help agents produce correct, idiomatic Mojo from the start. The first step was open-sourcing all of the Mojo code we’ve written at Modular, along with all of our Python APIs for MAX.

Taking this to the next level, Mojo now has coding agent skills that plug into most AI coding assistants. These skills act as a lightweight adapter that corrects misconceptions, out-of-date patterns, and non-idiomatic code that AI models might produce.

To get started with the skills, you can install them in any supported coding agent with a single command:

bash
npx skills add modular/skills

Demo: translating CUDA code to Mojo

Mojo’s killer application today is writing high-performance, portable GPU kernels. However, most of the world’s existing kernel code is written in CUDA and Triton (primarily for NVIDIA GPUs). Using our AI agent skills, you can quickly translate these existing kernels to Mojo.

Let’s see an example of this in action.

In this example, we’ll start from Szymon Ożóg’s exploration of Softmax optimizations in CUDA. Clone this repository locally as a reference to follow along at home:

bash
git clone git@github.com:SzymonOzog/FastSoftmax.git

In this tutorial, I’ll be using Claude Code with the Opus 4.7 model, but the same steps work with any frontier coding agent.

First, let’s create a new Mojo project. Our new-modular-project skill handles project setup and walks you through choosing the right configuration.

Start a new session in your AI agent and begin by giving it this prompt:

bash
I'd like to start a new Mojo project named mojo_softmax using a quick uv
environment and the nightly build of Mojo.

Your agent should activate the new-modular-project skill and handle the project setup for you:

bash
⏺ Skill(new-modular-project)
  ⎿  Successfully loaded skill

⏺ All options specified. Creating a quick uv environment with nightly Mojo.

  ⎿  Listed 1 directory (ctrl+o to expand)

⏺ Bash(mkdir mojo_softmax && cd mojo_softmax && uv venv && uv pip install mojo
      --index https://whl.modular.com/nightly/simple/ --prerelease allow)
  ⎿  Running…

and then report back when the new project is ready:

bash
⏺ Bash(source .venv/bin/activate && mojo --version)
  ⎿  Mojo 0.26.3.0.dev2026042105 (cfe16c5b)

⏺ Project mojo_softmax is set up at ./mojo_softmax with a quick uv venv and nightly Mojo
  0.26.3.0.dev2026042105 installed.

  To work in it:
  cd mojo_softmax
  source .venv/bin/activate

Once the project is configured with the Mojo compiler and tools, we can begin translating the CUDA reference code to Mojo. Use a prompt like the following, adjusting the path to match where you cloned FastSoftmax:

bash
In the directory ./FastSoftmax, there's a CUDA example of a fast softmax
I'd like to port to Mojo. When done, build it and test it to verify
that it works correctly.

Your output may vary by agent, but in a local run I saw:

bash
⏺ I've read all the CUDA kernels. The file contains 10 progressive optimizations, with
  `softmax_kernel8` as the default - it uses online normalization (single-pass max+sum), float4
  vectorized loads, warp shuffles, and shared-memory cross-warp reduction.

⏺ I'll port this to Mojo using shared-memory tree reductions (universal across GPU backends
  including Apple silicon) with coalesced strided access, matching the algorithm of kernels 3-8.
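
Online normalization is worth understanding before reading the port: it keeps a running maximum and a running sum of exponentials in a single pass, rescaling the sum whenever the maximum grows. Here’s a minimal CPU sketch of that recurrence in Mojo. It’s illustrative only (assuming recent Mojo syntax such as mut arguments and list literals); the agent’s generated code will differ:

mojo
from math import exp

# Single-pass "online" softmax statistics: running max `m` and running
# sum `d` of exp(x - m), rescaled whenever a new maximum appears.
fn online_softmax(mut row: List[Float32]):
    var m = row[0]
    var d: Float32 = 0.0
    for i in range(len(row)):
        var x = row[i]
        var m_new = max(m, x)
        d = d * exp(m - m_new) + exp(x - m_new)  # rescale old sum if max grew
        m = m_new
    # A second pass normalizes with the final statistics.
    for i in range(len(row)):
        row[i] = exp(row[i] - m) / d

def main():
    var row: List[Float32] = [1.0, 2.0, 3.0]
    online_softmax(row)
    print(row[0], row[1], row[2])  # prints values that sum to ~1.0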

The agent will then activate the mojo-syntax and mojo-gpu-fundamentals skills to write a fast_softmax.mojo file:

bash
⏺ Now I'll load the Mojo skills to guide the port.

⏺ Skill(mojo-gpu-fundamentals)
  ⎿  Successfully loaded skill

⏺ Skill(mojo-syntax)
  ⎿  Successfully loaded skill

⏺ Now I have enough understanding to write the Mojo port. I'll implement the
  softmax kernel, plus a test harness that verifies against a CPU reference.

⏺ Write(fast_softmax.mojo)
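
To give a flavor of what the port contains, here’s a heavily abridged sketch of the shared-memory tree-reduction pattern in Mojo, showing just the row-maximum step. Treat it as a sketch built on my assumptions about recent nightly APIs (the gpu package, gpu.memory, and memory.stack_allocation), not the actual file the agent wrote:

mojo
from gpu import barrier, block_dim, block_idx, thread_idx
from gpu.memory import AddressSpace
from memory import stack_allocation, UnsafePointer

alias BLOCK_SIZE = 256

# One thread block per matrix row; each thread strides across the row.
fn row_max_kernel(
    out_max: UnsafePointer[Float32],
    inp: UnsafePointer[Float32],
    cols: Int,
):
    var row = Int(block_idx.x)
    var tid = Int(thread_idx.x)
    var shared = stack_allocation[
        BLOCK_SIZE, Float32, address_space = AddressSpace.SHARED
    ]()

    # Per-thread partial maximum over a strided slice of the row.
    var local_max = Float32(-1e30)  # stands in for -inf in this sketch
    var i = tid
    while i < cols:
        local_max = max(local_max, inp[row * cols + i])
        i += Int(block_dim.x)

    # Shared-memory tree reduction combines the per-thread maxima; this
    # pattern works across NVIDIA, AMD, and Apple silicon GPUs.
    shared[tid] = local_max
    barrier()
    var stride = BLOCK_SIZE // 2
    while stride > 0:
        if tid < stride:
            shared[tid] = max(shared[tid], shared[tid + stride])
        barrier()
        stride //= 2

    if tid == 0:
        out_max[row] = shared[0]
    # The exp-sum reduction and final normalization follow the same shape.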

When complete, the agent will build and test the Mojo implementation to verify its correct operation:

bash

⏺ Bash(source .venv/bin/activate && mojo build fast_softmax.mojo 2>&1 | head -80)
  ⎿  (No output)

⏺ Build succeeded. Running the test now.

⏺ Bash(source .venv/bin/activate && ./fast_softmax 2>&1 | head -40)
  ⎿  row 0 sum = 0.9999997 drift from 1.0 = 2.9802322e-07
     row 1 sum = 0.99999994 drift from 1.0 = 5.9604645e-08
     row 2 sum = 1.0000001 drift from 1.0 = 1.1920929e-07
     +8 lines (ctrl+o to expand)

⏺ The port is complete and verified working on the local GPU.

Although your agent’s exact output may differ from mine, your mojo_softmax project should now include a correct Mojo kernel and tests that exercise it (fast_softmax.mojo in my case).

One key thing to note is that the original CUDA code for this example only worked on NVIDIA GPUs, but this new Mojo version runs on NVIDIA, AMD, and Apple silicon GPUs, as well as other future accelerators supported by Mojo. This is an incredibly fast way to bootstrap kernels, algorithms, and even models for new hardware.

This newly translated Mojo version should be functionally identical to the source, so you can read through the code to understand how concepts map from CUDA to Mojo. As a next step, you can attempt further Mojo-specific optimizations, or tune the code for the exact hardware you’re working with. This process also isn’t limited to CUDA as a reference: it works just as well with Triton kernels in Python or code in various other languages.
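
For example, a follow-up prompt along these lines (the wording is illustrative) kicks off that tuning loop:

bash
Now profile fast_softmax.mojo on my GPU and apply Mojo-specific
optimizations, re-running the test after each change to confirm
the results stay correct.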

Accelerate your Mojo journey

As you’ve seen, you can rapidly translate existing reference code you may have in Python, CUDA, or many other languages to Mojo. Out of the box, this can even lead to concrete improvements. For example, when Automatika Robotics translated some CUDA and SYCL kernels used for autonomous navigation to Mojo, they saw immediate performance gains. In their own words:

“Same workload we use in EMOS kompass-core: 5,001 trajectories × 1,000 points, 10 s horizon, 4 cost functions enabled.

  • kompass-core SYCL (AdaptiveCpp / CUDA): 16.358 ms (±0.12)
  • kompass-mojo (Mojo 0.26.1 / CUDA): 15.973 ms (±0.09)

I should note that I have used Claude to translate my mojo kernels, using the official skills and no optimization work on the mojo side has been done yet. Hence the initial result is quite impressive.”

Mojo 1.0 beta 1 has just been released, and using a frontier AI coding agent with these skills is a great way to get your older Mojo projects up to date for the official 1.0 release later this year. We know that LLMs benefit from languages that don’t change much over time, which is one reason we’re stabilizing Mojo for 1.0.

In fact, we took a random sample of five community projects, installed these skills, and prompted Claude Opus 4.7:

I’d like to update this project to the latest version of Mojo.

In all five cases, the agent correctly updated the entire project to build on the latest Mojo 1.0 beta 1 release with no other assistance.

Try it yourself

These Mojo coding skills are available now. Here are three ways to put them to use:

Speed up your Python. If you have a Python function that's become a bottleneck, an agent with the Mojo skills can translate it to Mojo. Point it at the slow code, and it will produce an initial Mojo port you can drop in, profile, and tune; a sample prompt follows this list. If you want to go further, the same code can target a GPU with minimal changes.

Replace CUDA or Triton with Mojo. As the softmax demo shows, the skills handle the structural translation from CUDA to Mojo. The same process works for Triton kernels, as well as other kernel domain-specific languages. You get a portable starting point that runs on NVIDIA, AMD, and Apple silicon GPUs, without rewriting from scratch.

Get involved. The skills themselves are open source. If you hit a pattern the current skills don't handle well, open an issue or contribute a fix. The more real-world Mojo code agents encounter in the wild, the better they get at writing it.
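
As an example of the first workflow, a prompt like this is enough to get an initial port (the file and function names here are hypothetical):

bash
The function simulate_step in physics.py is my main bottleneck. Port it
to Mojo, keep the Python version for reference, and compare the outputs
of the two implementations.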

Install the skills with npx skills add modular/skills and let us know what you build in the Modular forum.
