One language, any hardware.
Pythonic syntax.
Systems-level performance.

Mojo unifies high-level AI development with low-level systems programming. Write once, deploy everywhere - from CPUs to GPUs - without vendor lock-in.

Start Building Today

Power up with Mojo

One language, any hardware
Bare metal performance
Easy to read, Pythonic code

fn add[size: Int](out: LayoutTensor, a: LayoutTensor, b: LayoutTensor):
    i = global_idx.x
    if i < size:
        out[i] = a[i] + b[i]

Efficient element-wise addition of two tensors
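
For readers new to GPU kernels, the role of the `if i < size` guard can be illustrated with a plain-Python emulation. This is only a sketch: the `launch` helper is a hypothetical stand-in for a real GPU launch and is not part of Mojo or any library. It runs the kernel body once per thread index, and threads past the end of the data do nothing.

```python
def add_kernel(out, a, b, size, global_idx_x):
    """One emulated 'thread': handles a single element, guarded by a bounds check."""
    i = global_idx_x
    if i < size:
        out[i] = a[i] + b[i]


def launch(kernel, num_threads, *args):
    """Hypothetical stand-in for a GPU launch: run the kernel once per thread index."""
    for thread_id in range(num_threads):
        kernel(*args, thread_id)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

# Launch 8 threads for 5 elements; threads 5..7 fail the guard and write nothing.
launch(add_kernel, 8, out, a, b, len(a))
print(out)  # [11.0, 22.0, 33.0, 44.0, 55.0]
```

Launching more threads than elements is common because thread counts are usually rounded up to a multiple of the block size, which is why the guard is needed.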

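def mojo_square_array(array_obj: PythonObject):
    alias simd_width = simdwidthof[DType.int64]()
    ptr = array_obj.ctypes.data.unsafe_get_as_pointer[DType.int64]()

    # Square `width` elements at a time, in place.
    @parameter
    fn pow[width: Int](i: Int):
        elem = ptr.load[width=width](i)
        ptr.store[width=width](i, elem * elem)

    # Assumed completion (the excerpt stops before this step): apply `pow`
    # across the array using the standard library's `vectorize`.
    vectorize[pow, simd_width](len(array_obj))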

Mojo function callable directly from Python

struct VectorAddition:
    @staticmethod
    def execute[target: StaticString](
        out: OutputTensor[rank=1],
        lhs: InputTensor[dtype = out.dtype, rank = out.rank],
        rhs: InputTensor[dtype = out.dtype, rank = out.rank]
    ):
        @parameter
        if target == "cpu":
            vector_addition_cpu(out, lhs, rhs)
        elif target == "gpu":
            vector_addition_gpu(out, lhs, rhs)
        else:
            raise Error("No known target:", target)

A device-targeted vector addition kernel

Why we built Mojo

Vendor lock-in is expensive

You're forced to choose: NVIDIA's CUDA, AMD's ROCm, or Intel's oneAPI. Rewrite everything when you switch vendors. Your code becomes a hostage to hardware politics.

The two-language tax

Prototype in Python. Rewrite in C++ for production. Debug across language boundaries. Your team splits into 'researchers' and 'engineers' - neither can work on the full stack.

Python hits a wall

Pure Python can run orders of magnitude slower than compiled code on compute-heavy AI workloads. The GIL blocks true parallelism. You can't program GPUs directly. Every optimization means dropping into C extensions. Simplicity becomes a liability at scale.

Toolchain chaos

PyTorch for training. TensorRT for inference. vLLM for serving. Each tool has its own bugs, limitations, and learning curve. Integration nightmares multiply with every component.

Memory bugs in production

C++ gives you footguns by default. Race conditions in parallel code. Memory leaks that OOM your servers. Segfaults in production at 3 AM.

Developer experience ignored

30-minute build times. Cryptic template errors. Debuggers that can't inspect GPU state. Profilers that lie about performance. Modern developers deserve tools that accelerate, not frustrate.

Why should I use Mojo?

Easier

GPU Programming Made Easy

Traditionally, writing custom GPU code means diving into CUDA, managing memory, and compiling separate device code. Mojo simplifies the whole experience while unlocking top-tier performance on NVIDIA and AMD GPUs.

Get Started With GPUs

@parameter
for n_mma in range(num_n_mmas):
    alias mma_id = n_mma * num_m_mmas + m_mma
    
    var mask_frag_row = mask_warp_row + m_mma * MMA_M
    var mask_frag_col = mask_warp_col + n_mma * MMA_N
    
    @parameter
    if is_nvidia_gpu():
        mask_frag_row += lane // (MMA_N // p_frag_simdwidth)
        mask_frag_col += lane * p_frag_simdwidth % MMA_N
    elif is_amd_gpu():
        mask_frag_row += (lane // MMA_N) * p_frag_simdwidth
        mask_frag_col += lane % MMA_N

GPU-specific coordinates for MMA tile processing

Performant

Bare metal performance on any GPU

Get raw GPU performance without complex toolchains. Mojo makes it easy to write high-performance kernels with intuitive syntax, zero boilerplate, and native support for NVIDIA, AMD, and more.

GPU Fundamentals

@parameter
for i in range(K):
    var reduced = top_k_sram[tid]
    alias limit = log2_floor(WARP_SIZE)
    
    @parameter
    for j in reversed(range(limit)):
        alias offset = 1 << j
        var shuffled = TopKElement(
            warp.shuffle_down(reduced.idx, offset),
            warp.shuffle_down(reduced.val, offset),
        )
        reduced = max(reduced, shuffled)
    
    barrier()

Using low-level GPU warp instructions ergonomically
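
The shuffle-down loop above is a tree reduction: at each step every lane combines its value with the value held `offset` lanes further along, and the offset halves until lane 0 holds the result. A minimal CPU emulation in Python follows, offered as a sketch only. `shuffle_down` and `warp_max` here are hypothetical helpers written for illustration (not the Mojo `warp` API), a warp width of 32 is assumed, and `(val, idx)` tuples stand in for `TopKElement`.

```python
WARP_SIZE = 32  # assumed warp width for this illustration


def shuffle_down(values, offset):
    """Emulate shuffle-down: lane i reads the value held by lane i + offset.

    Lanes whose source would fall outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def warp_max(values):
    """Tree reduction: after log2(len(values)) steps, lane 0 holds the maximum."""
    lanes = list(values)
    limit = len(lanes).bit_length() - 1  # floor(log2(lane count))
    for j in reversed(range(limit)):
        offset = 1 << j  # 16, 8, 4, 2, 1 for a 32-lane warp
        shuffled = shuffle_down(lanes, offset)
        lanes = [max(own, other) for own, other in zip(lanes, shuffled)]
    return lanes[0]


# Each lane holds a (val, idx) pair; tuples compare by val first.
pairs = [((i * 7) % 32, i) for i in range(WARP_SIZE)]
print(warp_max(pairs))  # (31, 9)
```

Five steps replace 31 sequential comparisons, and on real hardware the exchange happens in registers, without a round trip through shared memory.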

Interoperable

Use Mojo to extend Python

Mojo interoperates natively with Python, so you can speed up bottlenecks without rewriting everything. Start with one function and scale as needed. Mojo fits into your codebase.

Intro to Python Interop

if __name__ == "__main__":
    # Calling into a Mojo `passthrough` function from Python:
    result = hello_mojo.passthrough("Hello")
    print(result)

fn passthrough(value: PythonObject) raises -> PythonObject:
    """A very basic function illustrating passing values to and from Mojo."""
    return value + " world from Mojo"

Call a Mojo function from Python

Community

Build with us in the open to create the future of AI

Mojo has more than 750K lines of open-source code and an active community of 50K+ members. We're actively working to open even more to build a transparent, developer-first foundation for the future of AI infrastructure.

View Open Kernel Repo

750K lines of open-source code

MOJO + MAX

Write GPU Kernels with MAX

@compiler.register("mo.sub")
struct Sub:
    @staticmethod
    fn execute[
        target: StaticString,
        _trace_name: StaticString,
    ](
        z: FusedOutputTensor,
        x: FusedInputTensor,
        y: FusedInputTensor,
        ctx: DeviceContextPtr,
    ) capturing raises:
        @parameter
        @always_inline
        fn func[width: Int](idx: IndexList[z.rank]) -> SIMD[z.dtype, width]:
            var lhs = rebind[SIMD[z.dtype, width]](x._fused_load[width](idx))
            var rhs = rebind[SIMD[z.dtype, width]](y._fused_load[width](idx))
            return lhs - rhs
        
        foreach[
            func,
            target=target,
            _trace_name=_trace_name,
        ](z, ctx)

Define a custom GPU subtraction kernel

Production ready

Powering Breakthroughs in Production AI

Top AI teams use Mojo to turn ideas into optimized, low-level GPU code. From Inworld’s custom logic to Qwerky’s memory-efficient Mamba, Mojo delivers where performance meets creativity.

Inworld Case Study

Qwerky Case Study

Inworld

Inworld used Mojo to write high-efficiency custom kernels, such as a tailored silence-detection kernel that runs directly on the GPU.

Qwerky

Mojo enables Qwerky to compile custom GPU kernels that accelerate Mamba's linear-time processing of conversation history.

Modern tooling

World-Class Tools, Out of the Box

Mojo ships with a great VSCode debugger and works with dev tools like Cursor and Claude. Mojo makes modern dev workflows feel seamless.

Get VSCode Extension

Mojo extension in VSCode

Mojo learns from

What Mojo keeps from C++

Zero-cost abstractions
Metaprogramming power: Turing complete, can build a compiler in templates
Low-level hardware control: inline asm, intrinsics, zero dependencies
Unified host/device language

What Mojo improves about C++

Slow compile times
Template error messages
Limited metaprogramming, and the fact that templates != normal code
MLIR-native

What Mojo keeps from Python

Minimal boilerplate
Easy-to-read syntax
Interoperability with the massive Python ecosystem

What Mojo improves about Python

Performance
Memory usage
Device portability

What Mojo keeps from Rust

Memory safety through a borrow checker
Systems language performance

What Mojo improves about Rust

More flexible ownership semantics
Easier to learn
More readable syntax

What Mojo keeps from Zig

Compile-time metaprogramming
Systems language performance

What Mojo improves about Zig

Memory safety
More readable syntax

“Mojo has Python feel, systems speed. Clean syntax, blazing performance.”

Explore the world of high-performance computing through an illustrated comic. A fresh, fun take—whether you're new or experienced.

Read the comic

Get started with Mojo

Mojo Manual

Learn how to write a simple program that performs vector addition on a GPU, exploring fundamental concepts of GPU programming.

View Tutorial

GPU Puzzles

A hands-on guide to mastering GPU programming using Mojo’s powerful abstractions and performance capabilities.

Python Interoperability

Because Mojo uses a Pythonic syntax, it's easy to start reading and writing Mojo when coming from Python.

Popular Mojo Tech Talks

Next-Gen GPU Programming (1:15:56)

Kernel Programming & Mojo (52:51)

GPU Programming Workshop (11:36)

Developers love Mojo

“I'm excited, you're excited, everyone is excited to see what's new in Mojo and MAX and the amazing achievements of the team at Modular.”

Eprahim

“Max installation on Mac M2 and running llama3 in (q6_k and q4_k) was a breeze! Thank you Modular team!”

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

svpino

“Tired of the two language problem. I have one foot in the ML world and one foot in the geospatial world, and both struggle with the 'two-language' problem. Having Mojo - as one language all the way through would be awesome.”

fnands

"C is known for being as fast as assembly, but when we implemented the same logic on Mojo and used some of the out-of-the-box features, it showed a huge increase in performance... It was amazing."

Aydyn

“It’s fast which is awesome. And it’s easy. It’s not CUDA programming...easy to optimize.”

dorjeduck

“Mojo destroys Python in speed. 12x faster without even trying. The future is bright!”

mytechnotalent

“The Community is incredible and so supportive. It’s awesome to be part of.”

benny.n

“A few weeks ago, I started learning Mojo 🔥 and MAX. Mojo has the potential to take over AI development. It's Python++. Simple to learn, and extremely fast.”

svpino

“What @modular is doing with Mojo and the MaxPlatform is a completely different ballgame.”

scrumtuous

"Mojo gives me the feeling of superpowers. I did not expect it to outperform a well-known solution like llama.cpp."

Aydyn

“I am focusing my time to help advance @Modular. I may be starting from scratch but I feel it’s what I need to do to contribute to #AI for the next generation.”

mytechnotalent

“Mojo and the MAX Graph API are the surest bet for longterm multi-arch future-substrate NN compilation”

pagilgukey

“I'm very excited to see this coming together and what it represents, not just for MAX, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.”

strangemonad

“I tried MAX builds last night, impressive indeed. I couldn't believe what I was seeing... performance is insane.”

drdude81

“The more I benchmark, the more impressed I am with the MAX Engine.”

justin_76273

Get started with Mojo

View Documentation
