On May 10th, over 100 engineers and researchers from across the AI ecosystem gathered at AGI House in Hillsborough, CA for our very first hackathon: a fast-paced day of hacking, learning, and building with Mojo. The Modular GPU Kernel Hackathon brought together developers of all experience levels to experiment with Mojo on modern GPU hardware, collaborate in person, and accelerate the future of high-performance AI infrastructure.
Participants tackled a wide range of problems, from low-level kernel implementations to full model training frameworks. Many had no prior Mojo or GPU programming experience, yet by the end of the day dozens of teams had working prototypes and new insights into what Mojo can do.
We're thrilled by what this community accomplished in just a single day!
Hackathon talks now available
Before the hacking began, we were lucky to hear from an all-star lineup of speakers:
- Chris Lattner, CEO of Modular, opened the event with a look at the challenges AI developers face today and how Mojo aims to solve them.
- Ramine Roane, Corporate VP of AI at AMD, shared how AMD is approaching AI infrastructure with the MI300X GPU and the ROCm software stack.
- Mark Saroufim, cofounder of GPU MODE and software engineer on the PyTorch team at Meta, broke down the tradeoffs of rewriting versus compiling PyTorch backends.
- Jeff Niu, member of technical staff at OpenAI, explored how he brought Triton-style abstractions into Mojo to prototype high-performance kernels.
- Simon Boehm and Sasha Krassovsky, members of technical staff at Anthropic, shared their real-world experience running inference across NVIDIA GPUs, Google TPUs, and AWS Trainium.
Winning projects
First place: Monolithic Sup

Team: Marcel Roed, Herman Brunborg, and Rajat Vadiraj Dwaraknath (PhD students at Stanford University)
Marcel, Herman, and Rajat tackled one of the most ambitious challenges of the hackathon: building a training framework in Mojo/MAX and implementing the kernels and backward passes needed to train a Transformer model from scratch. That meant implementing backpropagation in MAX, writing optimizers such as AdamW, and figuring out how to use FlashAttention from the MAX kernel library, including its derivative for gradient training. They didn't have time to fully implement FlashAttention during the hackathon and used standard scaled dot-product attention instead, but they've continued working on the project and plan to complete the implementation soon.
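For reference, the attention math they fell back to is standard scaled dot-product attention: softmax(QKᵀ / √d)·V. The sketch below is a minimal NumPy illustration of that forward pass, not the team's Mojo/MAX code; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Reference forward pass: softmax(Q K^T / sqrt(d)) V.

    Q, K, V: arrays of shape (seq_len, head_dim). This is the dense
    formulation that FlashAttention computes tile by tile without ever
    materializing the full (seq_len, seq_len) score matrix.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (seq, seq) attention logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # (seq, head_dim) output
```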
“It often makes sense to build something that works rather than trying to make something fast immediately… It was super fun to debug in real-time and discuss errors and solutions with the Modular employees as we ran into problems as we were trying to use Mojo in ways not previously explored by the Modular team.” - Herman
“It was difficult to get our kernels to be correct, and figuring out how to use the FlashAttention implementations... was quite challenging. But we showed that we can build useful SoTA-level tools from bare-bones with Mojo in a short period of time.” - Marcel
Second place: Fast Implementation of Prefix Scan Algorithms for AMD MI300X Using Decoupled Lookback

Team: Kirill Bobyrev (Software Engineer at Waymo)
Kirill set out to implement a state-of-the-art single-pass parallel prefix scan, based on decoupled lookback, for high-performance GPUs, and in the process helped improve Mojo's standard library. His contributions included new, corrected implementations of warp- and block-level scans and an efficient, tunable device-wide scan. These updates are now part of Mojo's open-source repo.
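For context, an inclusive prefix scan replaces each element of a sequence with the running total up to and including that element. The plain-Python sketch below (illustrative only, not Kirill's Mojo kernels) shows the step-doubling pattern that warp- and block-level scans parallelize; the device-wide version adds decoupled lookback, where each tile publishes its partial sum and later tiles read their predecessors' results instead of waiting for a separate global pass.

```python
def inclusive_scan(values):
    """Hillis-Steele style inclusive scan (sequential Python illustration).

    On a GPU, each pass over `offset` is one synchronized step in which
    every lane adds the element `offset` positions to its left, so the
    whole scan finishes in O(log n) steps.
    """
    out = list(values)
    offset = 1
    while offset < len(out):
        # Read from a snapshot so updates within a step don't interfere,
        # mirroring the double-buffering a real kernel would use.
        prev = list(out)
        for i in range(offset, len(out)):
            out[i] = prev[i - offset] + prev[i]
        offset *= 2
    return out

# Example: inclusive_scan([3, 1, 4, 1, 5]) -> [3, 4, 8, 9, 14]
```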
“GPU programming is notoriously challenging, but Mojo makes it surprisingly pleasant. The language feels modern, its templates and meta-programming features enable rapid experimentation, and the code is highly readable—akin to Rust's standard library, a clear contrast to the often cumbersome C++ standard library.” - Kirill
Even more impressively, Kirill had never written a line of CUDA or Mojo before the week of the hackathon.
Third place: Gaussian Splatting in Mojo

Team: Sandeep Menon (Software Engineer, Deep Learning at Kodiak) and Owen Leather (Perception Software Engineer at Kodiak)
Gaussian splatting is a rendering technique that, until now, had only been implemented in CUDA. Sandeep’s team aimed to break new ground by porting this kernel to Mojo so it could run on GPUs like AMD’s MI300X. While they weren’t able to fully finish the implementation by the end of the event, they made strong headway and are continuing to build on it.
“Learning Mojo and meeting the amazing Modular team was a highlight. It is humbling to code in the era of AI programming. I would love to be invited to future events and join many more hackathons as a way to kickstart learning on new topics.” - Sandeep
More hackathon projects

Beyond our winners, teams dove into a wide range of challenges using Mojo:
- A heat diffusion kernel based on the finite difference stencil method
- A GPU-accelerated BM25 ranking algorithm
- An optimized Non-Maximum Suppression implementation for YOLO-style models (see the sketch after this list)
- A benchmarking study of Mojo kernels that run without barrier or synchronization calls
- A Fast Fourier Transform implementation in Mojo
- A matrix inversion and LU decomposition kernel for small matrices
- A dot product primitive and related linear algebra building blocks
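To give a flavor of these primitives, here is a minimal Python sketch of greedy Non-Maximum Suppression, the algorithm behind one of the projects above. The box format and function names are illustrative assumptions, not the team's GPU implementation.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Example: two heavily overlapping boxes collapse to the higher-scoring one.
print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)],
          [0.9, 0.8, 0.7]))  # -> [0, 2]
```

A GPU version spends most of its effort computing the pairwise IoU comparisons in parallel while keeping the greedy suppression step as cheap as possible.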
Many participants wrote Mojo or GPU code for the first time and were eager to build on their projects in the coming weeks. We’re incredibly proud of everyone who participated in the hackathon. Whether you were building compilers, training frameworks, or GPU kernels from scratch, your work is what makes this community so exciting!
Thank you again to our sponsors, AMD, Crusoe, and GPU MODE, for making the event possible. And if you weren’t able to join us in person, be sure to catch the talk recordings on the Modular YouTube channel.
Until next time, keep building!