
Gemma 4 26B A4B: Optimized MoE Inference on NVIDIA & AMD
Gemma 4 26B A4B is a Mixture-of-Experts (MoE)model with 26B total parameters but only 4B activated per forward pass, meaning you get the quality of a much larger model at a fraction of the compute cost. It also supports a 256K context window and is designed to fit the memory footprint of high-end servers.
- Developed byGoogle
- Model familygoogle/gemma-4-26B-A4B-it
- ModalityLLM,Vision,
- Context Window256K
- Total Params31.3B
- PrecisionFP4
- Deployment optionsShared, Dedicated, Self-hosted
Why choose Gemma 4 26B A4B on Modular?
Run leading open models with strong default performance and the ability to optimize down to the kernel — extracting more from every GPU.
Deploy efficiently across NVIDIA and AMD hardware to reduce GPU count, increase throughput, and avoid expensive closed-model licensing.
Integrate through an OpenAI-compatible endpoint, swap models freely, and scale across clouds or hardware without redesigning your application stack.
🔥 Trending models

MiniMax M3 is an open-weight, natively multimodal frontier model from MiniMax with ~428B total parameters and ~23B activated parameters. It combines frontier-level coding and agentic performance, an ultra-long context window of up to 1M tokens, and mixed-modality training across text, image, and video. It introduces MiniMax Sparse Attention (MSA) to make million-token context computationally viable, delivering up to 9x prefill and 15x decode speedups over M2 at 1M context
Similar models
Get started with Modular
Schedule a demo of Modular and explore a custom end-to-end deployment built around your models, hardware, and performance goals.
Distributed, large-scale online inference endpoints
Highest-performance to maximize ROI and latency
Deploy in Modular cloud or your cloud
View all features with a custom demo

Book a demo
Talk with our sales lead Jay!
30min demo. Evaluate with your workloads. Ask us anything.
Book a demo for a personalized walkthrough of Modular in your environment. Learn how teams use it to simplify systems and tune performance at scale.
Custom 30 min walkthrough of our platform
Cover specific model or deployment needs
Flexible pricing to fit your specific needs

Book a demo
Talk with our sales lead Jay!
Run any open source model in 5 minutes, then benchmark it. Scale it to millions yourself (for free!).
Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.






