

Mistral Large 3: Flagship Vision-Language Inference on Modular

Mistral Large 3 is a 675B-parameter mixture-of-experts (MoE) model with 41B active parameters per token, supporting both text and vision tasks.

Example Usage

Input
Python

  from openai import OpenAI
  
  # Point the standard OpenAI client at Modular's OpenAI-compatible endpoint.
  client = OpenAI(
      base_url="https://model.api.modular.com",
      api_key="<your_api_token>",
  )
  
  response = client.chat.completions.create(
      model="mistralai/Mistral-Large-3-675B-Instruct-2512",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain speculative decoding in 3 sentences."},
      ],
      stream=True,  # stream tokens back as they are generated
  )
  
  # Print each token delta as it arrives.
  for chunk in response:
      if chunk.choices[0].delta.content:
          print(chunk.choices[0].delta.content, end="")
Output

  Speculative decoding uses a smaller draft model to predict multiple
  tokens ahead, then verifies them against the full model in a single
  pass. Accepted tokens skip individual generation steps, improving
  throughput without sacrificing accuracy. It's most effective when the
  draft model closely matches the target model's distribution.
Input

  (Image) Cute orange cartoon fox with a small brown backpack sitting on grass in a bright, colorful forest.
Output

  “The image shows a fox wearing a backpack in a forest.”
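The image example above can be reproduced through the same OpenAI-compatible API by sending the image as a base64 data URI inside a multimodal message. A minimal sketch; the helper function names here are our own, not part of the Modular API:

```python
import base64


def image_to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI the chat API accepts."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


def build_vision_messages(prompt: str, data_uri: str) -> list:
    """Build an OpenAI-style multimodal message: one text part, one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]


# Pass the result as messages= to client.chat.completions.create(
#     model="mistralai/Mistral-Large-3-675B-Instruct-2512", ...)
```

The same `client` from the text example works unchanged; only the message content differs.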
Model Details
  • Developed by
    Mistral AI
  • Model family
    mistralai/Mistral-Large-3-675B-Instruct-2512
  • Modality
    LLM, Vision
  • Context Window
    256K
  • Total Params
    675B
  • Precision
    FP8 / NVFP4
  • Deployment options
    Shared, Dedicated, Self-hosted

Why choose Mistral Large 3 on Modular?

  • High performance, out of the box

    Run leading open models with strong default performance and the ability to optimize down to the kernel — extracting more from every GPU.

  • Lower Infrastructure Costs

    Deploy efficiently across NVIDIA and AMD hardware to reduce GPU count, increase throughput, and avoid expensive closed-model licensing.

  • Easy Integration

    Integrate through an OpenAI-compatible endpoint, swap models freely, and scale across clouds or hardware without redesigning your application stack.

Mistral Large 3
Want to self-host this model with our open source infrastructure?
Read How


Get started with Modular

  • Request a demo

    Schedule a demo of Modular and explore a custom end-to-end deployment built around your models, hardware, and performance goals.

    • Distributed, large-scale online inference endpoints

    • Highest performance to maximize ROI and minimize latency

    • Deploy in Modular cloud or your cloud

    • View all features with a custom demo

    Book a demo

    Talk with our sales lead Jay!

    30-minute demo. Evaluate with your workloads. Ask us anything.

  • Talk to us!

    Book a demo for a personalized walkthrough of Modular in your environment. Learn how teams use it to simplify systems and tune performance at scale.

    • Custom 30 min walkthrough of our platform

    • Cover specific model or deployment needs

    • Flexible pricing to fit your specific needs

    Book a demo

    Talk with our sales lead Jay!

  • Start using MAX

    ( FREE )

    Run any open source model in 5 minutes, then benchmark it. Scale it to millions yourself (for free!).

  • Start using Mojo

    ( FREE )

    Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.