
DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a dense, 70-billion-parameter reasoning model, distilled from DeepSeek R1 onto the Llama architecture.

Example Usage

Input
Python

  from openai import OpenAI
  
  # Point the standard OpenAI client at Modular's OpenAI-compatible endpoint.
  client = OpenAI(
      base_url="https://api.modular.com",
      api_key="<your_api_token>",  # replace with your API token
  )
  
  response = client.chat.completions.create(
      model="modelname",  # replace with this model's ID
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"},
      ],
      stream=True,  # stream tokens back as they are generated
  )
  
  # Print each streamed content delta as it arrives; some chunks
  # (e.g. the final one) may carry no choices or no content.
  for chunk in response:
      if chunk.choices and chunk.choices[0].delta.content:
          print(chunk.choices[0].delta.content, end="")
Output

  The Los Angeles Dodgers won the 2020 World Series.
  
  What else can I help you with?
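
With stream=True, the answer arrives as incremental content deltas rather than one string. A minimal helper for accumulating them, sketched here as pure Python so it is independent of the client library (the sample deltas are illustrative, not real API output):

  ```python
  def collect_stream(deltas):
      """Join streamed content deltas into one string, skipping
      chunks whose delta content is None or empty."""
      return "".join(d for d in deltas if d)

  # Illustrative deltas, as a streaming loop might collect them:
  text = collect_stream(
      ["The Los Angeles Dodgers ", None, "won the 2020 World Series."]
  )
  ```

In practice you would append chunk.choices[0].delta.content to a list inside the streaming loop shown above, then join once at the end.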
  
  
Model Details
  • Developed by
    DeepSeek
  • Model family
    DeepSeek
  • Modality
    LLM
  • Context Window
    128K
  • Total Params
    70B
  • Precision
    BF16
  • Deployment options
    Shared, Dedicated, Self-hosted
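
R1-family distilled models commonly emit their chain of thought before the final answer, wrapped in <think> tags. That tag convention is an assumption here (verify it against the responses you actually receive); a sketch of separating the reasoning from the answer:

  ```python
  import re

  def split_reasoning(text):
      """Split an R1-style completion into (reasoning, answer).

      Assumes the chain of thought is wrapped in <think>...</think>;
      if no tags are present, the whole text is treated as the answer.
      """
      match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
      if match is None:
          return "", text.strip()
      reasoning = match.group(1).strip()
      answer = text[match.end():].strip()
      return reasoning, answer

  reasoning, answer = split_reasoning(
      "<think>2020 World Series: Dodgers beat the Rays.</think>\n"
      "The Los Angeles Dodgers won the 2020 World Series."
  )
  ```

Keeping the reasoning separate lets you log or display it without sending it back in subsequent turns, which also helps stay within the 128K context window.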

Why choose DeepSeek R1 Distill Llama 70B on Modular?

  • High performance, out of the box

    Run leading open models with strong default performance and the ability to optimize down to the kernel — extracting more from every GPU.

  • Lower Infrastructure Costs

    Deploy efficiently across NVIDIA and AMD hardware to reduce GPU count, increase throughput, and avoid expensive closed-model licensing.

  • Easy Integration

    Integrate through an OpenAI-compatible endpoint, swap models freely, and scale across clouds or hardware without redesigning your application stack.
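
Because the endpoint is OpenAI-compatible, swapping models means changing one string in the request. A sketch of a request builder that makes this explicit (the model IDs and helper are illustrative, not confirmed identifiers):

  ```python
  def build_chat_request(model, user_msg,
                         system_msg="You are a helpful assistant.",
                         stream=True):
      """Build the kwargs for client.chat.completions.create().
      Only the model string changes when swapping models behind
      the same OpenAI-compatible endpoint."""
      return {
          "model": model,
          "messages": [
              {"role": "system", "content": system_msg},
              {"role": "user", "content": user_msg},
          ],
          "stream": stream,
      }

  # Swapping models is just a different first argument:
  req = build_chat_request("modelname", "Who won the world series in 2020?")
  ```

The rest of the application stack (client construction, streaming loop, error handling) stays unchanged across models and hardware targets.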

Want to self-host this model with our open source infrastructure?
Read How


Get started with Modular

  • Request a demo

    Schedule a demo of Modular and explore a custom end-to-end deployment built around your models, hardware, and performance goals.

    • Distributed, large-scale online inference endpoints

    • Highest performance to maximize ROI and minimize latency

    • Deploy in Modular cloud or your cloud

    • View all features with a custom demo

    Book a demo

    Talk with our sales lead Jay!

    30-minute demo. Evaluate with your workloads. Ask us anything.

  • Talk to us!

    Book a demo for a personalized walkthrough of Modular in your environment. Learn how teams use it to simplify systems and tune performance at scale.

    • Custom 30 min walkthrough of our platform

    • Cover specific model or deployment needs

    • Flexible pricing to fit your specific needs

    Book a demo

    Talk with our sales lead Jay!

  • Start using MAX

    ( FREE )

    Run any open source model in 5 minutes, then benchmark it. Scale it to millions of requests yourself (for free!).

  • Start using Mojo

    ( FREE )

    Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.