LTX-2.3 NVFP4 Audio-Video Generation on Modular

LTX-2.3 NVFP4 is a quantized 22B DiT-based audio-video diffusion model from Lightricks. It generates synchronized video and audio and supports text-, image-, video-, and audio-conditioned generation workflows.


Example Usage

Audio output


Code to use

Python


  from openai import OpenAI
  
  client = OpenAI(
      base_url="https://model.api.modular.com",
      api_key="<your_api_token>",
  )
  
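  # Request speech from the OpenAI-compatible audio endpoint and stream
  # the returned bytes straight to disk instead of buffering them in memory.
  with client.audio.speech.with_streaming_response.create(
      model="Lightricks/LTX-2.3-nvfp4",
      voice="alloy",
      input="Welcome to Modular Cloud. What can I help you with?",
  ) as response:
      response.stream_to_file("output.mp3")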
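
The examples on this page hardcode the token as "<your_api_token>". A minimal sketch of reading it from the environment instead is shown here. The variable name MODULAR_API_KEY and the helper load_api_token are assumptions made for illustration and are not defined by Modular or the OpenAI SDK.

```python
import os


def load_api_token(env_var: str = "MODULAR_API_KEY") -> str:
    """Return the API token stored in `env_var` (a hypothetical variable name).

    Raises RuntimeError if the variable is unset or contains only whitespace.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API token."
        )
    return token


# Usage: report whether a token is available without printing the token itself.
try:
    token = load_api_token()
    print(f"Loaded a token of length {len(token)}")
except RuntimeError as err:
    print(err)
```

The returned string can then be passed as api_key when constructing the client, which keeps the token out of source control.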

Get your API token

Video output

Code to use

Python


  import base64
  from openai import OpenAI
  
  client = OpenAI(
      base_url="https://model.api.modular.com",
      api_key="<your_api_token>",
  )
  
  prompt = """A 2D pop art comic book animation. Thick black ink outlines,
  flat color fills, Ben-Day halftone dots covering every surface,
  limited palette of pink, cyan, red, and black. A rainy back alley
  in Shinjuku at night, rendered as a comic panel: kanji neon signs
  drawn as flat graphic shapes with halftone glow, puddles drawn as
  flat cyan shapes with white highlight marks, rain shown as diagonal
  ink lines. A helmeted astronaut figure in a black raincoat walks
  away from the viewer down the alley, never turning. Steam from a
  ramen vent drawn as cartoon curls. The viewpoint pushes slowly
  forward through the alley. Hand-drawn cel animation feel, no
  photorealism, no 3D shading, no lens effects."""
  
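  # Video settings are passed through extra_body because provider_options
  # is a provider-specific field, not a standard Responses API parameter.
  response = client.responses.create(
      model="Lightricks/LTX-2.3-nvfp4",
      input=prompt,
      extra_body={
          "provider_options": {
              "video": {
                  "height": 512,
                  "width": 512,
                  "steps": 28,
                  "num_frames": 81,  # 81 frames at 16 fps is about 5 seconds
                  "frames_per_second": 16,
                  "response_format": "b64_json",
              }
          }
      },
  )
  
  # The video comes back base64-encoded in the first content item.
  video_data = response.output[0].content[0].video_data
  
  # Decode the base64 payload and write the raw bytes as an MP4 file.
  with open("output.mp4", "wb") as f:
      f.write(base64.b64decode(video_data))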
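
In the request above, clip length follows from num_frames divided by frames_per_second. The sketch below makes that arithmetic explicit and builds the same extra_body payload. The helper names clip_seconds and video_options are hypothetical, and the option keys are simply copied from the example; which values the endpoint accepts is not specified on this page.

```python
def clip_seconds(num_frames: int, frames_per_second: int) -> float:
    """Return the clip duration in seconds for a frame count and frame rate."""
    if num_frames <= 0 or frames_per_second <= 0:
        raise ValueError("num_frames and frames_per_second must be positive")
    return num_frames / frames_per_second


def video_options(
    height: int = 512,
    width: int = 512,
    steps: int = 28,
    num_frames: int = 81,
    frames_per_second: int = 16,
) -> dict:
    """Build an extra_body payload shaped like the one in the example request."""
    return {
        "provider_options": {
            "video": {
                "height": height,
                "width": width,
                "steps": steps,
                "num_frames": num_frames,
                "frames_per_second": frames_per_second,
                "response_format": "b64_json",
            }
        }
    }


# Usage: the defaults reproduce the example request (81 frames at 16 fps).
extra_body = video_options()
video = extra_body["provider_options"]["video"]
duration = clip_seconds(video["num_frames"], video["frames_per_second"])
print(f"{duration:.2f} s")  # prints "5.06 s"
```

The resulting dictionary can be passed as extra_body to client.responses.create in place of the inline literal.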


Model Details

Developed by: Lightricks
Model family: Lightricks/LTX-2.3-nvfp4
Modality: Video, Audio
Total Params: 22B
Precision: NVFP4
Deployment options: Shared, Dedicated, Self-hosted

Why choose LTX-2.3 on Modular?

High performance, out of the box
Run leading open models with strong default performance and the ability to optimize down to the kernel, extracting more from every GPU.
Lower Infrastructure Costs
Deploy efficiently across NVIDIA and AMD hardware to reduce GPU count, increase throughput, and avoid expensive closed-model licensing.
Easy Integration
Integrate through an OpenAI-compatible endpoint, swap models freely, and scale across clouds or hardware without redesigning your application stack.

Want to self-host this model with our open source infrastructure?

Read How

🔥 Trending models

DeepSeek V4 Pro

DeepSeek V4 Pro is a 1.6T MoE model with 49B active parameters and a 1M context window, featuring hybrid attention for efficient long-context inference.

LLM

FLUX.2 Klein 9B

FLUX.2 [klein] 9B is a 9 billion parameter rectified flow transformer capable of generating images from text descriptions and supports multi-reference editing capabilities.

Image

GLM-5.2

GLM-5.2 is Zhipu AI's newest open-weights model, optimized for coding, agentic workloads, and sustained execution of ultra-long-horizon tasks. Built on the GLM-5.1 Mixture-of-Experts architecture (754B total parameters, ~40B active) and expanded to a 1M-token context window, it is designed for long-running agent tasks, large coding workloads, and long-context understanding.

LLM

Kimi K2.7 Code

Kimi K2.7 Code is Moonshot AI's coding-focused agentic model, built on the Kimi K2.6 architecture. It shares the same 1T-parameter Mixture-of-Experts design (32B active per token, 384 experts, MLA attention) with a MoonViT vision encoder and a 256K-token context window. K2.7 Code delivers substantial gains on real-world long-horizon software engineering tasks while reducing thinking-token usage by approximately 30% compared with K2.6. Thinking and preserve_thinking are always enabled for consistent reasoning across multi-turn agentic sessions.

LLM

Vision

Similar models

Wan 2.2 T2V A14B

Wan 2.2 T2V A14B is a text-to-video diffusion model from Wan AI. It uses a Mixture-of-Experts architecture with 14B active parameters per denoising step to generate 5-second videos from text prompts at 480P or 720P.

Video

Qwen3-Omni-30B-A3B

Qwen3-Omni-30B-A3B by Alibaba is a 30B omni-modal MoE model with 3B active parameters, supporting text, audio, vision, and video.

LLM

Audio

Vision

GLM-4.7

GLM-4.7 by Zhipu AI is a 355B MoE model with 32B active parameters, supporting text, vision, and audio with reasoning.

LLM

Vision

Audio

Wan 2.2 TI2V 5B

Wan 2.2 TI2V 5B is a compact text-and-image-to-video diffusion model from Wan AI. It supports both text-to-video and image-to-video generation at 720P and 24fps with a high-compression Wan 2.2 VAE.

Video

View all models

Get started with Modular

Request a demo
Schedule a demo of Modular and explore a custom end-to-end deployment built around your models, hardware, and performance goals.
- Distributed, large-scale online inference endpoints
- High performance to maximize ROI and minimize latency
- Deploy in Modular cloud or your cloud
- View all features with a custom demo
Book a demo
Talk with our sales lead Jay!
30min demo. Evaluate with your workloads. Ask us anything.

Talk to us!
Book a demo for a personalized walkthrough of Modular in your environment. Learn how teams use it to simplify systems and tune performance at scale.
- Custom 30 min walkthrough of our platform
- Cover specific model or deployment needs
- Flexible pricing to fit your specific needs
Start using MAX
( FREE )
Run any open source model in 5 minutes, then benchmark it. Scale it to millions yourself (for free!).
Install MAX
What is MAX?
Start using Mojo
( FREE )
Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.
Install Mojo🔥
What is Mojo🔥?
