MAX Engine is a next-generation compiler and runtime library for running AI inference. With support for PyTorch (TorchScript), ONNX, and native Mojo models, it delivers low-latency, high-throughput inference on a wide range of hardware to accelerate your entire AI workload. As highlighted in the recent MAX version 24.3 release, the MAX platform enables users to fully leverage the capabilities of the MAX Engine by creating bespoke inference models using the MAX Graph APIs. The Graph API offers a low-level programming interface for constructing high-performance symbolic computation graphs in Mojo. This interface provides a uniform representation of symbolic values and a suite of operators that process these symbols to construct the entire graph.

In this blog post, we guide you step by step through using the MAX Graph API. In a nutshell, working with the MAX Graph API involves three main steps:

- Building and verifying the graph.
- Creating an inference session and compiling the graph.
- Executing the graph with input(s) and retrieving the output(s).

We begin by creating two straightforward graphs for addition and matrix multiplication in Mojo, demonstrating how to compile and execute these graphs. Then we proceed to implement a two-layer feedforward neural network with ReLU activation for inference on MNIST data, comparing the accuracy to a PyTorch implementation. Additionally, we implement ReLU6 as a custom operator and use the MAX Graph Custom Operator API to substitute ReLU, ensuring the accuracy aligns with the PyTorch model.

To install MAX, please check out Get started with MAX Engine. If you are also new to Mojo, you can start with the Mojo Manual. To get involved and ask questions, you can join our Discord community and contribute to discussions on the Mojo and MAX GitHub. Should you encounter any issues, we recommend checking the roadmap and known issues first.

The code for this tutorial is in our GitHub repository. The MAX version for this tutorial is *max 24.3.0 (9882e19d)*.

We also have a video walkthrough of all the code featured in this blog post below.

### Hello, world!

To begin familiarizing ourselves with the Graph API, we start by constructing a simple addition graph. We will verify and compile this graph, and then proceed to execute it as demonstrated below.

#### Addition graph

Below is a straightforward graph that takes two inputs: *input0* and *input1*. It adds these inputs together and produces *output0* as the output.

##### Step 1: Build the graph

To construct the addition graph, we start by importing the necessary modules. We then instantiate the Graph by specifying two input types of fixed static dimension *1* (we will later see other supported kinds of dimensions, such as symbolic dimensions). Next, we create a symbolic representation of the addition with the expression *out = graph[0] + graph[1]*. Here *graph[0]* refers to the first input *input0* and *graph[1]* to *input1*. This operation adds the two inputs together. Finally, we designate *out* as the output of the graph by calling *graph.output(out)*.
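Putting the steps above together, the build looks roughly like this (a minimal sketch against the MAX 24.3 Mojo API; exact constructor signatures may differ between releases):

```mojo
from max.graph import Graph, TensorType, Type

fn main() raises:
    # Two inputs, each a rank-1 float32 tensor with static dimension 1.
    var graph = Graph(in_types=List[Type](
        TensorType(DType.float32, 1),
        TensorType(DType.float32, 1),
    ))

    # Symbolic addition: graph[0] is input0, graph[1] is input1.
    var out = graph[0] + graph[1]

    # Designate `out` as the graph output.
    graph.output(out)

    # Print the IR and check structural integrity (e.g. acyclicity).
    print(graph)
    graph.verify()
```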

We can print the graph to visually confirm its structure. The output should show the following representation, where *rmo* and *mo* are **Modular’s internal intermediate representations**:

This line corresponds to the symbolic addition operation *out = graph[0] + graph[1]*.

The subsequent line

indicates that *%0* has been set as the output of the graph, aligning with the *graph.output(out)* in our code.

The complete graph representation looks like this:

To programmatically verify the complete graph construction, we use the *graph.verify()* method. This checks various structural integrity criteria, such as ensuring there are no cycles within the graph (acyclicity), since recursion or feedback loops cannot be part of a dataflow graph. For more details, check out the official documentation on the verify method.

##### Step 2: Create inference session, load and compile the graph

With our graph now verified and ready, the next step involves creating an inference session instance, loading the graph into this *session* and compiling the graph into a model instance. We also print the input names to use when executing the model.
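A sketch of these steps, continuing from the graph built earlier (API names follow the 24.3 *max.engine* module; treat exact signatures as assumptions):

```mojo
from max import engine

var session = engine.InferenceSession()
var model = session.load(graph)

# Print the input names we will need when executing the model.
for name in model.get_model_input_names():
    print(name[])
```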

which outputs

Verifying the input names *input0* and *input1* is crucial for correctly executing the model in the subsequent section.

##### Step 3: Execute the graph/model with inputs

To execute the graph, we first create two input tensors in Mojo, specifying their names and values in the execute method. The results from the execution are returned as a TensorMap, from which we can retrieve the value of *output0* via the get method as follows
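A sketch of the execution step, continuing from the compiled *model* above (the Tensor constructor and TensorMap accessors follow the 24.3 APIs; exact signatures are an assumption):

```mojo
from tensor import Tensor, TensorShape

# Two rank-1 tensors matching the graph's input types.
var input0 = Tensor[DType.float32](TensorShape(1), 1.0)
var input1 = Tensor[DType.float32](TensorShape(1), 1.0)

# Execute by input name; results come back as a TensorMap.
var results = model.execute("input0", input0, "input1", input1)
var output0 = results.get[DType.float32]("output0")
print(output0)
```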

The outputs are printed as follows

Now, let’s explore our second example.

#### Matmul graph

In this example, we create a graph that performs matrix multiplication (*matmul*) of its input with a constant symbol, a pattern we will build on in the next section. This type of graph is particularly important as it demonstrates how constant symbols, representing trained and fixed weights in a neural network, can be utilized. This concept will be expanded upon in subsequent sections.

The setup for this *matmul* graph follows the same foundational steps as our initial example but includes some critical additions:

- We introduce a symbolic dimension *m* to represent an *m x 2* input
- We use *graph.constant* to create a constant symbol, crucial for holding fixed, static values
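A sketch of the graph construction with these additions (24.3 Mojo API; the use of the *@* operator for symbolic matmul and the exact constructors are assumptions):

```mojo
from max.graph import Graph, TensorType, Type
from tensor import Tensor, TensorShape

# The input has a symbolic leading dimension "m" and a static dimension 2.
var graph = Graph(in_types=List[Type](TensorType(DType.float32, "m", 2)))

# A 2 x 2 constant symbol, standing in for fixed (e.g. trained) values.
var constant_value = Tensor[DType.float32](TensorShape(2, 2), 1.0)
var constant_symbol = graph.constant(constant_value)

# Symbolic matmul of the input with the constant, set as the output.
graph.output(graph[0] @ constant_symbol)
graph.verify()
```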

Here's how we compile and execute the graph to accommodate varying input tensor sizes at runtime:
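A sketch of the compile-and-execute step (the blog uses random input tensors; constant-filled tensors are used here for brevity, and signatures follow the 24.3 APIs as assumptions):

```mojo
from max import engine
from tensor import Tensor, TensorShape

var session = engine.InferenceSession()
var model = session.load(graph)

# Thanks to the symbolic "m" dimension, both 2 x 2 and 3 x 2 inputs work.
var input_2x2 = Tensor[DType.float32](TensorShape(2, 2), 1.0)
var results = model.execute("input0", input_2x2)
print(results.get[DType.float32]("output0"))

var input_3x2 = Tensor[DType.float32](TensorShape(3, 2), 1.0)
results = model.execute("input0", input_3x2)
print(results.get[DType.float32]("output0"))
```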

Here are the results of the *matmul* graph using a constant symbol of a *2 x 2* tensor and random input tensors of shapes *2 x 2* and *3 x 2* for demonstration

With this foundation, we are ready to explore more advanced applications in the next section of the tutorial.

### Inference with MAX Graph API

In this section, we demonstrate how to build a two-layer neural network with ReLU activation using PyTorch, train it on the famous MNIST dataset featuring grayscale *28 x 28* pixel images of handwritten digits (*0* to *9*, i.e. *10* classes in total), and then test its accuracy.

Subsequently, we will implement the same model using the MAX Graph API for inference to ensure the accuracy remains consistent.

#### Train, test and save a model on MNIST using PyTorch

First, to set up, let’s define our neural network in PyTorch:
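A sketch of such a model is shown below; the class name and the hidden size of 256 are assumptions, not the blog's exact values:

```python
import torch
from torch import nn


class MNISTModel(nn.Module):
    """Two-layer feedforward network with ReLU for MNIST.

    The hidden size (256) is an assumption for illustration.
    """

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden)
        self.fc2 = nn.Linear(hidden, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.view(x.size(0), -1)    # flatten 28 x 28 images to 784
        x = torch.relu(self.fc1(x))  # hidden layer + ReLU
        return self.fc2(x)           # logits for the 10 digit classes


model = MNISTModel()
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```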

We can train and test the network as follows (*python mnist.py*)

After training and testing the network, we found the model achieves an accuracy of *96.99%* on the test dataset.

Next, we implement the PyTorch model in MAX Graph API for inference.

#### Define inference graph with MAX graph API in Mojo

After training our model and saving its weights, we need to construct an inference graph and load the weights as constant symbols. Our graph will handle input dimensions with a symbolic *"batch"* dimension and a static *28 x 28 = 784* dimension, representing flattened and preprocessed images. We will also include a softmax operation via ops.softmax to compute probabilities directly within the inference graph.
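A sketch of the graph definition (the weight tensor names *w1_t*, *b1_t*, *w2_t*, *b2_t* are hypothetical placeholders for Mojo tensors converted from the saved numpy weights; API signatures follow 24.3 as assumptions):

```mojo
from max.graph import Graph, TensorType, Type, ops

# "batch" is symbolic; 28 * 28 is the flattened image size.
var graph = Graph(in_types=List[Type](
    TensorType(DType.float32, "batch", 28 * 28)
))

# Trained weights loaded as constant symbols.
var w1 = graph.constant(w1_t)
var b1 = graph.constant(b1_t)
var w2 = graph.constant(w2_t)
var b2 = graph.constant(b2_t)

# Two-layer feedforward network with ReLU, plus softmax on the logits.
var fc1 = ops.relu(graph[0] @ w1 + b1)
var logits = fc1 @ w2 + b2
graph.output(ops.softmax(logits))
graph.verify()
```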

With the inference graph defined, we can now execute it with test images.

#### Run inference and check accuracy

To execute the graph, we first convert the model weights from numpy format to Mojo tensor format, then create the graph, compile it, and run inference. Finally, to check the accuracy, we iterate over the test images, preprocess them, obtain the result, call argmax to find the predicted class among the *10* classes, and count how many predictions correctly match the ground-truth labels.
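The accuracy loop can be sketched as follows (here *preprocess*, *images*, *labels*, and *num_images* are hypothetical helpers and variables; the argmax is written out by hand):

```mojo
var correct = 0
for i in range(num_images):
    # preprocess(...) is a hypothetical helper producing a 1 x 784 tensor.
    var results = model.execute("input0", preprocess(images[i]))
    var probs = results.get[DType.float32]("output0")

    # Manual argmax over the 10 class probabilities.
    var best = 0
    for c in range(1, 10):
        if probs[0, c] > probs[0, best]:
            best = c

    if best == int(labels[i]):
        correct += 1

print("accuracy:", correct / num_images)
```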

The output of *mojo mnist.mojo* is

This matches the accuracy we observed from the PyTorch test, confirming that our MAX Graph API implementation performs equivalently.

### MAX Graph custom operator API

In this final section of our tutorial, we demonstrate how to create and register a custom operator to use inside a MAX graph. Following our previous two-layer neural network, we first train the model with *ReLU6* activation via *python mnist.py --use-relu6*, which replaces *ReLU* with *ReLU6*, checks the test accuracy, and saves the model weights as before.

#### Creating custom operator in Mojo

To create a custom operator in Mojo, we should follow these steps:

- Create a dedicated sub-directory and name it *custom_ops*
- Create an *__init__.mojo* with the import statement *from .relu6 import relu6*

Create a custom op Mojo file, *relu6.mojo*, with the following code
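A reconstruction of *relu6.mojo*, consistent with the description that follows (based on the 24.3 *max.extensibility* API; the exact *for_each* and *simd_load* signatures are assumptions):

```mojo
import math
from max import register
from max.extensibility import Tensor, empty_tensor


@register.op("relu6")
fn relu6[type: DType, rank: Int](x: Tensor[type, rank]) -> Tensor[type, rank]:
    # Custom ops return exactly one tensor of the same type; we create an
    # empty tensor of the input's shape to store the output.
    var output = empty_tensor[type](x.shape)

    @always_inline
    @parameter
    fn _relu6[width: Int](i: StaticIntTuple[rank]) -> SIMD[type, width]:
        # ReLU6: clamp each value to the range [0, 6].
        var val = x.simd_load[width](i)
        return math.min(math.max(0, val), 6)

    output.for_each[_relu6]()
    # Transfer ownership of the result tensor.
    return output^
```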

The code above uses the *@register.op("relu6")* decorator to register the wrapped *relu6* function, under the name *"relu6"*, as a custom operator. The wrapped function can only take max.extensibility tensors, must have exactly one output of the same type, and **cannot** *raise* an *Error*. We create an *empty_tensor* to store the output.

To obtain the output, we define a function wrapped in *@parameter* that is applied to each element of the input tensor via for_each. This function (*_relu6*) loads SIMD values at each index and applies the *ReLU6* formula *math.min(math.max(0, val), 6)*. Finally, we return the output via *output^* to correctly transfer ownership of the result tensor.

#### Using Custom Operator for Inference

Once we have the custom operator defined, we need to package it as *.mojopkg* via *mojo package custom_ops*.

In our graph definition, we are now ready to replace the *ops.relu* with our custom one

with

Here we use *ops.custom*, which takes the custom operator name *"relu6"* as a parameter, *fc1* as its input, and the output type *fc1.type()*. The rest of the code stays the same.
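A sketch of the swap, assuming the *fc1* symbol from the graph definition above:

```mojo
# Before: the built-in ReLU activation.
# var h = ops.relu(fc1)

# After: our custom operator, selected by name as a parameter,
# with an explicit output type matching fc1.
var h = ops.custom["relu6"](fc1, fc1.type())
```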

The last change is to let the inference *session* know about the custom operator at runtime via
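A sketch of that registration (the *custom_ops_paths* parameter name and its exact type follow the 24.3 docs as an assumption):

```mojo
from max import engine
from pathlib import Path

var session = engine.InferenceSession()
var model = session.load(
    graph,
    # Point the session at the packaged custom operators.
    custom_ops_paths=Path("custom_ops.mojopkg"),
)
```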

#### Final verification

As the final check, we train and test the model that uses *ReLU6* via *python mnist.py --use-relu6*, which outputs

Then we run the inference code via *mojo mnist.mojo --use-relu6*, which shows

The matching accuracy between the PyTorch version and the Mojo implementation confirms the effective integration of the custom operator.

### Deploying the MAX Graph binary

For deployment, we can build the *mnist* binary via *mojo build mnist.mojo*. To execute the binary, since we use the Mojo-Python interop, we should make sure to set the *MOJO_PYTHON_LIBRARY* environment variable as follows
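For example (the library filename and Python version below are assumptions; adjust for your OS and Python build, e.g. *.dylib* on macOS):

```shell
# Point the Mojo-Python interop at your Python shared library.
export MOJO_PYTHON_LIBRARY="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')/libpython3.11.so"

mojo build mnist.mojo
./mnist
```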

### Next Steps

Here are a few potential next steps:

- Explore other neural network architectures beyond a simple two-layer feedforward network and implement them using MAX Graph API
- Experiment with other custom operators
- Test and assess correctness and contribute to the community 🚀

### Conclusion

In this blog post, we demonstrated step by step how to use the MAX Graph API to create symbolic graphs and to compile and execute them. We also showed how to replicate, in the MAX Graph API, a two-layer neural network trained in PyTorch, and saw that the test accuracy remained intact. We concluded by showing how to create and register a custom operator for inference, and verified that the test accuracy also remained intact when using it. We hope that by the end of this blog post, you have gained a better understanding of the inner workings of the MAX Graph API.

Additional resources:

- Get started with downloading MAX
- Download and run MAX examples on GitHub
- Head over to the MAX docs to learn more about MAX Engine APIs and Mojo programming manual
- Join our Discord community
- Contribute to discussions on the Mojo and MAX GitHub

Report feedback, including issues on our Mojo and MAX GitHub tracker.

Until next time!🔥