October 1, 2024

Hands-on with Mojo 24.5

Mojo 24.5 is here, and it's our biggest release yet. It's available now through the new Magic package manager, and is bundled with the MAX 24.5 release. Please install Magic to follow along.

This release includes several new core language changes and standard library enhancements. In this blog post, we will dive into many of these new features and language improvements using code examples. We focus our examples on the unified pointer type UnsafePointer, and gradually introduce other notable features. By the end of this blog post, you will have a thorough understanding of these enhancements and can comfortably utilize them in your Mojo code. 

One of the biggest highlights of this release is the significant contributions from our community. We received numerous pull requests from 11 community contributors that included new features, bug fixes, documentation enhancements, and code refactoring. Special thanks 🙏🏼 to our community contributors:

@jjvraw, @artemiogr97, @martinvuyk, @jayzhan211, @bgreni, @mzaks, @msaelices, @rd4com, @jiex-liu, @kszucs, @thatstoasty.

For a complete list of changes, please refer to the changelog for version 24.5.

All the code for this blog post is available in our GitHub repository.

Dramatically reduced the auto-imported modules

Mojo 24.5 significantly reduces the set of entities automatically imported into users' programs. Previously, modules such as memory, sys, os, utils, python, bit, random, math, builtin, and collections were automatically available. Now, only the entities explicitly enumerated in prelude/__init__.mojo are imported automatically. Developers therefore need to explicitly import the entities they use, which prevents unexpected namespace pollution and improves code clarity.
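To make the change concrete, here is a minimal sketch. It assumes, as the examples later in this post suggest, that UnsafePointer is still available without an import, while sqrt from math and memset_zero from memory now need explicit imports.

```mojo
# Sketch for Mojo 24.5. Assumption: `sqrt` (math) and `memset_zero` (memory)
# are not part of the prelude, so they must be imported explicitly.
from math import sqrt
from memory import memset_zero


fn main():
    # Without the import above, `sqrt` would be an unknown name.
    var root = sqrt(Float64(16.0))
    print(root)

    # UnsafePointer is used without an import, as in the rest of this post.
    var buf = UnsafePointer[UInt8].alloc(4)
    memset_zero(buf, 4)
    print(buf[0])
    buf.free()
```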

No need for var in fn

Mojo 24.5 relaxes the requirement to declare locals with var inside fn functions, so an fn body can introduce a variable with plain assignment, just as a def body can. fn still keeps its distinct capabilities: it allows greater control over memory, which is particularly important when interfacing with low-level C code through the Foreign Function Interface (FFI). Another difference is that def raises implicitly, whereas an fn must declare raises explicitly before it can raise an error.
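Here is a small sketch of both points: a local introduced with plain assignment inside an fn, and an fn that has to opt into raising. The function names are illustrative only.

```mojo
# In Mojo 24.5, an `fn` body can introduce locals without `var`.
fn scaled_sum(a: Int, b: Int) -> Int:
    total = a + b  # implicitly declared local, no `var` needed
    total *= 2
    return total


# Unlike `def`, an `fn` must declare `raises` before it can raise.
fn checked_div(a: Int, b: Int) raises -> Int:
    if b == 0:
        raise Error("division by zero")
    return a // b


def main():
    # `def` raises implicitly, so it can call `checked_div` without try/except.
    print(scaled_sum(2, 3))
    print(checked_div(10, 2))
```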

Unified pointer data structure via UnsafePointer

In 24.5, we've unified the pointer data structures by consolidating all previous pointer types into a single UnsafePointer. The previous types (DTypePointer, LegacyPointer and Pointer) have been removed, and their functionality has been incorporated into UnsafePointer, simplifying the pointer system while retaining all necessary capabilities. This unification streamlines pointer usage in Mojo and reduces confusion for developers.

The next section is a refresher on the UnsafePointer and how to properly use it. Please feel free to skip this section and go to the examples.

Refresher on UnsafePointer

Understanding Undefined Behavior (UB) with uninitialized memory

The UnsafePointer type creates an indirect reference to a location in memory. You can use an UnsafePointer to dynamically allocate and free memory, or to point to memory allocated by some other piece of code. You can use these pointers to write code that interacts with low-level interfaces, to interface with other programming languages, or to build certain kinds of data structures. But as the name suggests, they're inherently unsafe. For example, when using unsafe pointers, you're responsible for ensuring that memory gets allocated and freed correctly.

When we allocate memory using UnsafePointer in Mojo, the allocated memory is uninitialized. Accessing this uninitialized memory without proper initialization is considered Undefined Behavior (UB). This can occur when using the __getitem__ operation (i.e., square brackets []) on an uninitialized pointer:

Mojo
ptr = UnsafePointer[Int].alloc(1)
ptr[0] = 42  # <-- Undefined Behavior

In the example above, ptr[0] tries to produce a valid reference to the element, but because the memory is uninitialized, the validity and lifetime of that reference are ambiguous. C does not make this distinction and has historically been plagued by memory vulnerabilities as a result; Mojo instead requires explicit initialization of the memory behind an unsafe pointer to ensure memory safety.

Proper Initialization with init_pointee_copy/move

To ensure that memory is properly initialized, UnsafePointer provides methods such as init_pointee_copy, which copies a value into uninitialized memory and suits simple types such as Int, and init_pointee_move, which moves a value in. Here is how to initialize ptr safely:

Mojo
ptr = UnsafePointer[Int].alloc(1)
ptr.init_pointee_copy(42)

The earlier snippet was problematic because ptr[0] = 42 assigns directly to uninitialized memory. Using init_pointee_copy resolves this issue by ensuring that the memory is correctly initialized before it is used.
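The sketch below walks a pointer through its whole lifecycle: allocate, initialize, use, destroy, free. It assumes the ptr[] dereference syntax and the destroy_pointee method from the 24.5 UnsafePointer API; check the UnsafePointer documentation if your version differs.

```mojo
fn main():
    # Trivial type: copy a value into freshly allocated memory.
    var ip = UnsafePointer[Int].alloc(1)
    ip.init_pointee_copy(42)
    ip[] = ip[] + 1  # reading and assigning is fine once initialized
    print(ip[])
    ip.free()  # Int needs no destructor, so freeing the memory is enough

    # Non-trivial type: move an owned String into the allocation.
    var sp = UnsafePointer[String].alloc(1)
    sp.init_pointee_move(String("hello"))
    print(sp[])
    sp.destroy_pointee()  # run String's destructor before releasing memory
    sp.free()
```

The key design point is that initialization and destruction are explicit steps separate from alloc and free, which is what lets the same pointer type handle both trivial and heap-owning element types.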

You can read a more extensive discussion of how to allocate and initialize pointers in the UnsafePointer documentation.

With this primer, we will start using UnsafePointer and gradually introduce other notable features of Mojo 24.5, so that we can see end-to-end examples incorporating all the new features.

Example 1: UnsafeBuffer

The following is a way to implement a buffer data structure in Mojo using the UnsafePointer. Can you spot why this buffer is unsafe?

Mojo
from memory import memset_zero

struct UnsafeBuffer:
    var data: UnsafePointer[UInt8]
    var size: Int

    fn __init__(inout self, size: Int):
        self.data = UnsafePointer[UInt8].alloc(size)
        memset_zero(self.data, size)
        self.size = size

    fn __del__(owned self):
        self.data.free()

If we add the following main function to unsafe_buffer.mojo and run it with magic run mojo unsafe_buffer.mojo:

Mojo
def main():
    ub = UnsafeBuffer(10)
    print("initial value at index 0:")
    print(ub.data[0])
    ub.data[0] = 255
    print("value at index 0 after getting set to 255:")
    print(ub.data[0])

We get the output:

Output
initial value at index 0:
0
value at index 0 after getting set to 255:
0

But why? What went wrong?

As stated in the safety part, UnsafePointer does not carry any lifetime information, which means that the lifetime of data in UnsafeBuffer is not connected to the lifetime of the ub instance. In the last print(ub.data[0]), the compiler considers ub unused immediately after ub.data[0] computes the pointer to the element, but before the value is read through that pointer. As a result, Mojo's ASAP destructor runs and frees the data pointer. The subsequent read is Undefined Behavior (UB) because it accesses freed memory. This is why we see the unexpected output; we were lucky that the bug surfaced this visibly.

In the next example, we try to make our buffer implementation safe.

Example 2: SafeBuffer

The following implementation ties the lifetime of the private _data pointer to a SafeBuffer instance by routing all access through the write and read methods. We also use debug_assert to check bounds in write and read; these assertions are only enabled when compiling with -D MOJO_ENABLE_ASSERTIONS, as we will see shortly.

Mojo
from memory import memset_zero

struct SafeBuffer:
    var _data: UnsafePointer[UInt8]
    var size: Int

    fn __init__(inout self, size: Int):
        debug_assert(size > 0, "size must be greater than zero")
        self._data = UnsafePointer[UInt8].alloc(size)
        memset_zero(self._data, size)
        self.size = size

    fn __del__(owned self):
        self._data.free()

    fn write(inout self, index: Int, value: UInt8):
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        self._data[index] = value

    fn read(self, index: Int) -> UInt8:
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        return self._data[index]

Now if we run the application code

Mojo
def main():
    sb = SafeBuffer(10)
    sb.write(0, 255)
    print("value at index 0 after getting set to 255:")
    print(sb.read(0))

via the following command:

Bash
magic run mojo -D MOJO_ENABLE_ASSERTIONS safe_buffer.mojo

We see the output is as expected:

Output
value at index 0 after getting set to 255:
255

Note that it is crucial not to access _data directly; use the write and read methods instead, so that the lifetime of the _data pointer stays tied to the SafeBuffer instance.

Named result bindings

Mojo now supports named result bindings. Named result bindings are useful for directly emplacing function results into the output slot of a function. This feature provides more flexibility and guarantees around emplacing the result of a function compared to a "guaranteed" named return value optimization.

To see how it can simplify our application code, let’s have a look at the following code continuing our SafeBuffer:

Mojo
struct SafeBuffer:
    ...

    @staticmethod
    fn initialize_with_value(size: Int, value: UInt8) -> Self as output:
        output = SafeBuffer(size)
        for i in range(size):
            output.write(i, value)
        return

def main():
    buffer = SafeBuffer.initialize_with_value(size=10, value=128)

In this code, we define a static method initialize_with_value that returns an instance of SafeBuffer. The method declares output as the return variable of type Self. Within the method, we initialize output by creating a new SafeBuffer of the specified size. We then fill the buffer by writing the given value to each index.

By using the named return variable output, the compiler constructs and modifies the return value directly in the memory location where it will ultimately reside. This eliminates unnecessary copying or moving of the SafeBuffer instance, which is essential for types that are neither movable nor copyable, like SafeBuffer here. The as output syntax in the function signature declares output as the return variable, and the final bare return statement hands back control without specifying a value, because output already holds the result.
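Named results can also be mixed with ordinary returns. In the following sketch (the function names are hypothetical), one branch emplaces the result, mutates it in place, and ends with a bare return, while the other branch returns a value normally. If the emplaced branch had built its string in a temporary local instead, that local would have to be moved into the result slot, which is exactly what a non-movable type cannot do.

```mojo
fn add_suffix(inout s: String):
    s += "!"


fn make_greeting(emphasize: Bool) -> String as output:
    if emphasize:
        output = String("hello")  # constructed directly in the result slot
        add_suffix(output)  # mutated in place, never moved
        return  # bare return: `output` is the result
    return String("hello")  # an ordinary return still works


def main():
    print(make_greeting(True))
    print(make_greeting(False))
```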

Argument exclusivity verification

Mojo 24.5 now checks (at compile time) that mutable argument references don’t alias other references. That means that Mojo requires references (including implicit references due to borrowed/inout arguments) to be uniquely referenced (non-aliased) if mutable. This is important for code safety, because it allows the compiler (and readers of code) to understand where and when a value is mutated. It is also useful for performance optimization because it allows the compiler to know that accesses through immutable references cannot change behind the scenes. 

To see how argument exclusivity helps to write safe code and can catch errors at compile time, let’s have a look at the following process_buffers function that uses our SafeBuffer to overwrite a mutable buffer with the values of a borrowed buffer:

Mojo
fn process_buffers(buffer1: SafeBuffer, inout buffer2: SafeBuffer):
    debug_assert(buffer1.size == buffer2.size, "buffer sizes must match")
    for i in range(buffer1.size):
        buffer2.write(i, buffer1.read(i))

Now imagine that our application code contains:

Mojo
def main():
    buffer1 = SafeBuffer.initialize_with_value(size=10, value=128)
    buffer2 = SafeBuffer(10)
    process_buffers(buffer1, buffer1)

Here we accidentally pass buffer1 as both arguments to process_buffers, even though the second argument is mutable (inout). This means that we are trying to mutate buffer1 while also holding a borrowed reference to it. Mojo's argument exclusivity rules require mutable (inout) references to be unique and non-aliased, so the compiler detects this violation at compile time and emits a warning that flags the potential bug. The same no-aliasing rule also allows the compiler to optimize more aggressively by treating argument references as non-aliased.

When compiling the code, the compiler warns us about the issue:

Output
warning: call argument allows writing a memory location previously readable through another aliased argument
process_buffers(buffer1, buffer1)
^~~~~~~~ ~~~~~~~
note: 'buffer1' value is passed through aliasing 'inout' argument
process_buffers(buffer1, buffer1)
^ ~~~~~~~
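When aliasing is intentional, the usual fix is to give the borrowed argument its own value. Here is a minimal sketch with String (append_to is a hypothetical helper). It assumes that the commented-out call would trigger the same kind of aliasing warning shown above; SafeBuffer itself is not copyable, so this particular fix only applies to copyable types.

```mojo
fn append_to(suffix: String, inout target: String):
    target += suffix


def main():
    s = String("ab")
    # append_to(s, s)  # expected to warn: `s` is both borrowed and inout
    suffix = s  # explicit copy, so the borrow and the mutation are distinct values
    append_to(suffix, s)
    print(s)
```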

Formattable trait

As of Mojo 24.5, the print function now requires that its arguments conform to the Formattable trait. This enables efficient stream-based writing by default, avoiding unnecessary intermediate String heap allocations.
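Before applying it to SafeBuffer, here is the same pattern on a tiny hypothetical Point struct: format_to streams each piece directly into the Formatter, and __str__ reuses it through String.format_sequence, the helper SafeBuffer also uses.

```mojo
@value
struct Point(Formattable, Stringable):
    var x: Int
    var y: Int

    fn format_to(self, inout writer: Formatter):
        # Stream the pieces straight to the writer; no intermediate Strings.
        writer.write("(", self.x, ", ", self.y, ")")

    fn __str__(self) -> String:
        # Build a String from format_to only when one is actually needed.
        return String.format_sequence(self)


def main():
    p = Point(1, 2)
    print(p)  # print accepts any Formattable argument
```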

Let's use this new trait and implement format_to and __str__ so that we can print our buffer values easily. Everything else stays the same, except that the struct now also declares conformance to Stringable and Formattable:

Mojo
struct SafeBuffer(Stringable, Formattable):
    ...

    fn __str__(self) -> String:
        return String.format_sequence(self)

    fn format_to(self, inout writer: Formatter):
        debug_assert(self.size > 0, "size must be greater than zero")
        writer.write("[")
        for i in range(self.size - 1):
            writer.write(self._data[i], ", ")
        writer.write(self._data[self.size - 1])
        writer.write("]")

Now let’s test it within our application code via magic run mojo safe_buffer.mojo:

Mojo
def main():
    buffer1 = SafeBuffer.initialize_with_value(size=10, value=128)
    buffer2 = SafeBuffer(10)
    process_buffers(buffer1, buffer2)
    print("buffer2:", buffer2)

This produces the expected output:

Output
buffer2: [128, 128, 128, 128, 128, 128, 128, 128, 128, 128]

For the record, here is all the code for safe_buffer.mojo:

Mojo
from memory import memset_zero

struct SafeBuffer(Stringable, Formattable):
    var _data: UnsafePointer[UInt8]
    var size: Int

    fn __init__(inout self, size: Int):
        debug_assert(size > 0, "size must be greater than zero")
        self._data = UnsafePointer[UInt8].alloc(size)
        memset_zero(self._data, size)
        self.size = size

    @staticmethod
    fn initialize_with_value(size: Int, value: UInt8) -> Self as output:
        output = SafeBuffer(size)
        for i in range(size):
            output.write(i, value)
        return

    fn __del__(owned self):
        self._data.free()

    fn write(inout self, index: Int, value: UInt8):
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        self._data[index] = value

    fn read(self, index: Int) -> UInt8:
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        return self._data[index]

    fn __str__(self) -> String:
        return String.format_sequence(self)

    fn format_to(self, inout writer: Formatter):
        debug_assert(self.size > 0, "size must be greater than zero")
        writer.write("[")
        for i in range(self.size - 1):
            writer.write(self._data[i], ", ")
        writer.write(self._data[self.size - 1])
        writer.write("]")

fn process_buffers(buffer1: SafeBuffer, inout buffer2: SafeBuffer):
    debug_assert(buffer1.size == buffer2.size, "buffer sizes must match")
    for i in range(buffer1.size):
        buffer2.write(i, buffer1.read(i))

def main():
    sb = SafeBuffer(10)
    sb.write(0, 255)
    print("safe buffer outputs:")
    print(sb.read(0))

    buffer1 = SafeBuffer.initialize_with_value(size=10, value=128)
    buffer2 = SafeBuffer(10)
    # process_buffers(buffer1, buffer1)  # <-- argument exclusivity flags such aliasing at compile time
    process_buffers(buffer1, buffer2)
    print("buffer2:", buffer2)

Example 3: Generic SafeBuffer[T]

In this example, we make our SafeBuffer implementation generic so that it can handle different data types. Each slot stores an Optional[T], which lets us initialize every element to None (NoneType) right after allocation. By making SafeBuffer generic, we can create buffers for various data types while keeping every slot safely and properly initialized.

Mojo
from collections import Optional

struct SafeBuffer[T: CollectionElement]:
    var _data: UnsafePointer[Optional[T]]
    var size: Int

    fn __init__(inout self, size: Int):
        debug_assert(size > 0, "size must be greater than zero")
        self._data = UnsafePointer[Optional[T]].alloc(size)
        for i in range(size):
            # Initialize every slot to None before it can be read.
            (self._data + i).init_pointee_copy(NoneType())
        self.size = size

    @staticmethod
    fn initialize_with_value(size: Int, value: T) -> Self as output:
        output = SafeBuffer[T](size)
        for i in range(size):
            output.write(i, value)
        return

    fn __copyinit__(inout self, existing: Self):
        # Deep copy: each instance owns, and later frees, its own allocation.
        self._data = UnsafePointer[Optional[T]].alloc(existing.size)
        for i in range(existing.size):
            (self._data + i).init_pointee_copy(existing._data[i])
        self.size = existing.size

    fn __moveinit__(inout self, owned existing: Self):
        self._data = existing._data
        self.size = existing.size

    fn __del__(owned self):
        # Destroy the stored Optional[T] values before releasing the memory.
        for i in range(self.size):
            (self._data + i).destroy_pointee()
        self._data.free()

    fn write(inout self, index: Int, value: Optional[T]):
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        self._data[index] = value

    fn read(self, index: Int) -> Optional[T]:
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        return self._data[index]

    fn take(inout self, index: Int) -> Optional[T] as output:
        output = self.read(index)
        self.write(index, Optional[T](None))
        return

Conditional conformance

Mojo 24.5 allows types to "conditionally conform" to traits, which allows a generic struct to conform to a trait only when its type parameters meet certain conditions. In our case, we can make SafeBuffer[T: CollectionElement] conform to Stringable as long as T itself conforms to Stringable (i.e. T: Stringable). The key requirement is that the trait bound on the method's own type parameter must include the trait(s) required by the struct's parameter: the method parameter stands in for T, so it must still satisfy CollectionElement in addition to the extra traits.

To achieve this, we define a StringableFormattableCollectionElement trait that composes the Formattable and StringableCollectionElement traits:

Mojo
trait StringableFormattableCollectionElement(Formattable, StringableCollectionElement):
    ...

We then define the __str__ method with a type parameter U that must conform to StringableFormattableCollectionElement:

Mojo
fn __str__[U: StringableFormattableCollectionElement](self: SafeBuffer[U]) -> String:
    ...

Note that self is explicitly typed as SafeBuffer[U], where U is the method's own type parameter rather than the struct's T. This makes __str__ available only when U conforms to the necessary traits, so the compiler catches the error at compile time if we call it on a SafeBuffer whose element type does not meet the requirements.
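The same technique works on any generic struct. In this sketch, a hypothetical Box[T] gets a describe method that only exists when the element type is also Stringable. It relies on the standard library's StringableCollectionElement trait (the one StringableFormattableCollectionElement builds on) and on Int satisfying it.

```mojo
@value
struct Box[T: CollectionElement]:
    var value: T

    # Only callable when the element type is Stringable. U stands in for T,
    # so its bound must still include CollectionElement.
    fn describe[U: StringableCollectionElement](self: Box[U]) -> String:
        return "Box(" + str(self.value) + ")"


def main():
    print(Box[Int](42).describe())
    # A Box over a non-Stringable type can still store values, but calling
    # .describe() on it would be rejected at compile time.
```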

Now we are ready to use StringableFormattableCollectionElement in __str__ and format_to, adjusting the code minimally as follows:

Mojo
struct SafeBuffer[T: CollectionElement]:
    ...

    fn __str__[U: StringableFormattableCollectionElement](self: SafeBuffer[U]) -> String:
        # Build the String by streaming into it through a Formatter.
        ret = String()
        writer = ret._unsafe_to_formatter()
        self.format_to(writer)
        _ = writer^  # end the formatter's lifetime before returning the String
        return ret^

    fn format_to[
        U: StringableFormattableCollectionElement
    ](self: SafeBuffer[U], inout writer: Formatter):
        debug_assert(self.size > 0, "size must be greater than zero")
        writer.write("[")
        for i in range(self.size - 1):
            if self._data[i]:
                writer.write(self._data[i].value(), ", ")
            else:
                writer.write("None", ", ")
        if self._data[self.size - 1]:
            writer.write(self._data[self.size - 1].value())
        else:
            writer.write("None")
        writer.write("]")

To see how conditional conformance catches type errors at compile time, let's define a dummy type that conforms to CollectionElement but is neither Stringable nor Formattable:

Mojo
struct NotStringableNorFormattable(CollectionElement):
    fn __init__(inout self):
        ...

    fn __copyinit__(inout self, existing: Self):
        ...

    fn __moveinit__(inout self, owned existing: Self):
        ...

If we try to use it in our application code:

Mojo
def main():
    buf = SafeBuffer[NotStringableNorFormattable](10)
    buf.__str__()

We get the following compile time error indicating that the type does not conform to the required trait StringableFormattableCollectionElement:

Output
error: invalid call to '__str__': could not deduce parameter 'U' of callee '__str__'
buf.__str__()
~~~~~~~~~~~^~
note: failed to infer parameter 'U', argument type 'NotStringableNorFormattable' does not conform to trait 'StringableFormattableCollectionElement'
buf.__str__()
^~~
note: function declared here
fn __str__[U: StringableFormattableCollectionElement](self: SafeBuffer[U]) -> String:
^

mojo test

Mojo 24.5 also comes with the mojo test command, which uses the Mojo compiler to run unit tests. Let's add a test case in a file named test_generic_safe_buffer.mojo:

Mojo
from testing import assert_equal
from generic_safe_buffer import SafeBuffer

def test_buffer():
    buffer = SafeBuffer[String].initialize_with_value(size=5, value=String("hi"))
    val = buffer.take(2).value()
    assert_equal(val, String("hi"))
    assert_equal(buffer.__str__(), "[hi, hi, None, hi, hi]")

Now we can run magic run mojo test test_generic_safe_buffer.mojo and see the following output:

Output
Testing Time: 1.828s

Total Discovered Tests: 1

Passed : 1 (100.00%)
Failed : 0 (0.00%)
Skipped: 0 (0.00%)
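Further tests can live in the same file; to our understanding, mojo test picks up functions whose names start with test. Here is a sketch of a second, hypothetical test for an Int buffer, checking that take leaves None behind in the taken slot.

```mojo
from testing import assert_equal
from generic_safe_buffer import SafeBuffer


def test_take_on_int_buffer():
    buffer = SafeBuffer[Int].initialize_with_value(size=3, value=7)
    # take() returns the stored value and writes None into its slot.
    assert_equal(buffer.take(1).value(), 7)
    assert_equal(buffer.__str__(), "[7, None, 7]")
```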

Magic tip: we can include the following in the tasks section of mojoproject.toml:

mojoproject.toml
[tasks]
test = "mojo test test_*.mojo"

With this task defined, running all the tests is as easy as running magic run test.

For completeness, the following includes the entire code for generic_safe_buffer.mojo which we can run via magic run mojo generic_safe_buffer.mojo:

Mojo
from collections import Optional

trait StringableFormattableCollectionElement(Formattable, StringableCollectionElement):
    ...

struct SafeBuffer[T: CollectionElement]:
    var _data: UnsafePointer[Optional[T]]
    var size: Int

    fn __init__(inout self, size: Int):
        debug_assert(size > 0, "size must be greater than zero")
        self._data = UnsafePointer[Optional[T]].alloc(size)
        for i in range(size):
            # Initialize every slot to None before it can be read.
            (self._data + i).init_pointee_copy(NoneType())
        self.size = size

    @staticmethod
    fn initialize_with_value(size: Int, value: T) -> Self as output:
        output = SafeBuffer[T](size)
        for i in range(size):
            output.write(i, value)
        return

    fn __copyinit__(inout self, existing: Self):
        # Deep copy: each instance owns, and later frees, its own allocation.
        self._data = UnsafePointer[Optional[T]].alloc(existing.size)
        for i in range(existing.size):
            (self._data + i).init_pointee_copy(existing._data[i])
        self.size = existing.size

    fn __moveinit__(inout self, owned existing: Self):
        self._data = existing._data
        self.size = existing.size

    fn __del__(owned self):
        # Destroy the stored Optional[T] values before releasing the memory.
        for i in range(self.size):
            (self._data + i).destroy_pointee()
        self._data.free()

    fn write(inout self, index: Int, value: Optional[T]):
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        self._data[index] = value

    fn read(self, index: Int) -> Optional[T]:
        debug_assert(0 <= index < self.size, "index must be within the buffer")
        return self._data[index]

    fn __str__[U: StringableFormattableCollectionElement](self: SafeBuffer[U]) -> String:
        ret = String()
        writer = ret._unsafe_to_formatter()
        self.format_to(writer)
        _ = writer^
        return ret^

    fn format_to[
        U: StringableFormattableCollectionElement
    ](self: SafeBuffer[U], inout writer: Formatter):
        debug_assert(self.size > 0, "size must be greater than zero")
        writer.write("[")
        for i in range(self.size - 1):
            if self._data[i]:
                writer.write(self._data[i].value(), ", ")
            else:
                writer.write("None", ", ")
        if self._data[self.size - 1]:
            writer.write(self._data[self.size - 1].value())
        else:
            writer.write("None")
        writer.write("]")

    fn take(inout self, index: Int) -> Optional[T] as output:
        output = self.read(index)
        self.write(index, Optional[T](None))
        return

fn process_buffers[T: CollectionElement](buffer1: SafeBuffer[T], inout buffer2: SafeBuffer[T]):
    debug_assert(buffer1.size == buffer2.size, "buffer sizes must match")
    for i in range(buffer1.size):
        buffer2.write(i, buffer1.read(i))

struct NotStringableNorFormattable(CollectionElement):
    fn __init__(inout self):
        ...

    fn __copyinit__(inout self, existing: Self):
        ...

    fn __moveinit__(inout self, owned existing: Self):
        ...

def main():
    buffer1 = SafeBuffer[UInt8].initialize_with_value(size=10, value=UInt8(128))
    buffer2 = SafeBuffer[UInt8](size=10)
    # process_buffers(buffer1, buffer1)  # <-- argument exclusivity flags such aliasing at compile time
    process_buffers(buffer1, buffer2)

    # testing conditional conformance
    print(buffer2.__str__())
    print(buffer2.take(0).value())
    print(buffer2.__str__())

    sbuffer1 = SafeBuffer[String].initialize_with_value(size=10, value=String("hi"))
    print(sbuffer1.take(5).value())
    print(sbuffer1.__str__())

    ## uncomment to see the compiler error:
    # buf = SafeBuffer[NotStringableNorFormattable](10)
    # buf.__str__()

Summary

In this blog post, we explored several new features introduced in Mojo 24.5, including the unified UnsafePointer type, relaxed variable declaration in fn functions, named result bindings, argument exclusivity, and conditional conformance.

We began by discussing the dramatic reduction of auto-imported modules, which improves code clarity by requiring explicit imports. We then dived into the unification of pointer data structures into UnsafePointer, simplifying pointer usage while retaining essential functionalities.

We examined how to properly use UnsafePointer, highlighting the importance of initializing allocated memory to avoid Undefined Behavior. Through the UnsafeBuffer example, we demonstrated the potential pitfalls of accessing UnsafePointer directly.

With the SafeBuffer example, we showed how the lifetime of the underlying UnsafePointer can be tied to an instance of SafeBuffer through the write and read methods, preventing premature deallocation and Undefined Behavior.

We explored the benefits of named result bindings showing how it allows for efficient construction and returning of objects without unnecessary copying or moving.

We discussed argument exclusivity, emphasizing how Mojo's compile-time checks prevent aliasing of mutable references, enhancing code safety and correctness.

We incorporated the Formattable trait to enable efficient and flexible string formatting of our SafeBuffer, and demonstrated how to implement the format_to and __str__ methods.

Finally, we extended our SafeBuffer implementation to be generic, leveraging conditional conformance to ensure that methods like __str__ are only available when the type parameter meets certain trait constraints. This allows the compiler to enforce type safety and catch errors at compile time.

Overall, Mojo 24.5 brings significant enhancements that improve code safety, performance, and developer experience. We encourage you to explore these new features and consider how they can benefit your own projects. For more details, please check out the changelog for Mojo 24.5.

What’s next?

Now that you've learned about the latest features in Mojo 24.5, it's time to put this knowledge into practice and dive deeper into the Mojo ecosystem. Here are some resources and next steps to help you get started and stay connected:

Until next time! 🔥

Ehsan M. Kermani, AI DevRel

Ehsan is a Seasoned Machine Learning Engineer with a decade of experience and a rich background in Mathematics and Computer Science. His expertise lies in the development of cutting-edge Machine Learning and Deep Learning systems ranging from Natural Language Processing, Computer Vision, Generative AI and LLMs, Time Series Forecasting and Anomaly Detection while ensuring proper MLOps practices are in-place. Beyond his technical skills, he is very passionate about demystifying complex concepts by creating high-quality and engaging content. His goal is to empower and inspire the developer community through clear, accessible communication and innovative problem-solving. Ehsan lives in Vancouver, Canada.