June 17, 2024

What’s new in Mojo 24.4? Improved collections, new traits, os module features and core language enhancements

Mojo 24.4 is now available for download, and this release includes several core language and standard library enhancements. In this blog post, we’ll dive deep into many of these features using code examples. One of the biggest highlights of this release is that we received 214 pull requests from 18 community contributors for new product features, bug fixes, documentation enhancements, and code refactoring. These contributions resulted in 30 net new features in the standard library, accounting for 11% of all improvements in this release. We’re incredibly proud of the momentum we’re seeing with community contributions, and it goes without saying – you are the real star of this release. On behalf of the entire Mojo team, we’d like to thank you for all your contributions to making Mojo awesome!

Throughout the rest of the blog post, we’ll discuss many of the new features in this release with code examples that you can find in a Jupyter Notebook on GitHub. As always, the official changelog has an exhaustive list of new features, what’s changed, what’s removed, and what’s fixed. Before we continue, don’t forget to upgrade your Mojo🔥. Let’s dive into the new features.

Improved Collections: New List and Dict features

In Mojo 24.4, List and Dict gain several new features that make them even more Pythonic. Many of these features came directly from our community:

  • List has new index(), count(), and __contains__() methods and conforms to the Stringable trait, thanks to contributions by @gabrieldemarmiesse and @rd4com
  • Dict has new clear(), reversed(), get(key), items(), and values() features and conforms to the Stringable trait, thanks to contributions by @jayzhan211, @gabrieldemarmiesse, and @martinvuyk

Let’s take a look at how to use these through code examples.

Enhancements to List

In this example, we’ll calculate word frequencies from the contents of a webpage and plot the results as a word cloud. In the code below, we use Mojo's Python interoperability feature to preprocess the text, tokenize it, and remove stop words, and again to plot the results. For demonstration purposes, we set url = "https://docs.modular.com/mojo/manual/basics" to generate the word cloud from the Mojo manual. Feel free to experiment with different URLs. Now, let’s take a look at the code.

Mojo
from python import Python

Python.add_to_path(".")
utils = Python.import_module("utils")

# Sample URL (you can replace this with any URL of your choice)
url = "https://docs.modular.com/mojo/manual/basics"

# Fetch and preprocess the text
filtered_words = utils.fetch_and_preprocess_text(url)

var mojo_word_list = List[String]()
for fw in filtered_words:
    mojo_word_list.append(fw)

var unique_words = List[String]()
var word_frequencies = List[Int]()

for word in mojo_word_list:
    if word[] not in unique_words:
        unique_words.append(word[])
        word_frequencies.append(mojo_word_list.count(word[]))
    else:
        var index = unique_words.index(word[])
        word_frequencies[index] = mojo_word_list.count(word[])

py_unique_words = Python.list()
py_word_frequencies = Python.list()
for i in range(len(unique_words)):
    py_unique_words.append(unique_words[i])
    py_word_frequencies.append(word_frequencies[i])

utils.plot_word_cloud(py_unique_words, py_word_frequencies)

Output: [Figure: word cloud generated from the Mojo manual page]

In the example above, first we fetch a list of filtered words from the webpage in url using utils.fetch_and_preprocess_text(url). After that we’re ready to calculate word frequencies. 

We highlight the use of __contains__() in the condition word[] not in unique_words, which checks whether the current word is already in the unique_words list. If it is not, this is the first time we are encountering the word, so we append it to unique_words. We then count how many times the word appears in the original word list using the count() method: mojo_word_list.count(word[]) returns the number of occurrences of word[] in mojo_word_list. That count is appended to word_frequencies at the same index as the word in unique_words, so the two lists have corresponding elements. If the word has already been seen, the else branch uses index() to find its position in unique_words and updates the count at that position.

To plot the word cloud, each word is drawn at a random position in a random color, with a size proportional to its frequency. We use Python interoperability to call the plotting function utils.plot_word_cloud().
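If you want to try the three List features without the Python helper module, a minimal sketch along the following lines may help. The word list is made up for illustration, and the arguments are wrapped in String() explicitly on the assumption that the lookup methods expect the list's element type.

```mojo
from collections import List


def main():
    # Hypothetical sample data, not taken from the webpage example.
    var words = List[String]("mojo", "fire", "mojo", "max", "mojo")

    # __contains__() is what backs the `in` and `not in` operators.
    if String("fire") in words:
        print("'fire' is in the list")

    # count() returns how many elements compare equal to the argument.
    print("occurrences of 'mojo':", words.count(String("mojo")))

    # index() returns the position of the first match and raises
    # if the value is not present.
    print("first 'max' at index:", words.index(String("max")))
```

Because index() can raise, the sketch uses def main(), which is allowed to raise without extra annotations.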

Enhancements to Dict

In this example, we’ll use the Monte Carlo method to approximate the value of pi using the new Dict features. The method generates random points within a square of side 2 centered at the origin and counts how many of them fall inside the unit circle inscribed in that square. Because the circle has area pi and the square has area 4, the fraction of points inside the circle approaches pi/4, so multiplying that fraction by 4 gives an estimate of pi. We’ve written about this in more detail in an earlier blog post; be sure to read it for more on the math behind why this works. In the demo below, we store the points in a Dict and use Dict methods rather than arrays to implement the solution.

Mojo
from python import Python
from random import random_float64

Python.add_to_path(".")
utils = Python.import_module("utils")

# Function to approximate pi using Monte Carlo method
def approximate_pi(num_points: Int) -> (Float64, Dict[String, List[List[Float64]]]):
    var inside_circle: Int = 0
    points = Dict[String, List[List[Float64]]]()
    keys = List[String]("inside", "outside")
    points = Dict.fromkeys(keys, List[List[Float64]]())  # Using fromkeys to initialize dictionary

    for _ in range(num_points):
        x = random_float64(-1, 1)
        y = random_float64(-1, 1)
        if x**2 + y**2 <= 1.0:
            inside_circle += 1
            points["inside"].append(List(x, y))
        else:
            points["outside"].append(List(x, y))

    pi_approx = 4 * inside_circle / num_points
    return pi_approx, points

# Number of points to generate
num_points = 10000

# Approximate pi and get points data
pi_approximation, points_data = approximate_pi(num_points)

# Display the dictionary items
print("Points inside and outside the circle:")
for kv in points_data.items():
    print(kv[].key, ":", len(kv[].value), " points")

# Use the get method to retrieve points data for "inside" points
inside_points = points_data.get("inside")
if inside_points:
    print("\nNumber of points inside the circle:", len(inside_points.take()))

# Reverse the order of the dictionary keys and print them
print("\nReversed keys of points data:")
for key in reversed(points_data):
    print(key[])

py_points_data = Python.dict()
py_val_list = Python.list()
py_xy_list = Python.list()
for kv in points_data.items():
    for kv_val in kv[].value:
        for kv_xy in kv_val[]:
            py_xy_list.append(kv_xy[])
        py_val_list.append(py_xy_list)
        py_xy_list = []
    py_points_data[kv[].key] = py_val_list
    py_val_list = []

utils.plot_points(py_points_data, pi_approximation)

# Clear the dictionary
points_data.clear()
print("\nPoints data dictionary after clearing:")
for key in points_data:
    print(key[])

Output: [Figure: plot of the generated points inside and outside the circle, with the pi approximation]

In the example above, we first create the points dictionary with the new fromkeys() static method, which initializes a Dict with keys taken from the List variable keys and sets each value to an empty list. We use points_data.items() to print each key along with the number of points stored under it, and reversed() to print the keys in reverse order. We also use get(key) to retrieve the points for a specific key ("inside" in this case); since get() returns an Optional, we call take() to retrieve the value. Finally, we use clear() to empty the dictionary. To plot the image above, we call a Python utility function, utils.plot_points(), through Python interoperability.
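For a smaller view of the same Dict features without the plotting code, here is a minimal sketch. The stock dictionary and its contents are hypothetical and chosen only to keep the example short; the calls mirror the ones used in the Monte Carlo demo.

```mojo
from collections import Dict


def main():
    # Hypothetical data for illustration.
    var stock = Dict[String, Int]()
    stock["apples"] = 3
    stock["pears"] = 5

    # items() yields references to entries; dereference with [] to
    # read the key and value.
    for entry in stock.items():
        print(entry[].key, "->", entry[].value)

    # values() iterates over the stored values only.
    var total = 0
    for count in stock.values():
        total += count[]
    print("total:", total)

    # get() returns an Optional, so check it before taking the value.
    var pears = stock.get("pears")
    if pears:
        print("pears:", pears.take())

    # clear() removes every entry.
    stock.clear()
    print("entries after clear():", len(stock))
```

Checking the Optional before calling take() avoids reading a value that is not there, which is the same pattern the demo uses for the "inside" key.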

New traits: Absable, Powable, Representable, Indexer

Mojo 24.4 also includes new traits that make it simpler to write math expressions, print string representations of objects, and define containers that can be indexed using integral values.

  • Structs that conform to the Absable and Powable traits work with the built-in abs() and pow() functions. Absable types must implement the __abs__() dunder method, and Powable types must implement the __pow__() dunder method. Powable types can also be used with the ** operator in addition to the pow() function.
  • Structs that conform to the Representable trait must implement a __repr__() dunder method. This enables repr() to be called on their objects to produce a string that can, if possible, be used to recreate the object, which is very useful for debugging. Thanks to @gabrieldemarmiesse for contributing this feature.
  • Structs that conform to the Indexer trait allow their objects to be used as index values passed to __getitem__() and __setitem__(). Types conforming to the Indexer trait implement the __index__() dunder method; they are implicitly convertible to Int and can also be converted explicitly by calling the built-in index() function.
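The Indexer trait is not exercised by the MojoArray example, so here is a minimal sketch of how it might be used, closely following the pattern shown in the changelog. The type names FirstSlot and Shelf are hypothetical and exist only for this illustration.

```mojo
from collections import List


@value
struct FirstSlot(Indexer):
    # A custom index type that always refers to position 0.
    fn __index__(self) -> Int:
        return 0


struct Shelf:
    var data: List[Int]

    fn __init__(inout self):
        self.data = List[Int](10, 20, 30)

    # Accept any Indexer and convert it with the built-in index().
    fn __getitem__[T: Indexer](self, idx: T) -> Int:
        return self.data[index(idx)]


fn main():
    var shelf = Shelf()
    print(shelf[FirstSlot()])  # element at position 0
    print(shelf[2])            # plain Int values are indexers too
```

Writing __getitem__() against the trait rather than against Int lets the container accept both built-in integers and user-defined index types.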

In the example below, I’ve implemented a struct called MojoArray that conforms to the Absable, Powable, and Representable traits, so it implements the __abs__(), __pow__(), and __repr__() dunder methods. Here is the skeleton of our struct; the full implementation is available on GitHub.

Mojo
struct MojoArray[dtype: DType = DType.float64](Absable, Powable, Representable):
    ...
    fn __abs__(self) -> Self:
        ...
    fn __pow__(self, exp: Self) -> Self:
        ...
    fn __repr__(self) -> String:
        ...

In the full implementation, we compute vectorized __abs__() and __pow__() as follows:

Mojo
fn __abs__(self) -> Self:
    var new_array = Self(self.numel)

    @parameter
    fn wrapper[simd_width: Int, rank: Int = 1](idx: StaticIntTuple[rank]):
        new_array._ptr.store[width=simd_width](idx[0], abs(self._ptr.load[width=simd_width](idx[0])))

    elementwise[wrapper, simdwidthof[dtype](), 1](self.numel)
    return new_array

fn __pow__(self, exp: Self) -> Self:
    var new_array = Self(self.numel)

    @parameter
    fn wrapper[simd_width: Int, rank: Int = 1](idx: StaticIntTuple[rank]):
        new_array._ptr.store[width=simd_width](idx[0], self._ptr.load[width=simd_width](idx[0])**exp[0])

    elementwise[wrapper, simdwidthof[dtype](), 1](self.numel)
    return new_array

And __repr__() as follows:

Mojo
fn __repr__(self) -> String:
    var s: String = "MojoArray("
    for i in range(self.numel):
        s += str(self[i])
        if i != self.numel - 1:
            s += ","
    return s + ")"

Now, let’s create an object of the MojoArray struct to see these methods in action.

Mojo
v1 = MojoArray.randn(5)
v2 = MojoArray.randn(5)
exp = MojoArray(2, 2, 2, 2, 2)

print("__repr__ output:")
print(repr(v1))
print("\nDifference array:")
print(v1 - v2)
print("\nabs of difference array")
print(abs(v1 - v2))
print("\nabs of difference array raised to exp")
print(abs(v1 - v2)**exp)

Output:

Output
__repr__ output:
MojoArray(6.7687160540239288,11.124524422703082,-1.9429505117095691,7.5089635577024039,4.3696328963036475)

Difference array:
[-1.6371641143074642 14.349008521304423 -3.5144861431982495 10.003881958131833 1.9896616488954946]

abs of difference array
[1.6371641143074642 14.349008521304423 3.5144861431982495 10.003881958131833 1.9896616488954946]

abs of difference array raised to exp
[2.6803063371761437 205.89404554446696 12.351612850732506 100.0776542322356 3.9587534770855384]

As you can see, with these new traits, writing abs(v1-v2)**exp is a very expressive way to write math expressions compared to something like (v1-v2).abs().pow(exp), which is what we’d have done previously.

Mojo 24.4 also includes a few other math-specific traits: math.Ceilable, math.CeilDivable, math.CeilDivableRaising, math.Floorable, and Truncable. See the changelog for more details.
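Since the full MojoArray implementation lives on GitHub, a much smaller scalar sketch may make the three traits easier to see in isolation. The Meters struct below is hypothetical and wraps a single Float64; it is not part of the standard library or of the MojoArray code.

```mojo
@value
struct Meters(Absable, Powable, Representable):
    var value: Float64

    # Called by the built-in abs().
    fn __abs__(self) -> Self:
        return Self(abs(self.value))

    # Called by the ** operator and the built-in pow().
    fn __pow__(self, exp: Self) -> Self:
        return Self(self.value ** exp.value)

    # Called by the built-in repr().
    fn __repr__(self) -> String:
        return "Meters(" + str(self.value) + ")"


fn main():
    var depth = Meters(-3.0)
    print(repr(abs(depth)))
    print(repr(abs(depth) ** Meters(2.0)))
```

Each built-in simply dispatches to the matching dunder method, so conforming to a trait is mostly a matter of implementing one function per operation.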

os module enhancements

Mojo 24.4 also includes several file I/O enhancements that make the Mojo standard library’s os module more Pythonic. In particular, this release introduces the following functions: mkdir(), rmdir(), os.path.getsize(), and os.path.join(), along with a new tempfile module that implements the gettempdir() and mkdtemp() functions, thanks to contributions from @artemiogr97. Let’s take a look at an example to see how to use these functions in your own projects.

Mojo
import os
import tempfile

# Create a directory
dir_name = "example_dir"
os.mkdir(dir_name)
print("Directory", dir_name, "created.")

# Create a temporary directory
temp_dir = tempfile.mkdtemp()
print("Temporary directory created at", temp_dir)

# Create a file in the temporary directory
temp_file_path = os.path.join(temp_dir, "temp_file.txt")
with open(temp_file_path, "w") as temp_file:
    temp_file.write(str("This is a temporary file."))
print("File created at", temp_file_path)

# Get the size of the file
file_size = os.path.getsize(temp_file_path)
print("Size of the file", temp_file_path, "is", file_size, "bytes.")

# Get the system temporary directory
system_temp_dir = tempfile.gettempdir()
print("System temporary directory is", system_temp_dir.take())

# Remove the temporary directory and its contents
os.remove(temp_file_path)
os.rmdir(temp_dir)
print("Temporary directory", temp_dir, "and its contents removed.")

# Remove the created directory
os.rmdir(dir_name)
print("Directory", dir_name, "removed.")

Output:

Output
Directory example_dir created.
Temporary directory created at /var/folders/4v/pt9r67795239mt40l5kzlkr40000gn/T/tmpmcoo96dw
File created at /var/folders/4v/pt9r67795239mt40l5kzlkr40000gn/T/tmpmcoo96dw/temp_file.txt
Size of the file /var/folders/4v/pt9r67795239mt40l5kzlkr40000gn/T/tmpmcoo96dw/temp_file.txt is 25 bytes.
System temporary directory is /var/folders/4v/pt9r67795239mt40l5kzlkr40000gn/T/
Temporary directory /var/folders/4v/pt9r67795239mt40l5kzlkr40000gn/T/tmpmcoo96dw and its contents removed.
Directory example_dir removed.

base64 package enhancements

This release also includes a new base64 package that offers encoding and decoding support for both the Base64 and Base16 encoding schemes. Base64 encoding is often used in tokenizers for large language models (LLMs), and we use it in our implementations of the bpe and tiktoken tokenizer utilities for Llama3. Check out our Llama3 example in this repository. Let’s take a look at an example that shows you how to use this new package in the Mojo standard library.

Mojo
import base64
from python import Python

Python.add_to_path(".")
utils = Python.import_module("utils")

original_image = "mojo_fire.png"
decoded_image = "decoded_mojo_fire.jpg"

# Read the image file in binary mode
with open(original_image, 'rb') as image_file:
    # Encode the image to Base64
    encoded_string = base64.b64encode(image_file.read())

# Print the Base64 string
print(encoded_string)

# Decode the Base64 string back to binary data
decoded_image_data = base64.b64decode(encoded_string)

# Write the binary data to a new image file
with open(decoded_image, 'wb') as image_file:
    image_file.write(decoded_image_data)

# Report the actual output file name
print("Image decoded and saved as", decoded_image)

utils.plot_original_decoded_images(original_image, decoded_image)

Output: [Figure: original and decoded images plotted side by side]

You can see that the original and decoded images plotted side by side are the same. We use the utils.plot_original_decoded_images() helper function written in Python and called from Mojo using the Python interoperability feature.
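If you don't have an image file handy, the same round trip can be sketched with a short string. This example assumes only the b64encode() and b64decode() functions used above; the message text is arbitrary.

```mojo
import base64


fn main():
    var message = String("Mojo")

    # Encode the text to Base64.
    var encoded = base64.b64encode(message)
    print(encoded)  # standard Base64 for "Mojo" is TW9qbw==

    # Decode it back and confirm the round trip preserved the text.
    var decoded = base64.b64decode(encoded)
    print("round trip ok:", decoded == message)
```

Comparing the decoded string with the original is a quick way to confirm that encoding and decoding are inverses for your data.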

Core language features

Mojo 24.4 also includes several core language features that are a bit harder to demonstrate with code examples. Most notably, this release changes how def function arguments are handled. Previously, arguments were copied by default (the owned convention), which made them mutable but could hurt performance through unnecessary copies. Now, arguments use the borrowed convention by default and are copied only if they are mutated within the function.
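A small sketch can still illustrate the observable effect of the new def argument handling. The function names here are hypothetical, and the comments describe the behavior as stated above rather than anything measured.

```mojo
from collections import List


def append_and_count(values: List[Int]) -> Int:
    # The argument is mutated, so the function should work on a
    # local copy and leave the caller's list untouched.
    values.append(99)
    return len(values)


def count_only(values: List[Int]) -> Int:
    # No mutation here, so the list can simply be borrowed and
    # no copy is needed.
    return len(values)


def main():
    var numbers = List[Int](1, 2, 3)
    print("inside mutating def:", append_and_count(numbers))
    print("caller's list afterwards:", count_only(numbers))
```

If the behavior matches the description, the second line reports the original length, since only the local copy inside append_and_count() grew.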

In this release, you can also return multiple values from a function as a Tuple and unpack them into individual variables. For example, in the earlier Enhancements to Dict section we define a function: def approximate_pi(num_points: Int) -> (Float64, Dict[String, List[List[Float64]]]). We call it as pi_approximation, points_data = approximate_pi(num_points) to unpack the Tuple into separate variables, whereas previously we had to store the result in a Tuple variable and index into it to extract the values.
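The same pattern in a smaller form might look like the sketch below. The divide_with_remainder() function is a hypothetical helper written for this illustration, using the same return and unpacking style as approximate_pi().

```mojo
def divide_with_remainder(a: Int, b: Int) -> (Int, Int):
    # Return two values at once as a Tuple.
    return a // b, a % b


def main():
    # Unpack the returned Tuple directly into two variables.
    quotient, remainder = divide_with_remainder(17, 5)
    print(quotient, remainder)  # 3 2
```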

This release also introduces the new @parameter decorator for loops, which can be used with for loops where the loop variable is a compile-time constant. This allows the Mojo compiler to fully unroll the loop to improve performance. With the introduction of the @parameter decorator for loops, the previously recommended @unroll decorator is now deprecated.
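As a minimal sketch of the loop decorator, the function below takes its trip count as a compile-time parameter so the loop can be unrolled. The name sum_of_squares is hypothetical and chosen only for this example.

```mojo
fn sum_of_squares[count: Int]() -> Int:
    var total = 0

    # `count` is a compile-time parameter, so the loop bounds are
    # known at compile time and the loop can be fully unrolled.
    @parameter
    for i in range(count):
        total += i * i

    return total


fn main():
    print(sum_of_squares[4]())  # 0 + 1 + 4 + 9 = 14
```

Passing the bound as a parameter in square brackets, rather than as a runtime argument, is what makes the loop eligible for the decorator.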

There are many more core language enhancements in this release, see the changelog for a complete list.

New documentation pages

We also updated our documentation to include dedicated pages that dive deeper into the following topics:

But wait, there’s more!

Mojo 24.4 includes many more features that we didn’t cover in this blog post. Check out the changelog for a detailed list of what’s new, changed, moved, renamed, and fixed in this release. 

MAX 24.4 is also available for download today, and for the first time we’re making it available on macOS. This release of MAX also includes several enhancements, including a new Quantization API for MAX Graphs and full implementations of the Llama 2 and Llama 3 models using the Graph API with quantization. Read more in the MAX 24.4 announcement blog post.

All the examples I used in this blog post are available in a Jupyter Notebook on GitHub. Check it out!

Until next time! 🔥

Shashank Prasanna, AI Developer Advocate