The existing monolithic systems mentioned above are not easily extensible or generalizable beyond their initial target domain, which has forced hardware makers to build their own technology stacks. The consequence is a hugely fragmented AI deployment industry with dozens of toolchains that carry different tradeoffs and limitations. Worse, these design patterns have slowed the pace of innovation because the resulting tools are less usable, less portable, and harder to scale.
We have seen and contributed to AI projects that touch the human and natural world in profound ways - whether to save the Great Barrier Reef, help people find their rhythm, or teach people to dance. Unfortunately, we've also seen that the technical complexity of building and deploying these applications is still too high. Deploying AI remains the domain of full-stack experts, and cutting-edge applications are accessible only to people at the biggest tech companies - the ones that built the ML technologies themselves.
Imagine a world where ML research truly flows rapidly and effectively into production from a large global community. One where these breakthroughs are more accessible to everyone, allowing product innovators to drastically improve our daily lives and be freed from the chains of software and hardware complexity. In this world, AI would be more usable, more portable, more accessible, more scalable, and would enable a much bigger community to massively impact our health, the environment, recreation, finance, manufacturing, and commerce, among many other industries.
Are we doomed? Do the real-world complexities of AI today make it impossible to solve this?
We don’t think so. The software industry goes through cycles, and we’ve seen and solved problems like this before.
Flash back to software in the 1990’s
The software world in the 1990’s had fragmentation problems similar to what we see today with AI. At the time, C and C++ had established communities, but were fragmented across dozens of proprietary compilers. Each had vendor extensions, rough edges, strange corner cases, and incomplete implementations. It was so difficult to build cross-platform software that tools sprang up to help developers cope with the fragmentation, making it easier to install, build, and configure software (e.g., autoconf).
We were saved by the rise of GCC, which became massively successful throughout the 90’s by virtue of its cross-platform support, good performance, and stability, and by being free. GCC’s success drove a wave of consolidation in the tools industry, and the resulting defragmentation enabled a wave of new innovations by making its capabilities the de-facto standard. It catalyzed a revolution both in software (directly contributing to the rapid rise of OSS communities like Linux) and hardware (enabling innovation in instruction set architectures and new business models) by freeing the former from fragmented C/C++ implementations, and the latter from having to chase rapidly evolving C/C++ language standards.
While the computing world owes a debt of gratitude to GCC, it had some architectural challenges. GCC followed the classical parser, optimizer, and code generator architecture used by all modern compilers, but it was intentionally designed as a monolithic batch compiler, and GCC’s leadership resisted attempts to improve its modularity and design - a continued source of friction in the community.
The rise of Modular design
It took time for the world to notice, but the year 2000 was a seminal moment in compilers and programming languages: it was the start of the LLVM project.
LLVM was created because the compilers of the time were difficult to extend and use as platforms for compiler research.
Twenty-two years later, LLVM/Clang powers much of the world's computation - it is now widely adopted across iOS, Android, Google, Meta, and many other companies and systems. However, one might wonder how this happened: LLVM/Clang uses the standard “parser, optimizer, and code generator” approach of its predecessors, it doesn’t have breakthrough code generation algorithms, and its early innovations (like “whole program optimization”) were eventually adopted by other systems. LLVM succeeded despite being merely “on par” with existing compilers on traditional C/C++ workloads.
The innovative aspect of LLVM is its software architecture: LLVM is designed as a collection of modular libraries with well-defined interfaces that allow them to be composed and extended in innovative ways. They can be built into large-scale software projects or remixed into very small applications (e.g., domain-specific Just-In-Time compilers). Modularity and clear interfaces encourage testability, which contributes to higher-quality implementations. Modularity and separation of concerns also allow domain experts to work on large-scale projects without understanding how the whole system works.
In a 2011 retrospective, modularity was cited as the key to scaling the LLVM community, enabling new developer tooling like clang-format, and unlocking innovative programming languages (Rust, Julia, Swift, and more). These were technically possible before, but they actually happened because of the usability and hackability of LLVM. Modular design also enabled next-generation Just-In-Time accelerator programming models like OpenCL and CUDA. This drove the next wave of consolidation in compiler technology, which is why LLVM underlies most CPU, GPU, and AI systems today.
The most exciting contribution of LLVM is that it unlocked new use cases that weren’t planned from the beginning. AI wasn’t part of its original design, nor was it designed for Snowflake to use in their database query optimizer. The corollary is that many of these use cases would never have happened without LLVM (or something like it) available: a database team is unlikely to build a JIT compiler for query optimization if it has to start by writing an x86 code generator from scratch.
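To make the “LLVM as a library” point concrete, here is a minimal sketch of a JIT built by composing LLVM’s libraries rather than writing a code generator from scratch. It uses the third-party Python bindings from llvmlite (the library underneath Numba, not part of the LLVM project itself); the module and function names are illustrative.

```python
from ctypes import CFUNCTYPE, c_int

import llvmlite.binding as llvm
import llvmlite.ir as ir

# One-time setup: pull in LLVM's native target and assembly printer.
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

# Build LLVM IR for: int add(int a, int b) { return a + b; }
i32 = ir.IntType(32)
module = ir.Module(name="jit_demo")
fn = ir.Function(module, ir.FunctionType(i32, (i32, i32)), name="add")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
builder.ret(builder.add(fn.args[0], fn.args[1]))

# Hand the IR to LLVM's reusable backend: parse, JIT-compile for the
# host machine, and look up the native entry point.
target_machine = llvm.Target.from_default_triple().create_target_machine()
engine = llvm.create_mcjit_compiler(llvm.parse_assembly(str(module)),
                                    target_machine)
engine.finalize_object()

# Call the freshly generated machine code through ctypes.
add = CFUNCTYPE(c_int, c_int, c_int)(engine.get_function_address("add"))
print(add(2, 3))  # prints 5
```

A few dozen lines yield real native code for the host CPU - exactly the kind of leverage a database or numerics team gets from LLVM’s modular design, and what was out of reach when compilers were monolithic batch systems.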
AI Infrastructure in 2022
Today we can see great strides in the AI industry. For example, we have data scientists around the world training models on 100+ PetaFLOP supercomputers from a Jupyter notebook. That said, end-to-end deployment of those models is still far from being “democratized.” The tools used to deploy AI models today are strikingly similar to compilers and tools in the 1990’s and 2000’s. We see severe fragmentation across these systems, with a wide variety of hardware, each having bespoke tools.
The world’s biggest tech companies have built multiple in-house toolchains specific to different hardware products over the years, and these are often incompatible and share little code. How many flaky converters and translators does one industry really need?
Many in the industry believe these issues are inherent to the nature of AI, but we know they exist because AI infrastructure is still in its infancy. AI is already having an incredible impact on the world, but we have to wonder: how much bigger could the impact of ML be if we had the opportunity to rebuild it the right way?
The world deserves Modular AI
We have learned so much from the development of AI infrastructure and tools over the last ten years. The industry has made great strides, and much that was once research is now well understood. It is now time to incorporate the lessons learned into a single modular and composable system that integrates the best-known technologies from across the industry.
The next-generation ML system needs to be production-quality and meet developers where they are.
It must not require an expensive rewrite, re-architecting, or re-basing of user code. It must be natively multi-framework, multi-cloud, and multi-hardware. It needs to combine the best performance and efficiency with the best usability. This is the only way to reduce fragmentation and unlock the next generation of hardware, data, and algorithmic innovations.
This is a huge and important task. Achieving this requires extraordinary collaboration across a team of architects, engineers, and leaders who built many of the existing systems, and who are driving the state of the art forward. It requires focus, discipline, and a commitment to technical excellence - a value system that incentivizes building things the right way. It requires the strength to say “not yet” to many interesting projects, allowing us to ensure we get the fundamentals right.
Our goal is to create a world where AI is more usable, more accessible, more portable, and more scalable - one that enables developers everywhere to positively impact the world in untold ways. In this world, more time is spent using AI to solve problems rather than wrestling with a fragmented set of low-quality tools and infrastructure. This is the world we seek to create.
We are building the future of AI, and our vision is to enable AI to be used by anyone, anywhere.
Welcome to Modular AI.