Introducing Foundry

Orchestration of Computation

Our mission is to orchestrate the world’s compute capacity, rendering it universally accessible and useful. Today, we’re excited to announce $80 million in seed and Series A funding to build towards this future.

During my tenure at DeepMind, Google crossed a critical threshold: it now spends more money on compute than it does on people. OpenAI and other leading AI companies are far beyond this point. This is a civilizational-scale shift, and it means the economics of compute matter more than ever.

The rise of LLMs such as ChatGPT that can be instructed to perform general tasks, along with other AI breakthroughs, has radically expanded the demand for high-performance GPUs and accelerated computing. It is clear that we are living in a new era in which computing power is not only valuable but essential: compute is now the most critical asset for leading companies and cutting-edge research across domains. As we look ahead, our contention is that its importance will only increase, rendering it humanity’s highest-leverage resource.

In early 2022, we began to notice fissures emerging in the AI industry as a consequence of the much-discussed GPU shortage. While a supply bottleneck is part of the challenge, it’s not the only issue: arguably, the industry suffers vastly more from under-utilization than from under-supply.¹

To help address this, in late 2022 we began building Foundry to ensure humanity maximizes the utility of the computing power we already have and will yet produce. Since then, we’ve made key technical breakthroughs, helped industry partners scale their compute cost-efficiently, and formed long-term strategic partnerships, solidified via the $80M in seed and Series A funding we’re announcing today.

We’re grateful to count some of our industry’s leading technical and commercial minds among our backers, including Sequoia Capital, Lightspeed Venture Partners, Redpoint, Microsoft Ventures (M12), Conviction, NEA (Pete Sonsini), Jeff Dean (Chief Scientist at Google), Eric Schmidt (former CEO of Google), George Roberts (co-founder of KKR), Paul Milgrom (Nobel Prize winner and mechanism design pioneer), Matei Zaharia (UC Berkeley professor and co-founder of Databricks), Jure Leskovec (Stanford professor and graph neural networks pioneer), Alexandr Wang (CEO of Scale AI), Liam Fedus (OpenAI, creator of ChatGPT), Lachy Groom (co-founder of Physical Intelligence), Mario Gabriele (The Generalist), David Vélez (CEO of Nubank), and more. Sincere thanks also to Jensen, Katie, Vishal, and our close partners at NVIDIA.

Making compute as easy as turning on the light

Foundry exists to address the “root-node problem” of infrastructure at the heart of AI, rendering compute more accessible at scale. We’re building a new breed of public cloud, powered by an orchestration platform that makes accessing AI compute resources as easy as turning on the light.

As demand for GPUs and accelerated compute has increased, so has cost. Those fortunate enough to get access to necessary computing power have had to pay steep prices. The end result of computing scarcity is a tax on progress. Serving clients, meeting research deadlines, and managing training runs have all become slower and costlier.

While the much-discussed GPU shortage is part of the challenge, it’s not the only issue: as noted above, the industry arguably suffers vastly more from under-utilization than from under-supply.² That under-utilization is a consequence of complexity.

The invention of the lightbulb was a significant juncture in the history of technology, but many further innovations and layers of abstraction were required to reach the simple light-switch experience we know today. Consumers do not have to concern themselves with George Ohm, the structure of the grid, or electricity-market mechanism design; they just flip the switch. The analog of this UX in the AI compute ecosystem would solve a “root-node problem” and open the floodgates for the accelerated advancement of this technical supercycle.

One main challenge is that the current public cloud infrastructure, both software and hardware, was built to make serving and scaling web applications easy. While its design was well-suited to the previous technological revolution, it has not translated well to the frontier of AI/ML. Because existing cloud infrastructure was built for the web rather than for modern machine learning workloads, CPUs are its core components. The associated software stack doesn’t account for the SLO shape, workload structure³, infrastructure heterogeneity, cluster multi-tenancy, fault-tolerance and resiliency, or placement (node-to-node, data, and user locality) requirements of ML work (we’ll unpack this in more depth in an upcoming blog post). In many ways, today’s engineers and researchers are back in a position reminiscent of consumers in the era when the lightbulb was first invented, lacking the layers of abstraction that would let them just flip the proverbial switch.
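
To make the contrast concrete, here is a deliberately simplified sketch of the kind of placement decision an ML-aware scheduler must make, which a web-era autoscaler never faces. This is our illustration, not Foundry’s implementation; all names, fields, and thresholds are invented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Workload:
    """A hypothetical ML job; fields are illustrative."""
    gpus_needed: int
    colocated: bool  # synchronous gradient exchange favors same-fabric placement

@dataclass
class NodeGroup:
    rack: str
    free_gpus: int
    interconnect_gbps: float

def place(w: Workload, groups: list[NodeGroup]) -> Optional[NodeGroup]:
    """Pick a node group that satisfies the workload's placement constraints.

    A web-era autoscaler can treat capacity as interchangeable stateless
    replicas; an ML-aware scheduler cannot, because topology (interconnect
    bandwidth, rack locality) directly determines training throughput.
    """
    candidates = [g for g in groups if g.free_gpus >= w.gpus_needed]
    if w.colocated:
        # All-reduce traffic dominates, so insist on a fast fabric.
        candidates = [g for g in candidates if g.interconnect_gbps >= 400]
    if not candidates:
        return None  # queue or preempt rather than silently degrading
    return max(candidates, key=lambda g: g.interconnect_gbps)
```

A real system must additionally weigh multi-tenancy, fault domains, and data locality, which is precisely the complexity described above.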

To use AI infrastructure today, CTOs and infra teams have become de facto supply chain managers. They have to earn a "PhD" in hardware and spend their time on capacity planning: which chips to use for which workload, how to access them, and how to leverage them. The result is a complex, convoluted workflow that drains engineering attention away from actually building products. Our goal is to make leveraging compute as simple as turning on the light.

Foundry’s philosophy and approach

We believe that the challenges of modern AI infrastructure cannot be solved with incremental solutions. Maximizing this high-leverage resource requires a fundamental reimagining from the ground up. To achieve this, Foundry has assembled a small but ferocious cross-disciplinary team, with expertise spanning AI/ML, distributed systems design, hardware, traditional software, product, finance, and business. The team brings experience from DeepMind’s Core Deep Learning team; the infrastructure teams of Microsoft, OpenAI, and Meta; Stanford’s Future Data Systems Group; and X. Our goal is to build not only the strongest technical architecture, but also one that is highly responsive to the practical and commercial needs of practitioners aiming to cross the chasm into production, or push the limits of scale and performance.

To help address the economic and technical root-node challenges at the heart of AI and deep learning, Foundry offers GPU instances, at scale, with the best price-performance on the market. As a result of our novel structural and technical approach, we’re often able to provide computing power at costs an order of magnitude lower than our users could otherwise access. Our work with early customers ranging from enterprises like KKR and LG to top research institutions like MIT and Stanford has taught us a great deal, as have our partnerships with leading AI startups (including those via our cluster partnerships with Lightspeed and Pear VC).

To address the core challenges of our users – from Fortune 1000s to seed-stage startups – we’ve built a suite of features designed to serve the most demanding use cases. Across Foundry’s products, we prioritize these five core levers:

Availability. We provide instances with top-tier accelerators, including high-performance datacenter GPUs such as the NVIDIA H100, as well as a suite of smaller variants that can offer superior price-performance depending on users’ particular tasks and SLOs. We’ve instantiated optimal configurations⁴ for both training and inference workloads, so that practitioners can achieve results swiftly and seamlessly.

Elasticity. We’re built to adapt to practitioners’ dynamic demands. Need a sudden burst in GPU capacity? Experiencing an unforeseen usage spike? Foundry is flexible. Users can scale up and down as needed.

Price-performance. You don’t need a dedicated gigawatt-scale fusion power plant to power a lightbulb. Similarly, top-tier GPUs aren’t always needed and certainly don’t always correspond to the highest ROI (as hyperbolic as this metaphor is, we see quite extreme patterns quite often). Foundry’s orchestration maps workloads to the devices that deliver maximum ROI per unit of compute spend⁵ (a simplified sketch of this idea follows the list below). For workloads that require speed (minimal wall-clock time), we’ve got your back. For workloads with flexible SLAs that can run in the background or outside of peak hours, we’ll ensure you don’t pay more than you should.

Simplicity. To get the most out of their compute resources, industry giants have invested for years in teams that build sophisticated cluster management and workload orchestration tools, abstracting away the complexity between ML work and the backing infrastructure and scheduling concerns. Foundry renders this capability accessible to everyone else, ensuring that users can reap compute leverage without a dedicated infra team at the scale of a lavishly funded industry lab.

Security and resiliency. We recognize that security and reliability are paramount for serious practitioners. Foundry has been built to the highest security and compliance standards from day zero and is SOC 2 Type II certified. We work with enterprises of all sizes and invest heavily in technology to predict and resolve errors proactively, maximizing uptime. Just as it would be difficult to build websites if developers had to occupy themselves with concerns at lower OSI layers, like packet loss, buffer overflow, and congestion control, AI practitioners’ efforts are often stymied by the non-negligible failure rates of even high-end systems. We recognize that infrastructure is often our partners’ largest investment, and we take seriously our responsibility as a trusted infrastructure partner.
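
To make the price-performance lever concrete, here is a minimal sketch of SLA-aware device selection, referenced above. This is our illustration rather than Foundry’s actual orchestration logic; the device names, prices, and runtimes are invented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeviceOption:
    name: str
    hourly_cost: float        # $/hour per instance
    est_runtime_hours: float  # estimated wall-clock time for this workload

def pick_device(options: list[DeviceOption],
                sla_hours: Optional[float]) -> DeviceOption:
    """Choose the cheapest option that still meets the SLA.

    With a tight SLA, only fast (often top-tier) devices qualify; with a
    flexible SLA, a smaller or off-peak device frequently wins on total cost.
    """
    feasible = [o for o in options
                if sla_hours is None or o.est_runtime_hours <= sla_hours]
    if not feasible:
        raise ValueError("No device meets the SLA; relax it or shard the job.")
    return min(feasible, key=lambda o: o.hourly_cost * o.est_runtime_hours)

# Illustrative numbers only: with a flexible SLA, the smaller GPU halves
# total spend despite its longer runtime.
options = [
    DeviceOption("big-gpu", hourly_cost=4.0, est_runtime_hours=10),
    DeviceOption("small-gpu", hourly_cost=0.8, est_runtime_hours=25),
]
print(pick_device(options, sla_hours=None).name)  # small-gpu: $20 vs. $40
print(pick_device(options, sla_hours=12).name)    # big-gpu: only it meets SLA
```

Even this toy version shows the pattern footnote 5 alludes to: with a flexible SLA, total spend, not device tier, is the quantity to optimize.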

Looking ahead

The partnerships solidified by this funding are helping Foundry continue to scale our operations, enhance our product suite, and form further strategic alliances. Our ultimate goal is to build a definitive AI/ML infrastructure platform, helping civilization make the most of our highest-leverage resource. You can reach us here if you’d like to become a customer or partner.

Join us

It’s still day one. We'd love to hear from you if you’re motivated by impact and passionate about wrangling rich technical problems. Explore our career opportunities or contact us at careers@mlfoundry.com. Stay tuned for more updates, and follow us on LinkedIn for the latest news.

Thanks again to our backers, customers, and early partners. To everyone reading this: we’re excited to work with you on the journey ahead.

Jared Quincy Davis (Founder and CEO) & the Foundry Team

Thanks to Matei Zaharia, Deepak Narayanan, and Elad Gil for reading drafts of this.

1: More on this in a future blog post.
2: Blog post on this upcoming.
3: A workload’s parallel-vs.-serial DAG structure, its ensuing communication graph, etc., inform parallelism strategy selection, which, as a function of SLOs, has major implications for scheduling.
4: AI accelerator selection, but also CPU, networking, RAM, and storage configuration and pairing.
5: Dedicated blog post on this upcoming.