Modal: Taming GPU Price Volatility with Linear Programming

2025-05-09

Modal tackles the volatile GPU market with a linear programming (LP) solver. Its resource solver takes real-time demand, pricing, and availability as inputs and adjusts GPU instance counts to meet customer demand at the lowest available price. Even with constraints spanning multiple GPU types, CPU, RAM, and regional limits, the system produces an allocation within seconds and exploits price discrepancies to save millions annually. Heuristics and Google's GLOP solver keep the system reliable and stable while it scales quickly. Customers get seamless scaling without managing cloud resources themselves.
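
To make the LP idea concrete, here is a minimal sketch of a toy version of the problem: choose instance counts that cover a GPU demand at the lowest hourly cost, subject to per-offer availability caps. The `Offer` type, the offer names, and all prices are made up for illustration and are not Modal's actual model. With a single covering constraint the LP optimum can be found by buying in order of price per GPU, so this sketch needs no solver library; a production system with CPU, RAM, and regional constraints would hand the problem to a real LP solver such as GLOP.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """One hypothetical instance type in one region (illustrative only)."""
    name: str
    gpus_per_instance: int
    price_per_hour: float  # USD per instance-hour, made-up numbers
    available: int         # instances the provider can supply right now


def cheapest_mix(offers, gpus_needed):
    """Minimize cost s.t. total GPUs >= gpus_needed and 0 <= x_i <= available_i.

    This is the LP relaxation (fractional instance counts allowed). With one
    covering constraint, filling demand in order of price per GPU is optimal.
    Returns ({offer name: instance count}, hourly cost).
    """
    plan, cost, remaining = {}, 0.0, float(gpus_needed)
    by_unit_price = sorted(
        offers, key=lambda o: o.price_per_hour / o.gpus_per_instance
    )
    for offer in by_unit_price:
        if remaining <= 0:
            break
        count = min(offer.available, remaining / offer.gpus_per_instance)
        plan[offer.name] = count
        cost += count * offer.price_per_hour
        remaining -= count * offer.gpus_per_instance
    if remaining > 1e-9:
        raise ValueError("demand exceeds available capacity")
    return plan, cost


offers = [
    Offer("a100-us-east", gpus_per_instance=8, price_per_hour=24.0, available=2),
    Offer("a100-eu-west", gpus_per_instance=8, price_per_hour=20.0, available=1),
    Offer("a100-us-west", gpus_per_instance=4, price_per_hour=14.0, available=5),
]
plan, hourly_cost = cheapest_mix(offers, gpus_needed=20)
print(plan, hourly_cost)
```

The fractional counts in the result are a property of the LP relaxation; turning them into whole instances (by rounding up, for example) is one place where heuristics could come in.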

Tech

Maximizing GPU Utilization: From Allocation to FLOP/s

2025-05-07

This article examines three levels of GPU utilization: GPU Allocation Utilization, GPU Kernel Utilization, and Model FLOP/s Utilization. Because GPUs are expensive and performance-sensitive, the authors argue that utilization matters at every level. The article analyzes what limits utilization at each level, such as economic constraints, DevOps constraints, and host overhead, and proposes optimizations: using the Modal platform to improve GPU allocation efficiency, optimizing kernel code, and increasing arithmetic intensity. It closes with the current state of GPU utilization in the industry and a set of best practices for developers.
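
The three levels can be read as three ratios. The sketch below assumes the usual definitions (time running application code over time paid for, time running kernels over time paid for, and achieved FLOP/s over peak FLOP/s); the summary above does not spell these out, so treat the function names and formulas as assumptions. All numbers are invented for illustration.

```python
def allocation_utilization(gpu_seconds_running_app, gpu_seconds_paid):
    """Fraction of paid GPU time in which application code was running."""
    return gpu_seconds_running_app / gpu_seconds_paid


def kernel_utilization(gpu_seconds_running_kernels, gpu_seconds_paid):
    """Fraction of paid GPU time in which a kernel was executing."""
    return gpu_seconds_running_kernels / gpu_seconds_paid


def model_flops_utilization(achieved_flops_per_s, peak_flops_per_s):
    """Achieved model throughput relative to the hardware's peak FLOP/s."""
    return achieved_flops_per_s / peak_flops_per_s


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


# One hypothetical hour on one GPU (made-up measurements).
paid = 3600.0
alloc = allocation_utilization(2700.0, paid)
kern = kernel_utilization(1800.0, paid)
mfu = model_flops_utilization(400e12, 1000e12)
intensity = arithmetic_intensity(flops=2e12, bytes_moved=1e10)
print(alloc, kern, mfu, intensity)
```

Under these assumed definitions each level is bounded by the one before it: kernels can only run while application code holds the GPU, and useful FLOP/s are only produced while kernels run.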

Development

DoppelBot: Your CEO, Now an LLM

2025-02-04

Modal has created DoppelBot, a Slack bot that can replace your CEO (sort of!). It fine-tunes an OpenLLaMA model on your team's Slack messages to mimic your CEO's communication style. The whole pipeline (scraping, fine-tuning, inference, and Slack event handling) runs on Modal's serverless platform. The code is open source, so you can deploy and customize it in your own workspace. DoppelBot uses LoRA for efficient fine-tuning and supports multiple workspaces. The article details its functionality and deployment steps.
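
A quick way to see why LoRA makes fine-tuning cheap is to count trainable parameters. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two small factors, `B` (`d_out x rank`) and `A` (`rank x d_in`), whose product is added to the frozen weights. The layer size and rank below are illustrative assumptions, not DoppelBot's actual configuration.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical square projection layer and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(full, lora, full // lora)
```

For this assumed layer the LoRA factors hold 256 times fewer trainable parameters than the full matrix.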

Development Slack Bot

GPU Glossary: A Comprehensive Guide to GPU Architecture

2025-01-14

The Modal team has created a GPU glossary to address the fragmented nature of GPU documentation. This interactive online dictionary connects concepts across levels of the stack, from CUDA architecture to nvcc compiler flags. Readers can navigate via hyperlinks or read linearly. The glossary covers device hardware (CUDA architecture, Streaming Multiprocessors, etc.), device software (CUDA programming model, PTX, etc.), and host software (CUDA C++, NVIDIA drivers, etc.), giving developers a single, approachable reference for GPU knowledge.

Development