Baseten Achieves SOTA Performance on GPT-OSS-120B: A Race Against Time

2025-08-07

As a launch partner for OpenAI's new open-source LLM, Baseten raced to optimize GPT-OSS-120B for peak performance on launch day. The team leveraged its flexible inference stack, testing TensorRT-LLM, vLLM, and SGLang on both Hopper and Blackwell GPU architectures. Key optimizations included KV cache-aware routing to maximize cache reuse across requests. Prioritizing low latency, the team chose tensor parallelism and the TensorRT-LLM MoE backend. They rapidly addressed compatibility issues, continuously refined the model configuration, and contributed fixes back to the open-source community. Future improvements include speculative decoding with Eagle for even faster inference.
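Although the final configuration used TensorRT-LLM, a minimal sketch of tensor-parallel serving with vLLM (one of the frameworks tested) might look like the following. This is an illustrative configuration only, not Baseten's production setup; the GPU count is an assumption:

```shell
# Illustrative sketch: serve GPT-OSS-120B with vLLM, sharding the model
# across 4 GPUs via tensor parallelism (lower per-token latency than
# pipeline parallelism, at the cost of inter-GPU communication).
# The GPU count and flags are assumptions, not Baseten's actual config.
vllm serve openai/gpt-oss-120b \
  --tensor-parallel-size 4
```

Tensor parallelism splits each layer's weights across GPUs so every token is computed cooperatively, which favors latency; this matches the latency-first choice described above.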
