Apple and NVIDIA Partner to Accelerate LLM Text Generation

2024-12-18

Apple and NVIDIA have integrated ReDrafter, Apple's speculative decoding technique, into NVIDIA's TensorRT-LLM to speed up large language model text generation. ReDrafter combines beam search with dynamic tree attention to generate tokens faster without sacrificing output quality. In benchmark tests the integration delivered a 2.7x increase in generation speed, which in turn reduces latency and power consumption. Developers running production LLM applications on NVIDIA GPUs can now use ReDrafter's accelerated token generation through TensorRT-LLM.
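
The summary does not say how beam search and dynamic tree attention fit together. One plausible reading, sketched below as an illustration and not as Apple's or NVIDIA's implementation, is that the draft candidates produced by beam search often share prefixes, so they can be merged into a tree and each shared token verified only once. The function names and token IDs are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Sequence, Tuple


def build_prefix_tree(beams: Sequence[Sequence[int]]) -> Dict[Tuple[int, ...], int]:
    """Assign one node index to every distinct prefix across all beams."""
    nodes: Dict[Tuple[int, ...], int] = {}
    for beam in beams:
        for end in range(1, len(beam) + 1):
            prefix = tuple(beam[:end])
            if prefix not in nodes:
                nodes[prefix] = len(nodes)  # first-seen order
    return nodes


def tree_attention_mask(nodes: Dict[Tuple[int, ...], int]) -> List[List[bool]]:
    """mask[i][j] is True when node j is node i itself or one of its ancestors."""
    size = len(nodes)
    mask = [[False] * size for _ in range(size)]
    for prefix, i in nodes.items():
        for end in range(1, len(prefix) + 1):
            mask[i][nodes[prefix[:end]]] = True
    return mask


# Three hypothetical draft candidates (made-up token IDs) with shared prefixes.
beams = [[5, 7, 9], [5, 7, 2], [5, 3, 9]]

nodes = build_prefix_tree(beams)
mask = tree_attention_mask(nodes)

flat_tokens = sum(len(beam) for beam in beams)  # tokens if each beam is verified separately
tree_tokens = len(nodes)                        # tokens after merging shared prefixes

print(f"flat: {flat_tokens} tokens, tree: {tree_tokens} tokens")
```

In this toy case the three candidates collapse from nine tokens to six tree nodes, and the mask lets each node attend only to its own path from the root. How much is saved in practice depends on how much the candidates overlap.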

AI