Meta's Byte Latent Transformer (BLT): Outperforming Tokenization-Based LLMs

2024-12-14

Meta AI researchers introduced the Byte Latent Transformer (BLT), a novel large language model architecture that processes raw bytes directly rather than tokens. BLT dynamically groups bytes into variable-length patches based on the entropy of the next byte, allocating more compute where the data is harder to predict, which yields significant gains in inference efficiency and robustness over tokenization-based models. Scaling experiments up to 8 billion parameters and 4 terabytes of training data show that BLT matches the performance of token-based LLMs while improving reasoning and long-tail generalization. The work demonstrates the feasibility of training large-scale models directly on raw bytes without a fixed vocabulary.
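The core mechanism, splitting the byte stream into patches wherever next-byte entropy spikes, can be illustrated with a short sketch. The snippet below is a minimal illustration, not BLT's implementation: it substitutes a simple sliding-window frequency estimate for the small byte-level language model BLT uses to score entropy, and the function names and threshold are assumptions for demonstration.

```python
import math
from collections import Counter

def byte_entropies(data: bytes, window: int = 64) -> list[float]:
    """Shannon entropy (bits) of the byte distribution in a sliding
    window ending at each position; a toy stand-in for BLT's learned
    next-byte entropy model."""
    out = []
    for i in range(len(data)):
        ctx = data[max(0, i - window):i + 1]
        total = len(ctx)
        out.append(-sum(c / total * math.log2(c / total)
                        for c in Counter(ctx).values()))
    return out

def patch_bytes(data: bytes, threshold: float = 3.0) -> list[bytes]:
    """Open a new patch wherever entropy exceeds the threshold, so
    unpredictable regions get smaller patches (hence more compute per
    byte) while predictable runs merge into large, cheap patches."""
    ent = byte_entropies(data)
    patches, start = [], 0
    for i in range(1, len(data)):
        if ent[i] > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# A predictable run of bytes stays in one big patch; the varied text
# after it triggers entropy spikes and finer-grained patches.
for patch in patch_bytes(b"aaaaaaaaaaaaaaaa Byte Latent Transformer!"):
    print(patch)
```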


Meta FAIR Unveils Breakthrough AI Research, Open-Sourcing Key Models

2024-12-13

Meta FAIR released a suite of groundbreaking AI research artifacts, including Meta Motivo, a foundation model for controlling virtual embodied agents, and Meta Video Seal, an open-source model for video watermarking. The release focuses on advances in agent capabilities, robustness and safety, and architectural innovations for more efficient learning. Other key contributions include the Flow Matching codebase, Meta Explore Theory-of-Mind for theory-of-mind reasoning, Large Concept Models (LCMs), and the Dynamic Byte Latent Transformer. By open-sourcing these tools and models, Meta aims to foster collaboration and accelerate responsible AI development.
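Of the artifacts above, the Flow Matching codebase implements a general generative-modeling recipe: train a vector field to match the velocity of straight-line paths from noise to data, then integrate it to sample. The sketch below shows that objective under toy assumptions (an affine model fit to shifted Gaussian "data"); it is not the library's API, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 2, 0.05
W = np.zeros((dim + 2, dim))  # toy affine field v(x, t) = [x, t, 1] @ W

def feats(x, t):
    return np.concatenate([x, t[:, None], np.ones((len(x), 1))], axis=1)

def v(x, t):
    return feats(x, t) @ W

for _ in range(2000):
    x0 = rng.normal(size=(128, dim))           # source: standard Gaussian
    x1 = rng.normal(size=(128, dim)) + 4.0     # target: shifted toy "data"
    t = rng.uniform(size=128)
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1  # point on the straight path
    err = v(xt, t) - (x1 - x0)                    # regress onto path velocity
    W -= lr * feats(xt, t).T @ err / len(xt)      # gradient step on the MSE

# Sampling: integrate dx/dt = v(x, t) from t = 0 to 1 with Euler steps.
x = rng.normal(size=(5, dim))
for k in range(100):
    x += v(x, np.full(len(x), k / 100)) / 100
print(x.mean(axis=0))  # drifts toward the target mean of roughly 4
```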
