Building LLMs from Scratch: Vectors, Matrices, and High-Dimensional Spaces

2025-09-06

This article, the second in a three-part series, demystifies the workings of Large Language Models (LLMs) for technically inclined readers with limited AI expertise. Building on part 19 of the author's series working through Sebastian Raschka's book "Build a Large Language Model (from Scratch)", it explains how vectors, matrices, and high-dimensional spaces (vocab space and embedding space) are used within LLMs. The author argues that understanding LLM inference requires only high-school-level math, while training requires more advanced mathematics. The article details how vectors represent meaning in high-dimensional spaces and how matrix multiplication projects between these spaces, connecting this to linear layers in neural networks.
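To make the projection idea concrete, here is a minimal PyTorch sketch (not code from the article; the dimensions are invented for readability) showing that a linear layer is just a matrix multiplication mapping a vector from embedding space into vocab space:

```python
import torch

# Illustrative sizes only: a small embedding space and a tiny vocabulary.
embedding_dim = 8
vocab_size = 50

# A "token" represented as a vector in embedding space.
token_vector = torch.randn(embedding_dim)

# A linear layer is just a matrix (here without bias); multiplying by it
# projects the vector from embedding space into vocab space.
projection = torch.nn.Linear(embedding_dim, vocab_size, bias=False)
logits = projection(token_vector)                 # shape: (vocab_size,)

# The same projection written as an explicit matrix multiplication.
logits_manual = token_vector @ projection.weight.T
print(torch.allclose(logits, logits_manual))      # True
```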

Read more

Building an LLM from Scratch: Unraveling the Mystery of Attention

2025-05-11

This post delves into the inner workings of the self-attention mechanism in large language models. The author analyzes multi-head attention and the layering of attention blocks, explaining how seemingly simple matrix multiplications achieve complex functionality. The core idea is that individual attention heads are simple, but through multi-head attention and layering they build complex, rich representations, much as convolutional neural networks extract features layer by layer until the input is deeply understood. The post also explains how the attention mechanism solves the fixed-length bottleneck inherent to RNN models and uses examples to illustrate the roles of the query, key, and value spaces.
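As a rough sketch of how several simple heads combine (illustrative Python, not the post's code; all dimensions are invented), each head projects the input into its own query, key, and value spaces, and the head outputs are concatenated:

```python
import torch

torch.manual_seed(0)
seq_len, d_model, num_heads = 4, 16, 2
d_head = d_model // num_heads

x = torch.randn(seq_len, d_model)          # input token embeddings

head_outputs = []
for _ in range(num_heads):
    # Each head has its own projections into query, key, and value spaces.
    W_q = torch.randn(d_model, d_head)
    W_k = torch.randn(d_model, d_head)
    W_v = torch.randn(d_model, d_head)
    q, k, v = x @ W_q, x @ W_k, x @ W_v

    # Scaled dot-product attention: every position attends to every position.
    scores = q @ k.T / d_head ** 0.5
    weights = torch.softmax(scores, dim=-1)
    head_outputs.append(weights @ v)

# Concatenating the heads gives a richer representation than any single head.
context = torch.cat(head_outputs, dim=-1)  # shape: (seq_len, d_model)
print(context.shape)
```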

Read more
AI

Building an LLM from Scratch: A Deep Dive into Dropout

2025-03-20

This post documents the author's journey through the dropout chapter of Sebastian Raschka's "Build a Large Language Model (from Scratch)". Dropout is a regularization technique that prevents overfitting by randomly ignoring some neurons or weights during training, thus spreading knowledge more broadly across the model. The author details the implementation of dropout and explores nuances of its application in LLMs, such as applying dropout to attention weights or value vectors, and rebalancing the resulting matrix. The post also touches upon practical dropout rate choices and the challenges of handling higher-order tensors for batch processing, setting the stage for further learning.
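The rebalancing the post mentions can be seen directly in PyTorch's dropout behaviour (a small sketch with invented values, not the book's code): elements are zeroed with probability p, the survivors are scaled by 1/(1-p) so each weight keeps its expected value during training, and dropout becomes a no-op at inference time:

```python
import torch

torch.manual_seed(0)
p = 0.5
dropout = torch.nn.Dropout(p)

# A small matrix of attention weights (rows sum to 1 after softmax).
attention_weights = torch.softmax(torch.randn(4, 4), dim=-1)

dropout.train()
dropped = dropout(attention_weights)    # some entries zeroed, rest scaled by 1/(1-p)

dropout.eval()
unchanged = dropout(attention_weights)  # dropout does nothing at inference time

print(dropped)
print(torch.equal(unchanged, attention_weights))  # True
```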

Read more
Development

Building an LLM from Scratch: A Deep Dive into Self-Attention

2025-03-05

This blog post, the eighth in a series documenting the author's journey through Sebastian Raschka's "Build a Large Language Model (from Scratch)", focuses on implementing self-attention with trainable weights. It begins by reviewing the steps involved in GPT-style decoder-only transformer LLMs, including token and positional embeddings, self-attention, normalization of attention scores, and context vector generation. The core of the post delves into scaled dot-product attention, explaining how trainable weight matrices project input embeddings into different spaces (query, key, value). Matrix multiplication is leveraged for efficient computation. The author provides a clear, mechanistic explanation of the process, concluding with a preview of upcoming topics: causal self-attention and multi-head attention.
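For readers who want the mechanics in code, here is a minimal single-head sketch in the spirit of the chapter (dimensions and variable names are illustrative, not the book's exact listing): trainable weight matrices project the embeddings into query, key, and value spaces, the scores are scaled and normalized with softmax, and the context vectors are weighted sums of the values:

```python
import torch

torch.manual_seed(0)
seq_len, d_in, d_out = 5, 12, 6

x = torch.randn(seq_len, d_in)              # token + positional embeddings

# Trainable weight matrices project inputs into query, key, and value spaces.
W_query = torch.nn.Parameter(torch.randn(d_in, d_out))
W_key   = torch.nn.Parameter(torch.randn(d_in, d_out))
W_value = torch.nn.Parameter(torch.randn(d_in, d_out))

queries, keys, values = x @ W_query, x @ W_key, x @ W_value

# Attention scores, scaled by the square root of the key dimension and
# normalized with softmax so each row sums to 1.
scores = queries @ keys.T
weights = torch.softmax(scores / d_out ** 0.5, dim=-1)

# Each context vector is an attention-weighted sum of the value vectors.
context = weights @ values                  # shape: (seq_len, d_out)
print(context.shape)
```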

Read more
AI

Is Blogging Still Relevant in the Age of AI?

2025-02-25

This post explores the value of blogging in the era of ChatGPT and other AI tools. Even though AI can easily access and synthesize information, blogging remains valuable: it solidifies one's knowledge, helps others, builds a personal brand (though this is difficult to achieve through blogging alone), provides a sense of accomplishment, and builds a writing portfolio that showcases one's thinking and benefits career development. The author argues that AI's main impact on blogging is reducing the sense of accomplishment that comes from reader interaction, but that this doesn't negate blogging's overall value.

Read more
Development

Blogger's Retrospective: The Value of Deep Dive Learning Posts

2025-02-24

While recovering from a minor illness, the author reviewed their past blog posts and discovered that the most popular ones followed a pattern: learning a new skill and summarizing it into a tutorial. They believe this writing style not only solidifies their own knowledge but also helps others; these tutorial-style posts serve as the very resources the author wished had existed when they were learning, offering readers clear and in-depth guides. Going forward, the author plans to create more of these "TIL deep dives" and to keep sharing their learning journey.

Read more
Misc