Building an LLM from Scratch: A Deep Dive into Dropout

2025-03-20

This post documents the author's journey through the dropout chapter of Sebastian Raschka's "Build a Large Language Model (from Scratch)". Dropout is a regularization technique that reduces overfitting by randomly ignoring some neurons or weights during training, forcing knowledge to spread more broadly across the model. The author works through an implementation of dropout and explores the nuances of applying it in LLMs, such as whether to drop attention weights or value vectors, and how to rescale the surviving entries to compensate for the dropped ones. The post also touches on practical dropout-rate choices and the challenges of handling higher-order tensors for batch processing, setting the stage for further learning.
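To make the rescaling concrete, here is a minimal PyTorch sketch of dropout applied to an attention-weight matrix. This illustrates the general technique rather than reproducing the book's code; the matrix size and dropout rate are arbitrary toy values.

```python
import torch

torch.manual_seed(123)

# Toy attention-weight matrix: softmax makes each row sum to 1.
attn_weights = torch.softmax(torch.rand(4, 4), dim=-1)

# During training, Dropout zeroes each element with probability p and
# rescales the survivors by 1 / (1 - p), so the expected value of the
# matrix is unchanged -- the rescaling discussed above.
p = 0.5  # illustrative; real models often use much smaller rates such as 0.1
dropout = torch.nn.Dropout(p)

print(dropout(attn_weights))  # roughly half the entries are now zero
```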

Category: Development

Building an LLM from Scratch: A Deep Dive into Self-Attention

2025-03-05

This blog post, the eighth in a series documenting the author's journey through Sebastian Raschka's "Build a Large Language Model (from Scratch)", focuses on implementing self-attention with trainable weights. It begins by reviewing the steps in a GPT-style decoder-only transformer: token and positional embeddings, self-attention, normalization of attention scores, and context-vector generation. The core of the post works through scaled dot-product attention, explaining how trainable weight matrices project input embeddings into separate query, key, and value spaces, and how matrix multiplication makes the computation efficient. The author gives a clear, mechanistic explanation of the process and concludes with a preview of upcoming topics: causal self-attention and multi-head attention.
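As a rough sketch of that mechanism (toy dimensions, not the book's exact listing), the following PyTorch snippet projects input embeddings into query, key, and value spaces with trainable matrices, then computes attention weights and context vectors using plain matrix multiplications:

```python
import torch

torch.manual_seed(123)

# Six token embeddings of dimension d_in (arbitrary toy values).
d_in, d_out = 3, 2
inputs = torch.rand(6, d_in)

# Trainable projection matrices for query, key, and value.
W_q = torch.nn.Parameter(torch.rand(d_in, d_out))
W_k = torch.nn.Parameter(torch.rand(d_in, d_out))
W_v = torch.nn.Parameter(torch.rand(d_in, d_out))

queries = inputs @ W_q  # (6, d_out)
keys = inputs @ W_k
values = inputs @ W_v

# All pairwise attention scores in one matrix multiplication, scaled by
# sqrt(d_k) before the softmax to keep the distribution from saturating.
scores = queries @ keys.T
attn_weights = torch.softmax(scores / d_out**0.5, dim=-1)

# Context vectors: attention-weighted sums of the value vectors.
context = attn_weights @ values
print(context.shape)  # torch.Size([6, 2])
```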

Category: AI

Is Blogging Still Relevant in the Age of AI?

2025-02-25

This post explores the value of blogging in the era of ChatGPT and other AI tools. While AI can easily access and synthesize information, blogging remains valuable: it solidifies one's knowledge, helps others, builds a personal brand (though that is hard to achieve through blogging alone), provides a sense of accomplishment, and builds a writing portfolio that showcases one's thinking and benefits career development. The author argues that AI's main impact on blogging is to reduce the sense of accomplishment that comes from reader interaction, but that this does not negate blogging's overall value.

Category: Development

Blogger's Retrospective: The Value of Deep Dive Learning Posts

2025-02-24

While recovering from a minor illness, the author reviewed their past blog posts and noticed that the most popular ones followed a pattern: learning a new skill and summarizing it as a tutorial. They believe this style of writing not only solidifies their own knowledge but also helps others: these tutorial-style posts are the very resources the author wished had existed when they were learning, offering readers clear and in-depth guides. Going forward, the author plans to write more of these "TIL deep dives" and share their learning journey.

Category: Misc