Book Review: Build a Large Language Model (From Scratch)
February 2, 2025•405 words
I've just finished reading Build a Large Language Model (From Scratch) by Sebastian Raschka. Overall, I highly recommend the book to anyone looking for an introduction to, or a refresher on, large language models.
The "From Scratch" part of the title is what drew my attention in the first place. Nowadays, so many online tutorials on language models are application-focused (e.g. how to download and use open-source models, how to use APIs, etc.). I am not saying they aren't useful, but I was looking for something more fundamental.
As an analogy, a first-year statistics course might only teach you how to use software packages to fit GLMs, but a more senior course would teach you exactly which mathematical formulas and algorithms are at work behind the software. Without the latter, one could hardly be called a statistician (not that I would be an "expert" in language models after finishing the book).
Back to the book review: it does exactly what's promised in the title, building a language model from the ground up. It starts with an overview of the latest developments in language models (think OpenAI and ChatGPT), followed by an introduction to text data processing and word embeddings. The attention mechanism and the transformer architecture are the most technical parts of the book, but they are presented in a very accessible and intuitive manner, with an abundance of examples and code snippets. The book ends with a larger example that reproduces the GPT-2 model (which is open-source), followed by language model fine-tuning.
I enjoyed the chapter on the attention mechanism the most. I also spent a good few days on it, because I had to take out a pen and paper to follow along. I took the Deep Learning Specialization on Coursera back in 2021, which also covers sequence models (including the transformer architecture). At the time, I didn't grasp some of the key ideas, but this time around it was much easier and more intuitive because I've been working with language models for a few weeks. It's very satisfying to get at least a bit more insight into the black box that is the large language model.
I am very glad that I took the time and effort to read this book. Again, I recommend it if you want to dig deeper into language models, or just get a rough idea of what all the hype around them is about.