Beam Search

Suppose we have a model that defines a probability distribution over sequences of words, and we want to generate a sentence from it.

It can be very challenging to generate the sequence with the highest overall probability: with a vocabulary of size V, there are V^T possible sequences of length T, so the search space grows exponentially with the length of the sequence.

One simple idea is to use a greedy algorithm where we always aim for the local optimum. In this context, the next word is chosen as the one with the highest conditional probability given the words generated so far. We continue until we reach the end-of-sentence token or a pre-specified maximum sequence length.
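To make this concrete, here is a minimal sketch of greedy decoding. The `next_word_probs` function is a hypothetical stand-in for the language model; I'm assuming it returns a dictionary mapping each candidate next word to its conditional probability given the prefix:

```python
def greedy_decode(next_word_probs, max_len=20, eos="<eos>"):
    """Greedy decoding: at each step, append the single most probable next word.

    `next_word_probs(prefix)` is assumed to return a dict mapping each
    candidate next word to P(word | prefix).
    """
    sequence = []
    for _ in range(max_len):
        probs = next_word_probs(sequence)
        # Local optimum: take the single highest-probability next word.
        word = max(probs, key=probs.get)
        sequence.append(word)
        if word == eos:  # stop at the end-of-sentence token
            break
    return sequence
```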

This greedy search can be very fast, but it doesn't always produce the best sequence. In fact, it can get stuck at a rather bad local optimum.

Beam search can be viewed as a generalization of the above. When generating the next word, instead of keeping only the single most probable continuation, we expand every candidate sequence with all possible next words, and keep the K partial sequences (e.g. K = 5, the beam width) with the highest cumulative probabilities. This process is iterated until the same stopping criteria are reached.

Beam search is also a greedy-type algorithm, but it is allowed to try out more paths and explore a wider portion of the search space. Like best-first search, it is not guaranteed to reach the global optimum, but in practice it tends to generate better results than greedy search. A sketch follows below.
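Here is a minimal sketch of beam search, reusing the same hypothetical `next_word_probs` interface as the greedy version above. It works in log space so that multiplying probabilities becomes addition, which avoids numerical underflow on long sequences:

```python
import math

def beam_search(next_word_probs, beam_width=5, max_len=20, eos="<eos>"):
    """Beam search decoding (a sketch; assumes the hypothetical
    `next_word_probs(prefix)` interface described above).

    Each beam entry is a (log_prob, sequence) pair.
    """
    beams = [(0.0, [])]  # start with a single empty sequence
    completed = []

    for _ in range(max_len):
        candidates = []
        for log_prob, seq in beams:
            if seq and seq[-1] == eos:
                completed.append((log_prob, seq))  # set finished beams aside
                continue
            # Expand this beam with every possible next word.
            for word, p in next_word_probs(seq).items():
                candidates.append((log_prob + math.log(p), seq + [word]))
        if not candidates:
            break  # all beams have finished
        # Keep only the top-K partial sequences by cumulative log-probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]

    completed.extend(beams)
    # Return the highest-scoring sequence found.
    return max(completed, key=lambda c: c[0])[1]
```

With `beam_width=1` this reduces exactly to greedy search, which makes the "generalization" relationship explicit. Real decoders typically add refinements such as length normalization, since longer sequences accumulate more negative log-probability and would otherwise be penalized.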

This website offers a concrete implementation of beam search in the context of decoding from a language model. I enjoyed reading it along with the Wikipedia article.

I tried to write my own implementation after reading the principles, but unfortunately wasn't able to finish it within half an hour (arguably a rather arbitrary timeframe). Beyond conceptual understanding, it seems I need a bit more practice coding basic algorithms from the ground up.

