Sidenote: Z2H chapter progression

I'm making this note to keep everything organized in my mind. These are the chapters from the notes section of each video.


Video 1 – The spelled-out intro to neural networks and backpropagation: building micrograd

  • 00:00:00 intro
  • 00:00:25 micrograd overview
  • 00:08:08 derivative of a simple function with one input
  • 00:14:12 derivative of a function with multiple inputs
  • 00:19:09 starting the core Value object of micrograd and its visualization
  • 00:32:10 manual backpropagation example #1: simple expression
  • 00:51:10 preview of a single optimization step
  • 00:52:52 manual backpropagation example #2: a neuron
  • 01:09:02 implementing the backward function for each operation
  • 01:17:32 implementing the backward function for a whole expression graph
  • 01:22:28 fixing a backprop bug when one node is used multiple times
  • 01:27:05 breaking up a tanh, exercising with more operations
  • 01:39:31 doing the same thing but in PyTorch: comparison
  • 01:43:55 building out a neural net library (multi-layer perceptron) in micrograd
  • 01:51:04 creating a tiny dataset, writing the loss function
  • 01:57:56 collecting all of the parameters of the neural net
  • 02:01:12 doing gradient descent optimization manually, training the network
  • 02:14:03 summary of what we learned, how to go towards modern neural nets
  • 02:16:46 walkthrough of the full code of micrograd on github
  • 02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh
  • 02:24:39 conclusion
  • 02:25:20 outtakes :)
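
For my own quick reference, here is a minimal sketch of the kind of Value object these chapters build up. This is not the actual micrograd code from the video or its GitHub repo, just a condensed version of the idea: a scalar value that records its children and local derivatives, plus a topological-sort backward pass.

```python
import math

class Value:
    """Scalar with autograd, in the spirit of micrograd (simplified sketch)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad          # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # build a topological order, then apply the chain rule node by node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# a neuron-like expression (numbers borrowed from the video's neuron example, as best I recall)
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)
o = (x1 * w1 + x2 * w2 + b).tanh()
o.backward()
print(o.data, x1.grad, w1.grad)
```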

Video 2 – The spelled-out intro to language modeling: building makemore

  • 00:00:00 intro
  • 00:03:03 reading and exploring the dataset
  • 00:06:24 exploring the bigrams in the dataset
  • 00:09:24 counting bigrams in a python dictionary
  • 00:12:45 counting bigrams in a 2D torch tensor ("training the model")
  • 00:18:19 visualizing the bigram tensor
  • 00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
  • 00:24:02 sampling from the model
  • 00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
  • 00:50:14 loss function (the negative log likelihood of the data under our model)
  • 01:00:50 model smoothing with fake counts
  • 01:02:57 PART 2: the neural network approach: intro
  • 01:05:26 creating the bigram dataset for the neural net
  • 01:10:01 feeding integers into neural nets? one-hot encodings
  • 01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
  • 01:18:46 transforming neural net outputs into probabilities: the softmax
  • 01:26:17 summary, preview to next steps, reference to micrograd
  • 01:35:49 vectorized loss
  • 01:38:36 backward and update, in PyTorch
  • 01:42:55 putting everything together
  • 01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
  • 01:50:18 note 2: model smoothing as regularization loss
  • 01:54:31 sampling from the neural net
  • 01:56:16 conclusion
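
To remind myself what Part 1 of this video actually computes (bigram counts, broadcasting-based row normalization, smoothing with fake counts, negative log likelihood, sampling), here is a minimal PyTorch sketch. The video works with the names.txt dataset; I hard-code three names so the snippet is self-contained, which makes the numbers meaningless but keeps the shape of the computation the same.

```python
import torch

# tiny stand-in for the names.txt dataset used in the video
words = ["emma", "olivia", "ava"]

# character vocabulary with '.' as the single start/end token
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}
V = len(stoi)

# count bigrams into a 2D tensor ("training" the count-based model)
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# vectorized row normalization via broadcasting, with +1 smoothing (fake counts)
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)

# negative log likelihood of the data under the model
log_likelihood, n = 0.0, 0
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
print("nll per bigram:", -log_likelihood.item() / n)

# sample a new "name" from the model, one character at a time
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```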

Video 3 – Building makemore Part 2: MLP

  • 00:00:00 intro
  • 00:01:48 Bengio et al. 2003 (MLP language model) paper walkthrough
  • 00:09:03 (re-)building our training dataset
  • 00:12:19 implementing the embedding lookup table
  • 00:18:35 implementing the hidden layer + internals of torch.Tensor: storage, views
  • 00:29:15 implementing the output layer
  • 00:29:53 implementing the negative log likelihood loss
  • 00:32:17 summary of the full network
  • 00:32:49 introducing F.cross_entropy and why
  • 00:37:56 implementing the training loop, overfitting one batch
  • 00:41:25 training on the full dataset, minibatches
  • 00:45:40 finding a good initial learning rate
  • 00:53:20 splitting up the dataset into train/val/test splits and why
  • 01:00:49 experiment: larger hidden layer
  • 01:05:27 visualizing the character embeddings
  • 01:07:16 experiment: larger embedding size
  • 01:11:46 summary of our final code, conclusion
  • 01:13:24 sampling from the model
  • 01:14:55 Google Colab (new!!) notebook advertisement
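
And a condensed sketch of the MLP language model these chapters build: an embedding lookup table, a tanh hidden layer, F.cross_entropy as the loss, and minibatch gradient descent. Again, the hard-coded names and tiny hyperparameters are stand-ins for the video's names.txt and actual settings, not the video's code.

```python
import torch
import torch.nn.functional as F

words = ["emma", "olivia", "ava"]            # stand-in for names.txt
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
V, block_size, emb_dim, hidden = len(stoi), 3, 2, 100

# build the (context of block_size chars -> next char) training dataset
X, Y = [], []
for w in words:
    context = [0] * block_size
    for ch in w + ".":
        X.append(context)
        Y.append(stoi[ch])
        context = context[1:] + [stoi[ch]]
X, Y = torch.tensor(X), torch.tensor(Y)

g = torch.Generator().manual_seed(2147483647)
C  = torch.randn((V, emb_dim), generator=g)                  # embedding lookup table
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)
b1 = torch.randn(hidden, generator=g)
W2 = torch.randn((hidden, V), generator=g)
b2 = torch.randn(V, generator=g)
params = [C, W1, b1, W2, b2]
for p in params:
    p.requires_grad = True

for step in range(200):
    ix = torch.randint(0, X.shape[0], (8,), generator=g)      # minibatch
    emb = C[X[ix]]                                            # (8, block_size, emb_dim)
    h = torch.tanh(emb.view(emb.shape[0], -1) @ W1 + b1)      # hidden layer
    logits = h @ W2 + b2                                      # output layer
    loss = F.cross_entropy(logits, Y[ix])                     # NLL via cross_entropy
    for p in params:
        p.grad = None
    loss.backward()
    for p in params:
        p.data += -0.1 * p.grad
print(loss.item())
```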
