Getting back, Z2H Video 3, building makemore part 2 [Post #9, Day 23]

I had a bit of a hiatus, traveling to Vermont for the past 10 days. I studied a bit but not a whole lot, so I am getting back into it now. It was tough this morning getting back into the mindset and remembering everything I've done previously. That's where this blog can be helpful: I can re-read my past entries and bring myself up to speed again.

I just finished Video 3 in the Z2H series. Andrej introduced some new concepts, and there are more knobs I am aware of now for fine-tuning a neural network. Andrej referred to these tweaks of the knobs as "experiments" to find the best design of the neural network:

- We batched the input, training on small random minibatches instead of the full dataset on every step.
- We changed the embedding matrix from 27 x 2 to 27 x 10, so each of the 27 characters gets a 10-dimensional embedding instead of a 2-dimensional one. I still need to wrap my head around this one.
- We changed the number of iterations.
- We played around with the learning rate. We ran an experiment to find a reasonable range of learning rates: not so low that the network learns very slowly, and not so high that the loss blows up, bouncing up and down rather than gradually decreasing.
- Once we picked a good learning rate (0.1), we implemented learning rate decay, dropping to 0.01 in the later stages of training.
- We added more neurons to our hidden layer.
- We can also change the number of input characters in the context (so far we have been using three).
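
To help future me remember how these knobs fit together, here is a minimal PyTorch sketch of the training loop with the minibatching, the 27 x 10 embedding, a hidden layer, and the 0.1 to 0.01 learning rate decay. The tensor sizes, the 200-neuron hidden layer, and the random X/Y placeholder data are my own stand-ins from memory, not a copy of Andrej's notebook.

```python
import torch
import torch.nn.functional as F

# Placeholder data standing in for the real names dataset:
# X holds blocks of 3 character indices, Y holds the index of the next character.
block_size = 3                                        # number of context characters
X = torch.randint(0, 27, (1000, block_size))
Y = torch.randint(0, 27, (1000,))

g = torch.Generator().manual_seed(42)
C  = torch.randn((27, 10), generator=g)               # embedding matrix, 27 x 10
W1 = torch.randn((block_size * 10, 200), generator=g) # hidden layer with 200 neurons
b1 = torch.randn(200, generator=g)
W2 = torch.randn((200, 27), generator=g)
b2 = torch.randn(27, generator=g)
parameters = [C, W1, b1, W2, b2]
for p in parameters:
    p.requires_grad = True

for i in range(20000):
    # minibatch: train on a random subset of 32 examples each step
    ix = torch.randint(0, X.shape[0], (32,), generator=g)

    emb = C[X[ix]]                                    # (32, 3, 10)
    h = torch.tanh(emb.view(-1, block_size * 10) @ W1 + b1)
    logits = h @ W2 + b2                              # (32, 27)
    loss = F.cross_entropy(logits, Y[ix])

    for p in parameters:
        p.grad = None
    loss.backward()

    # learning rate decay: 0.1 early on, 0.01 in the later stages
    lr = 0.1 if i < 10000 else 0.01
    for p in parameters:
        p.data += -lr * p.grad
```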

We did a data split: 80% for training, 10% for dev/validation, and 10% for testing. The dev/validation split is evaluated while we develop the model and helps us tune our knobs without touching the test data. The test split is then used only once, as the very final step.
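
A minimal sketch of how that split could be done, assuming `words` is the list of names loaded from the dataset (the shuffling and cutoff code here is my own stand-in, not necessarily how the video does it):

```python
import random

# assuming `words` is the full list of names loaded from the dataset
def build_splits(words, seed=42):
    words = list(words)
    random.seed(seed)
    random.shuffle(words)
    n1 = int(0.8 * len(words))    # first 80% -> training
    n2 = int(0.9 * len(words))    # next 10% -> dev/validation
    return words[:n1], words[n1:n2], words[n2:]   # last 10% -> test

# train_words, dev_words, test_words = build_splits(words)
```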

We also sampled from our model to generate new name-like words.
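
Roughly, sampling works by feeding the model a rolling context, picking the next character from its predicted probability distribution, and stopping when the end token comes up. This sketch reuses C, W1, b1, W2, b2, block_size, and g from the training sketch above; the itos index-to-character lookup is an assumed mapping:

```python
# index-to-character lookup; index 0 is the '.' start/end token
itos = {i: ch for i, ch in enumerate('.abcdefghijklmnopqrstuvwxyz')}

for _ in range(5):
    out = []
    context = [0] * block_size                   # start with '.' padding
    while True:
        emb = C[torch.tensor([context])]         # (1, 3, 10)
        h = torch.tanh(emb.view(1, -1) @ W1 + b1)
        logits = h @ W2 + b2
        probs = F.softmax(logits, dim=1)
        ix = torch.multinomial(probs, num_samples=1, generator=g).item()
        context = context[1:] + [ix]             # slide the context window
        if ix == 0:                              # end token sampled
            break
        out.append(itos[ix])
    print(''.join(out))
```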


I have started watching Video 4. At the beginning of the video we go over better initialization of the NN, so that training is productive from the very first iterations instead of wasting time at the start fixing a bad starting point (which creates the "hockey stick" shape in the loss-versus-iteration graph). A shallow network with one hidden layer can get away with a sloppy initialization, but as networks become deeper with more hidden layers, initialization becomes more important. With a bad initialization our network could actually not really learn at all.
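
As far as I understand it so far, one part of the fix is to scale down the initial weights of the output layer so the network starts out roughly uniform over the 27 characters instead of confidently wrong. A rough sketch of the idea (the 0.01 scale factor is just an illustrative number):

```python
import torch

g = torch.Generator().manual_seed(42)
# scale down the output layer so the initial logits are near zero and the
# network starts out roughly uniform over the 27 characters
W2 = torch.randn((200, 27), generator=g) * 0.01
b2 = torch.zeros(27)

# with near-uniform predictions, the starting loss should be close to -log(1/27)
print(-torch.log(torch.tensor(1 / 27.0)))   # ~3.30
```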
