Getting back, Z2H Video 3, building makemore part 2 [Post #9, Day 23]

I had a bit of a hiatus, traveling to Vermont for the past 10 days. I studied a bit but not a whole lot, so I am getting back into it now. It was tough this morning getting back into the mindset and remembering everything I've done previously. That's where this blog can be helpful: I can re-read my past entries and bring myself up to speed again.

I just finished Video 3 in the Z2H series. Andrej introduced some new concepts, and there are more knobs I am aware of now for fine-tuning a neural network. Andrej referred to these tweaks of the knobs as "experiments" to find the best design of the neural network:

- We batched the input, training on small random minibatches instead of the full dataset on every step.
- We changed the embedding matrix from 27 x 2 to 27 x 10, so each of the 27 characters gets a 10-dimensional embedding instead of a 2-dimensional one. I still need to wrap my head around this one.
- We changed the number of iterations.
- We played around with the learning rate. We ran an experiment to find a reasonable range of learning rates: not so low that the network learns very slowly, and not so high that the loss blows up, bouncing up and down rather than gradually decreasing.
- Once we picked a good learning rate (0.1), we implemented learning rate decay, dropping to 0.01 in the later stages of training.
- We added more neurons to our hidden layer.
- We can also change the number of input characters in the context (so far we have been using three).
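
To help future me remember how these knobs fit together, here is a minimal PyTorch sketch of the training loop with the minibatching, the 27 x 10 embedding, a hidden layer, and the 0.1 to 0.01 learning rate decay. The tensor sizes, the 200-neuron hidden layer, and the random X/Y placeholder data are my own stand-ins from memory, not a copy of Andrej's notebook.

```python
import torch
import torch.nn.functional as F

# Placeholder data standing in for the real names dataset:
# X holds blocks of 3 character indices, Y holds the index of the next character.
block_size = 3                                        # number of context characters
X = torch.randint(0, 27, (1000, block_size))
Y = torch.randint(0, 27, (1000,))

g = torch.Generator().manual_seed(42)
C  = torch.randn((27, 10), generator=g)               # embedding matrix, 27 x 10
W1 = torch.randn((block_size * 10, 200), generator=g) # hidden layer with 200 neurons
b1 = torch.randn(200, generator=g)
W2 = torch.randn((200, 27), generator=g)
b2 = torch.randn(27, generator=g)
parameters = [C, W1, b1, W2, b2]
for p in parameters:
    p.requires_grad = True

for i in range(20000):
    # minibatch: train on a random subset of 32 examples each step
    ix = torch.randint(0, X.shape[0], (32,), generator=g)

    emb = C[X[ix]]                                    # (32, 3, 10)
    h = torch.tanh(emb.view(-1, block_size * 10) @ W1 + b1)
    logits = h @ W2 + b2                              # (32, 27)
    loss = F.cross_entropy(logits, Y[ix])

    for p in parameters:
        p.grad = None
    loss.backward()

    # learning rate decay: 0.1 early on, 0.01 in the later stages
    lr = 0.1 if i < 10000 else 0.01
    for p in parameters:
        p.data += -lr * p.grad
```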

We did a data split: 80% for training, 10% for dev/validation, and 10% for testing. The dev/validation split is evaluated while we develop the model and helps us tune our knobs without touching the test data. The test split is then used only once, as the very final step.
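
A minimal sketch of how that split could be done, assuming `words` is the list of names loaded from the dataset (the shuffling and cutoff code here is my own stand-in, not necessarily how the video does it):

```python
import random

# assuming `words` is the full list of names loaded from the dataset
def build_splits(words, seed=42):
    words = list(words)
    random.seed(seed)
    random.shuffle(words)
    n1 = int(0.8 * len(words))    # first 80% -> training
    n2 = int(0.9 * len(words))    # next 10% -> dev/validation
    return words[:n1], words[n1:n2], words[n2:]   # last 10% -> test

# train_words, dev_words, test_words = build_splits(words)
```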

We also sampled from our model to generate new name-like words.
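
Roughly, sampling works by feeding the model a rolling context, picking the next character from its predicted probability distribution, and stopping when the end token comes up. This sketch reuses C, W1, b1, W2, b2, block_size, and g from the training sketch above; the itos index-to-character lookup is an assumed mapping:

```python
# index-to-character lookup; index 0 is the '.' start/end token
itos = {i: ch for i, ch in enumerate('.abcdefghijklmnopqrstuvwxyz')}

for _ in range(5):
    out = []
    context = [0] * block_size                   # start with '.' padding
    while True:
        emb = C[torch.tensor([context])]         # (1, 3, 10)
        h = torch.tanh(emb.view(1, -1) @ W1 + b1)
        logits = h @ W2 + b2
        probs = F.softmax(logits, dim=1)
        ix = torch.multinomial(probs, num_samples=1, generator=g).item()
        context = context[1:] + [ix]             # slide the context window
        if ix == 0:                              # end token sampled
            break
        out.append(itos[ix])
    print(''.join(out))
```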


I have started watching Video 4. At the beginning of the video we go over better initialization of the NN, so that training is productive from the very first iterations instead of wasting time at the start fixing a bad starting point (which creates the "hockey stick" shape in the loss-versus-iteration graph). A shallow network with one hidden layer can get away with a sloppy initialization, but as networks become deeper with more hidden layers, initialization becomes more important. With a bad initialization our network could actually not really learn at all.
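
As far as I understand it so far, one part of the fix is to scale down the initial weights of the output layer so the network starts out roughly uniform over the 27 characters instead of confidently wrong. A rough sketch of the idea (the 0.01 scale factor is just an illustrative number):

```python
import torch

g = torch.Generator().manual_seed(42)
# scale down the output layer so the initial logits are near zero and the
# network starts out roughly uniform over the 27 characters
W2 = torch.randn((200, 27), generator=g) * 0.01
b2 = torch.zeros(27)

# with near-uniform predictions, the starting loss should be close to -log(1/27)
print(-torch.log(torch.tensor(1 / 27.0)))   # ~3.30
```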
