Z2H Video 1, finished watching [Post #7, Day 8]

I have now finished watching and working through the first video. I watched it in about 5-6 sessions; the video itself is 02:25:51, but I took much longer so I could pause and rewind many times as I was coding along. It was very good to work through, for me a combination of learning neural net material and Python OOP. It was a good first pass and I may rewatch the whole thing at some point, but I definitely want to rewatch the manual backpropagation work starting from 00:37:30 within the next few days (there is a tiny sketch of that idea below). The concepts were the same as what I worked through in the Graham Ganssle (SEG) example but presented slightly differently, which is good for my learning. I was also introduced to PyTorch.
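
Since the manual backpropagation section is the part I want to revisit, here is a minimal sketch of the idea for my own reference: push some numbers through a single tanh neuron, let PyTorch's autograd compute the gradients, then check them against the chain rule by hand. The numbers and variable names are my own, not Andrej's.

```python
import torch

# One tanh neuron: o = tanh(x1*w1 + b). Values here are made up.
x1 = torch.tensor(2.0, requires_grad=True)
w1 = torch.tensor(-3.0, requires_grad=True)
b  = torch.tensor(6.8, requires_grad=True)

n = x1 * w1 + b        # pre-activation
o = torch.tanh(n)      # neuron output
o.backward()           # autograd fills in x1.grad, w1.grad, b.grad

# Manual chain rule: do/dn = 1 - tanh(n)^2, then multiply by dn/d(each input)
do_dn = 1 - o.item() ** 2
print(x1.grad.item(), do_dn * w1.item())   # do/dx1 = do/dn * w1
print(w1.grad.item(), do_dn * x1.item())   # do/dw1 = do/dn * x1
print(b.grad.item(),  do_dn)               # do/db  = do/dn * 1
```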

Some new things on the horizon:

  • So far I have used mean squared error to compute loss, but there are other loss functions such as cross-entropy loss (used in an LLM to predict the next token) – see the sketch after this list
  • For the nonlinear activation function I used sigmoid in the SEG example and tanh in Andrej's video; another option is ReLU. Andrej said they are all roughly equivalent and can be used in MLPs (quick comparison below). He also mentioned max-margin loss, but that is another loss function rather than an activation
  • A new concept called "batching", which helps when there are millions of input examples: rather than running them all through the net at one time, you pick out a random subset – a batch – and only process the batch on each training step (sketch below)
  • L2 regularization (not sure what that is yet, it was just mentioned)
  • Learning rate decay, which I thought was a cool concept: changing the learning rate with each epoch, reducing it to home in on a more exact parameter set (the parameter set being the weights and biases for each neuron) – there is a toy example after this list
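
A quick look at the two losses mentioned above. This is just a toy sketch using PyTorch's built-in functions; the shapes and numbers are made up by me, not taken from the video.

```python
import torch
import torch.nn.functional as F

# Mean squared error: predictions vs. continuous targets
pred = torch.tensor([0.9, 0.2, 0.4])
target = torch.tensor([1.0, 0.0, 0.5])
print(F.mse_loss(pred, target))

# Cross-entropy: raw scores (logits) over a made-up vocabulary of 5 tokens,
# with the index of the "correct" next token for each of 3 examples
logits = torch.randn(3, 5)
next_tokens = torch.tensor([1, 0, 4])
print(F.cross_entropy(logits, next_tokens))
```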
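And the three activation functions side by side, just to see what they do to the same inputs (my own toy values):

```python
import torch

x = torch.linspace(-3.0, 3.0, 7)
print(torch.sigmoid(x))   # squashes values into (0, 1)
print(torch.tanh(x))      # squashes values into (-1, 1)
print(torch.relu(x))      # zeroes out negatives, passes positives through
```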
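A minimal sketch of how I understand batching; the dataset here is just random numbers, and the batch size is a value I picked arbitrarily.

```python
import torch

# Pretend dataset: 10,000 examples with 3 features each
X = torch.randn(10_000, 3)
y = torch.randn(10_000)

batch_size = 32
ix = torch.randint(0, X.shape[0], (batch_size,))  # random subset of indices
Xb, yb = X[ix], y[ix]                             # the batch that actually gets processed
print(Xb.shape, yb.shape)                         # torch.Size([32, 3]) torch.Size([32])
```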
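Finally, the learning rate decay idea as a toy example: a single parameter chasing a target value, with the step size shrinking each epoch. The decay schedule and numbers are made up by me, not taken from the video.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # one "parameter", target value is 3.0
lr_start, decay = 0.1, 0.95

for epoch in range(50):
    lr = lr_start * (decay ** epoch)          # learning rate shrinks each epoch
    loss = ((w - 3.0) ** 2).sum()             # toy loss, minimized at w = 3
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                      # gradient descent step
        w.grad.zero_()

print(w.item())   # ends up close to 3.0
```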
