Z2H Video 1, finished watching [Post #7, Day 8]

I have now finished watching and working through the first video. I watched it in about 5-6 sessions; the video itself is 02:25:51, but I took much longer so I could pause and rewind many times as I was coding along. It was very good to work through, for me a combination of learning neural net material and Python OOP. It was a good first pass and I may rewatch the whole thing at some point, but I definitely want to rewatch the manual backpropagation work starting from 00:37:30 within the next few days (there is a tiny sketch of that idea below). The concepts were the same as what I worked through in the Graham Ganssle (SEG) example but presented slightly differently, which is good for my learning. I was also introduced to PyTorch.
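
Since the manual backpropagation section is the part I want to revisit, here is a minimal sketch of the idea for my own reference: push some numbers through a single tanh neuron, let PyTorch's autograd compute the gradients, then check them against the chain rule by hand. The numbers and variable names are my own, not Andrej's.

```python
import torch

# One tanh neuron: o = tanh(x1*w1 + b). Values here are made up.
x1 = torch.tensor(2.0, requires_grad=True)
w1 = torch.tensor(-3.0, requires_grad=True)
b  = torch.tensor(6.8, requires_grad=True)

n = x1 * w1 + b        # pre-activation
o = torch.tanh(n)      # neuron output
o.backward()           # autograd fills in x1.grad, w1.grad, b.grad

# Manual chain rule: do/dn = 1 - tanh(n)^2, then multiply by dn/d(each input)
do_dn = 1 - o.item() ** 2
print(x1.grad.item(), do_dn * w1.item())   # do/dx1 = do/dn * w1
print(w1.grad.item(), do_dn * x1.item())   # do/dw1 = do/dn * x1
print(b.grad.item(),  do_dn)               # do/db  = do/dn * 1
```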

Some new things on the horizon:

  • So far I have used mean squared error to compute loss, but there are other loss functions such as cross-entropy loss (used in an LLM to predict the next token) – see the sketch after this list
  • For the nonlinear activation function I used sigmoid in the SEG example and tanh in Andrej's video; another option is ReLU. Andrej said they are all roughly equivalent and can be used in MLPs (quick comparison below). He also mentioned max-margin loss, but that is another loss function rather than an activation
  • A new concept called "batching", which helps when there are millions of input examples: rather than running them all through the net at one time, you pick out a random subset – a batch – and only process the batch on each training step (sketch below)
  • L2 regularization (not sure what that is yet, it was just mentioned)
  • Learning rate decay, which I thought was a cool concept: changing the learning rate with each epoch, reducing it to home in on a more exact parameter set (the parameter set being the weights and biases for each neuron) – there is a toy example after this list
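
A quick look at the two losses mentioned above. This is just a toy sketch using PyTorch's built-in functions; the shapes and numbers are made up by me, not taken from the video.

```python
import torch
import torch.nn.functional as F

# Mean squared error: predictions vs. continuous targets
pred = torch.tensor([0.9, 0.2, 0.4])
target = torch.tensor([1.0, 0.0, 0.5])
print(F.mse_loss(pred, target))

# Cross-entropy: raw scores (logits) over a made-up vocabulary of 5 tokens,
# with the index of the "correct" next token for each of 3 examples
logits = torch.randn(3, 5)
next_tokens = torch.tensor([1, 0, 4])
print(F.cross_entropy(logits, next_tokens))
```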
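And the three activation functions side by side, just to see what they do to the same inputs (my own toy values):

```python
import torch

x = torch.linspace(-3.0, 3.0, 7)
print(torch.sigmoid(x))   # squashes values into (0, 1)
print(torch.tanh(x))      # squashes values into (-1, 1)
print(torch.relu(x))      # zeroes out negatives, passes positives through
```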
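A minimal sketch of how I understand batching; the dataset here is just random numbers, and the batch size is a value I picked arbitrarily.

```python
import torch

# Pretend dataset: 10,000 examples with 3 features each
X = torch.randn(10_000, 3)
y = torch.randn(10_000)

batch_size = 32
ix = torch.randint(0, X.shape[0], (batch_size,))  # random subset of indices
Xb, yb = X[ix], y[ix]                             # the batch that actually gets processed
print(Xb.shape, yb.shape)                         # torch.Size([32, 3]) torch.Size([32])
```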
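Finally, the learning rate decay idea as a toy example: a single parameter chasing a target value, with the step size shrinking each epoch. The decay schedule and numbers are made up by me, not taken from the video.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)   # one "parameter", target value is 3.0
lr_start, decay = 0.1, 0.95

for epoch in range(50):
    lr = lr_start * (decay ** epoch)          # learning rate shrinks each epoch
    loss = ((w - 3.0) ** 2).sum()             # toy loss, minimized at w = 3
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                      # gradient descent step
        w.grad.zero_()

print(w.item())   # ends up close to 3.0
```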
