Building a basic neural network [Post #2, Day 1]

I followed Graham Ganssle's Neural Networks tutorial to build a basic neural network (a multilayer perceptron) from scratch. I studied this during my PhD, so it's nice to come back to it now. My understanding is that the multilayer perceptron is an 'entry-level' artificial neural network, but its concepts underpin more advanced architectures like Transformers.

The basics are an input layer, one or more hidden layers, and an output layer. Each layer applies an activation function such as sigmoid (which squashes numbers into the range 0 to 1) or ReLU (which I haven't learned about yet). A key aspect is the backpropagation of error, which is used to update the weights and biases applied in the neural network. Backpropagation was popularized for training neural networks by Rumelhart, Hinton, and Williams; I watched Geoffrey Hinton's interview on 60 Minutes. I found the video because I first went to Andrej Karpathy's website and learned that he attended lectures by Hinton.

In the training stage, a portion of the data (e.g. 80%) with known answers is used so that the error (loss) can be calculated and used to update the weights and biases (the parameters of the neural network). Hyperparameters must be chosen before training; these include the number of epochs (complete passes through the training data) and the learning rate, which controls how much the weights and biases are adjusted (using gradient descent, for example) at each update.
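To make those pieces concrete, here is a minimal sketch of a multilayer perceptron in NumPy. This is not Ganssle's tutorial code; it is a toy I put together, trained on the classic XOR problem, to show the input/hidden/output layers, the sigmoid activation, backpropagation with gradient descent, and the epoch and learning-rate hyperparameters:

```python
import numpy as np

# Two activation functions: sigmoid squashes any number into (0, 1);
# ReLU simply replaces negative inputs with zero.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(s):
    # Derivative of sigmoid, written in terms of the sigmoid output s
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# Toy dataset: XOR, which a network with no hidden layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters: weights and biases for one hidden layer and one output layer.
W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output
b2 = np.zeros((1, 1))

# Hyperparameters, chosen before training starts.
learning_rate = 1.0
epochs = 5000

losses = []
for epoch in range(epochs):
    # Forward pass through the layers
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Loss (mean squared error) and backpropagation of its gradient
    error = output - y
    losses.append(float(np.mean(error ** 2)))
    d_output = error * sigmoid_deriv(output)
    d_hidden = (d_output @ W2.T) * sigmoid_deriv(hidden)

    # Gradient descent: nudge weights and biases against the gradient
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

preds = (output > 0.5).astype(int)
print(preds.ravel())  # final predictions after training
```

The loss should fall over the epochs as backpropagation adjusts the parameters; with a good random start the network typically learns XOR.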

I wanted to get an overall framework of AI in my head, so I used Claude to help me generate this image. I am adding notes to it as I learn more.
[image: an overall framework of AI]

I am using Cursor to help with coding; it has a helpful chat window, and I am using it to add annotations to each line of code to help me remember what is happening.
