Knobs [Post #4, Day 3]

Knobs I can adjust in my MLP neural network:

Network architecture

  • How to define/divide up input features
  • Number of hidden layers (this gets more advanced; so far I am working with only one hidden layer)
  • Number of units in each hidden layer (do they need to be the same number in every hidden layer?)

Training

  • Number of epochs (i.e. full passes through the training data)
  • Learning rate (so far I have been using 0.001 from the Graham Ganssle example; how do I know how to pick this? Trial and error?)
  • Activation function and its derivative (so far I have seen only the sigmoid logistic function and the rectified linear unit, ReLU)
  • Division of the data set into training and validation sets (e.g. 80% of the total data set for training and 20% for validation)
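
To keep these straight in my head, here's a minimal sketch of where each knob shows up in code: a bare-bones one-hidden-layer numpy MLP for my N to 2N problem. The layout and variable names (n_hidden, learning_rate, etc.) are illustrative assumptions of mine, not the actual Graham Ganssle code.

    import numpy as np

    # ----- architecture knobs -----
    n_hidden = 10          # number of units in the (single) hidden layer

    # ----- training knobs -----
    learning_rate = 0.01   # step size for gradient descent
    n_epochs = 1000        # full passes through the training data
    train_frac = 0.8       # 80/20 training/validation split

    # Data: learn the N -> 2N mapping, with inputs already in [0, 1]
    X = np.linspace(0, 1, 100).reshape(-1, 1)
    y = 2 * X

    n_train = int(train_frac * len(X))
    Xtrain, Xval = X[:n_train], X[n_train:]
    ytrain, yval = y[:n_train], y[n_train:]

    # Activation function knob: sigmoid and its derivative
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_deriv(a):      # derivative written in terms of the activation a
        return a * (1.0 - a)

    # Weight matrices: shapes must chain (1 input -> n_hidden -> 1 output);
    # biases omitted to keep the sketch short
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(1, n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))

    train_losses, val_losses = [], []
    for epoch in range(n_epochs):
        # forward pass
        hidden = sigmoid(Xtrain @ W1)     # (n_train, n_hidden)
        pred = hidden @ W2                # linear output layer
        error = pred - ytrain

        # backward pass: gradients of 1/2 * mean squared error
        grad_W2 = hidden.T @ error / len(Xtrain)
        grad_hidden = (error @ W2.T) * sigmoid_deriv(hidden)
        grad_W1 = Xtrain.T @ grad_hidden / len(Xtrain)

        W1 -= learning_rate * grad_W1
        W2 -= learning_rate * grad_W2

        # record losses per epoch for the loss-vs-epoch plots
        train_losses.append(np.mean(error ** 2))
        val_losses.append(np.mean((sigmoid(Xval @ W1) @ W2 - yval) ** 2))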

The tab feature in Cursor got annoying for me: it jumps ahead of my thinking and then I lose my train of thought. It's often right, but it messes me up, so I turned it off for now until I'm ready for it again.


Today I worked on building an MLP NN to learn the N to 2N mapping. Apparently a neural network likes input data values between 0 and 1. I set up my problem to have input values from 1 to 100 (1, 2, 3, 4, and so on), which makes the target answers for training 2, 4, 6, and so on.

The first challenge was getting the weight matrix dimensions correct, since I was reusing the net from my first Graham Ganssle example. I worked that out and reduced the number of hidden units to 10.

Once I got the net running, the result wasn't looking good: it wasn't learning the N to 2N mapping. Cursor/Claude suggested I normalize my inputs to between 0 and 1, because nets like that, apparently. But the normalization Cursor suggested wasn't working: the ytrain and yval values were not 2 times the Xtrain and Xval values. So I looked at it in more detail and adjusted the normalization so the values work:

    # Normalize input data to [0, 1] range
    Xtrain = (Xtrain - Xtrain.min()) / (Xtrain.max() - Xtrain.min())
    Xval = (Xval - Xval.min()) / (Xval.max() - Xval.min())

    # Scale target values accordingly (to [0, 2], so they stay 2x the inputs)
    ytrain = ((ytrain - ytrain.min()) / (ytrain.max() - ytrain.min())) * 2
    yval = ((yval - yval.min()) / (yval.max() - yval.min())) * 2
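
To convince myself the adjusted scaling really keeps the 2x relationship, here's a quick check I could run (assuming the inputs run 1 to 100 and the targets are exactly 2X, as in my setup):

    import numpy as np

    X = np.arange(1, 101, dtype=float)   # inputs 1, 2, ..., 100
    y = 2 * X                            # targets 2, 4, ..., 200

    Xn = (X - X.min()) / (X.max() - X.min())         # normalized to [0, 1]
    yn = ((y - y.min()) / (y.max() - y.min())) * 2   # scaled to [0, 2]

    print(np.allclose(yn, 2 * Xn))   # True: targets are still 2x the inputs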

(Ok, and I just realized: why not just start the input values between 0 and 1 in the first place, instead of this extra normalization step? Duh. But now I know for other problems that NNs like inputs between 0 and 1.)

After doing that I ran the net, and it was looking ok but not great. Then I started turning knobs. I adjusted the learning rate from 0.001 to 0.01: ok, better! Then I adjusted the number of epochs from 100 to 1000: wow, even better! I wonder if there's some algorithm for getting the perfect set of knob adjustments for different problems. Another knob I currently have is the number of hidden units.
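
On the "is there some algorithm" question: one brute-force approach is a grid search, i.e. try every combination of knob settings and keep the one with the lowest validation loss. A skeleton of the idea (train_and_evaluate here is a hypothetical placeholder for a full training run, not a function from my script):

    import itertools

    def train_and_evaluate(learning_rate, n_epochs, n_hidden):
        # Hypothetical helper: train the MLP with these knobs and
        # return the final validation loss (placeholder body)
        return float("inf")

    learning_rates = [0.001, 0.01, 0.1]
    epoch_counts = [100, 1000, 10000]
    hidden_sizes = [5, 10, 20]

    best = None
    for lr, ne, nh in itertools.product(learning_rates, epoch_counts, hidden_sizes):
        loss = train_and_evaluate(lr, ne, nh)
        if best is None or loss < best[0]:
            best = (loss, lr, ne, nh)

    print("best (val_loss, lr, epochs, hidden):", best)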

Now I need to study the loss-versus-epoch and validation-loss-versus-epoch plots in terms of fine-tuning NN parameters. I'm still unsure why plotting validation loss is meaningful / what it shows.
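
A sketch of what those plots could look like in matplotlib, assuming the training loop appends one loss value per epoch to train_losses and val_losses lists (as in the knob sketch further up):

    import matplotlib.pyplot as plt

    # Assumes train_losses and val_losses were collected during training,
    # one value per epoch (see the training-loop sketch above)
    plt.plot(train_losses, label="training loss")
    plt.plot(val_losses, label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("mean squared error")
    plt.legend()
    plt.show()

One thing I've seen mentioned about why the validation curve matters: it's computed on data the net never trains on, so if the training loss keeps dropping while the validation loss flattens out or starts rising, the net is memorizing the training set instead of learning the general mapping (overfitting).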


Some big picture questions:

  • What do I want my neural networks to do for me?
  • What data do I want to give my neural networks?
