Knobs [Post #4, Day 3]
January 8, 2025
Knobs I can adjust in my MLP neural network:
Network architecture
- How to define/divide up input features
- Number of hidden layers (this gets more advanced; so far I am working with only one hidden layer)
- Number of units in each hidden layer (do they need to be the same for all hidden layers?)
Training
- Number of epochs (i.e., training iterations; one epoch is one full pass through the training data)
- Learning rate; so far I have been using 0.001 from the Graham Ganssle example. How do I know how to pick this? Trial and error?
- Activation function and its derivative; so far I have only seen the sigmoid (logistic) function and the rectified linear unit (ReLU) (see the sketch after this list)
- Division of the data set into training and validation sets (e.g., 80% for training and 20% for validation; see the split sketch after this list)
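For reference, here's what those two activation functions and their derivatives look like in NumPy. A minimal sketch using the standard textbook definitions (not the exact code from the Ganssle example):

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the sigmoid, needed for backpropagation
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def relu_deriv(x):
    # ReLU derivative: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)
```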
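And a minimal sketch of the 80/20 train/validation split for the N to 2N problem (shuffling before splitting is my own assumption; the variable names match the code further down):

```python
import numpy as np

# Inputs 1..100 and the N -> 2N targets
X = np.arange(1, 101, dtype=float)
y = 2 * X

# Shuffle, then take 80% for training and 20% for validation
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))

Xtrain, Xval = X[idx[:split]], X[idx[split:]]
ytrain, yval = y[idx[:split]], y[idx[split:]]
```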
The tab feature in Cursor got annoying for me. It jumps ahead of my thinking and then I lose my train of thought; it's often right, but it messes me up. I turned it off for now, until I'm ready for it again.
Today I worked on building an MLP NN to learn the N to 2N mapping. Apparently a neural network likes input data values between 0 and 1, but I set up my problem to have input values from 1 to 100 (1, 2, 3, 4, and so on), which makes the answers to train the net on 2, 4, 6, and so on.

The first challenge was getting the weight matrix dimensions correct, since I was reusing the net from my first Graham Ganssle example. I worked that out and reduced the number of hidden units to 10. Once I got the net running, the result wasn't looking good: it wasn't learning the N to 2N mapping. Cursor/Claude suggested I normalize my inputs to between 0 and 1, because nets apparently like that. But the normalization Cursor suggested wasn't working; the ytrain and yval values were no longer 2 times the Xtrain and Xval values. So I looked at it in more detail and adjusted the normalization values so they work:
```python
# Normalize input data to the [0, 1] range
Xtrain = (Xtrain - Xtrain.min()) / (Xtrain.max() - Xtrain.min())
Xval = (Xval - Xval.min()) / (Xval.max() - Xval.min())

# Scale target values the same way, then multiply by 2 so the
# targets stay exactly 2 times the normalized inputs
ytrain = ((ytrain - ytrain.min()) / (ytrain.max() - ytrain.min())) * 2
yval = ((yval - yval.min()) / (yval.max() - yval.min())) * 2
```
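A quick sanity check that the 2x relationship survives the normalization (assuming these are NumPy arrays):

```python
import numpy as np

# After normalizing, the targets should still be exactly twice the inputs
assert np.allclose(ytrain, 2 * Xtrain)
assert np.allclose(yval, 2 * Xval)
```

(For bigger problems the usual advice is to scale the validation set with the training set's min and max rather than its own; for this toy problem it works out either way, because the 2x relationship is preserved within each set.)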
(Ok, and I just realized: why not just start the input values between 0 and 1 in the first place, instead of this extra normalization step? Duh. But now I know for other problems: NNs like inputs between 0 and 1.)
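For next time, that would look something like this (a minimal sketch, generating the inputs already scaled):

```python
import numpy as np

# Generate 100 inputs already in [0, 1]; no normalization step needed
X = np.linspace(0.0, 1.0, 100)
y = 2 * X  # targets for the N to 2N mapping, in [0, 2]
```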
After doing that I ran the net and it was looking ok, but not great. Then I started turning knobs. I adjusted the learning rate from 0.001 to 0.01: ok, better! Then I adjusted the number of epochs from 100 to 1000: wow, even better! I wonder if there's some algorithm for getting the perfect set of knob adjustments for different problems. Another knob I currently have is the number of hidden units.
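One standard answer to the "algorithm for the knobs" question is a plain grid search: loop over candidate settings and keep whichever gives the lowest validation loss. A minimal sketch, assuming a hypothetical train_mlp(lr=..., epochs=...) helper that trains the net with those settings and returns its final validation loss:

```python
# Grid search over two knobs; train_mlp is a hypothetical helper that
# trains the network with the given settings and returns the final
# validation loss.
best_loss, best_cfg = float("inf"), None
for lr in [0.001, 0.01, 0.1]:
    for epochs in [100, 1000, 10000]:
        loss = train_mlp(lr=lr, epochs=epochs)
        if loss < best_loss:
            best_loss, best_cfg = loss, (lr, epochs)

print(f"best: lr={best_cfg[0]}, epochs={best_cfg[1]}, val loss={best_loss:.6f}")
```

There are fancier approaches (random search, Bayesian optimization), but they boil down to the same idea: try settings, score each on the validation set, keep the best.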
Now I need to study the loss-versus-epoch and validation-loss-versus-epoch plots, in terms of fine-tuning NN parameters. I'm still unsure why plotting validation loss is meaningful / what it shows.
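The kind of plot I mean, assuming the training loop records one training-loss value and one validation-loss value per epoch into two lists (hypothetical names):

```python
import matplotlib.pyplot as plt

# train_losses and val_losses are hypothetical lists, one value per epoch
plt.plot(train_losses, label="training loss")
plt.plot(val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```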
Some big-picture questions:
- What do I want my neural networks to do for me?
- What data do I want to give my neural networks?