Knobs [Post #4, Day 3]
January 8, 2025
Knobs I can adjust in my MLP neural network:
Network architecture
- How to define/divide up input features
- Number of hidden layers (this gets more advanced; so far I am working with only one hidden layer)
- Number of units in each hidden layer (do they need to be the same for all hidden layers?)
Training
- Number of epochs (i.e., training iterations; one epoch is one full pass through the training data)
- Learning rate; so far I have been using 0.001 from the Graham Ganssle example. How do I know how to pick this? Trial and error?
- Activation function and its derivative; so far I have only seen the sigmoid (logistic) function and the rectified linear unit (ReLU) (see the sketch after this list)
- Division of the data set into training and validation sets (e.g., 80% for training and 20% for validation; see the split sketch after this list)
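For reference, here's what those two activation functions and their derivatives look like in NumPy. A minimal sketch using the standard textbook definitions (not the exact code from the Ganssle example):

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the sigmoid, needed for backpropagation
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def relu_deriv(x):
    # ReLU derivative: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)
```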
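And a minimal sketch of the 80/20 train/validation split for the N to 2N problem (shuffling before splitting is my own assumption; the variable names match the code further down):

```python
import numpy as np

# Inputs 1..100 and the N -> 2N targets
X = np.arange(1, 101, dtype=float)
y = 2 * X

# Shuffle, then take 80% for training and 20% for validation
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))

Xtrain, Xval = X[idx[:split]], X[idx[split:]]
ytrain, yval = y[idx[:split]], y[idx[split:]]
```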
The tab feature in Cursor got annoying for me. It jumps ahead of my thinking and then I lose my train of thought; it's often right, but it messes me up. I turned it off for now, until I'm ready for it again.
Today I worked on building an MLP NN to learn the N to 2N mapping. Apparently a neural network likes input data values between 0 and 1, but I set up my problem to have input values from 1 to 100 (1, 2, 3, 4, and so on), which makes the answers to train the net on 2, 4, 6, and so on.

The first challenge was getting the weight matrix dimensions correct, since I was reusing the net from my first Graham Ganssle example. I worked that out and reduced the number of hidden units to 10. Once I got the net running, the result wasn't looking good: it wasn't learning the N to 2N mapping. Cursor/Claude suggested I normalize my inputs to between 0 and 1, because nets apparently like that. But the normalization Cursor suggested wasn't working; the ytrain and yval values were no longer 2 times the Xtrain and Xval values. So I looked at it in more detail and adjusted the normalization values so they work:
```python
# Normalize input data to the [0, 1] range
Xtrain = (Xtrain - Xtrain.min()) / (Xtrain.max() - Xtrain.min())
Xval = (Xval - Xval.min()) / (Xval.max() - Xval.min())

# Scale target values the same way, then multiply by 2 so the
# targets stay exactly 2 times the normalized inputs
ytrain = ((ytrain - ytrain.min()) / (ytrain.max() - ytrain.min())) * 2
yval = ((yval - yval.min()) / (yval.max() - yval.min())) * 2
```
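A quick sanity check that the 2x relationship survives the normalization (assuming these are NumPy arrays):

```python
import numpy as np

# After normalizing, the targets should still be exactly twice the inputs
assert np.allclose(ytrain, 2 * Xtrain)
assert np.allclose(yval, 2 * Xval)
```

(For bigger problems the usual advice is to scale the validation set with the training set's min and max rather than its own; for this toy problem it works out either way, because the 2x relationship is preserved within each set.)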
(Ok, and I just realized: why not just start the input values between 0 and 1 in the first place, instead of this extra normalization step? Duh. But now I know for other problems: NNs like inputs between 0 and 1.)
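For next time, that would look something like this (a minimal sketch, generating the inputs already scaled):

```python
import numpy as np

# Generate 100 inputs already in [0, 1]; no normalization step needed
X = np.linspace(0.0, 1.0, 100)
y = 2 * X  # targets for the N to 2N mapping, in [0, 2]
```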
After doing that I ran the net and it was looking ok, but not great. Then I started turning knobs. I adjusted the learning rate from 0.001 to 0.01: ok, better! Then I adjusted the number of epochs from 100 to 1000: wow, even better! I wonder if there's some algorithm for getting the perfect set of knob adjustments for different problems. Another knob I currently have is the number of hidden units.
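One standard answer to the "algorithm for the knobs" question is a plain grid search: loop over candidate settings and keep whichever gives the lowest validation loss. A minimal sketch, assuming a hypothetical train_mlp(lr=..., epochs=...) helper that trains the net with those settings and returns its final validation loss:

```python
# Grid search over two knobs; train_mlp is a hypothetical helper that
# trains the network with the given settings and returns the final
# validation loss.
best_loss, best_cfg = float("inf"), None
for lr in [0.001, 0.01, 0.1]:
    for epochs in [100, 1000, 10000]:
        loss = train_mlp(lr=lr, epochs=epochs)
        if loss < best_loss:
            best_loss, best_cfg = loss, (lr, epochs)

print(f"best: lr={best_cfg[0]}, epochs={best_cfg[1]}, val loss={best_loss:.6f}")
```

There are fancier approaches (random search, Bayesian optimization), but they boil down to the same idea: try settings, score each on the validation set, keep the best.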
Now I need to study the loss-versus-epoch and validation-loss-versus-epoch plots, in terms of fine-tuning NN parameters. I'm still unsure why plotting validation loss is meaningful / what it shows.
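The kind of plot I mean, assuming the training loop records one training-loss value and one validation-loss value per epoch into two lists (hypothetical names):

```python
import matplotlib.pyplot as plt

# train_losses and val_losses are hypothetical lists, one value per epoch
plt.plot(train_losses, label="training loss")
plt.plot(val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```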
Some big-picture questions:
- What do I want my neural networks to do for me?
- What data do I want to give my neural networks?