Deep learning specialization, finished course 4 [Post #29, Day 100]
Well, it has been quite a hiatus, and what do you know, I am on Day 100! I have just completed Course 4 of 5 of the Deep Learning Specialization from DeepLearning.AI. This course was all about convolutional neural networks, which aren't my primary interest. They are still interesting to learn about: the course covers basically all the things one can do with images, from object detection (for self-driving cars) to facial recognition to the most recent section on neural style transfer. Right now I am letting the code i...
Read post
Deep learning specialization, course 4, week 1 [Post #28, Day 69]
It's 23:30 and I just struggled through the conv_forward code block in course 4, week 1, assignment 1. I had to go back to the lecture videos, I had to draw all the matrices out on my grid paper, I had to add print statements for just about everything in the function, but I figured it out! I learned that if I start swearing, getting frustrated, and feeling neural pain, it is a good thing, because it means I am facing a challenge and learning. I had some help in the forum with figuring out vert_start and horiz_start, I ha...
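To make this concrete for future me, here is a minimal sketch of how the window corners can be computed in a convolution forward pass. The names (vert_start, horiz_start, stride, filter size f) mirror the ones I was wrestling with, but this is my own simplified no-padding version, not the graded solution:

```python
import numpy as np

def conv_forward_sketch(A_prev, W, b, stride=1):
    # A_prev: (m, n_H_prev, n_W_prev, n_C_prev) inputs, W: (f, f, n_C_prev, n_C) filters,
    # b: (1, 1, 1, n_C) biases. No padding here, to keep the sketch short.
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f) // stride + 1
    n_W = (n_W_prev - f) // stride + 1
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):                    # loop over examples
        for h in range(n_H):              # vertical position in the output
            vert_start = h * stride       # top edge of the window in the input
            vert_end = vert_start + f
            for w in range(n_W):          # horizontal position in the output
                horiz_start = w * stride  # left edge of the window in the input
                horiz_end = horiz_start + f
                a_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, :]
                for c in range(n_C):      # loop over filters
                    Z[i, h, w, c] = np.sum(a_slice * W[:, :, :, c]) + b[0, 0, 0, c]
    return Z

# Quick shape check with random data: (2, 5, 5, 3) input, eight 3x3 filters -> (2, 3, 3, 8).
Z = conv_forward_sketch(np.random.randn(2, 5, 5, 3), np.random.randn(3, 3, 3, 8),
                        np.random.randn(1, 1, 1, 8), stride=1)
print(Z.shape)
```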
Read post
Deep learning specialization, course 3 [Post #27, Day 66]
I am back from Florida and getting back to my studying. It was great to spend time with Grandma and Yvonne in Florida. I think I may do a refresher and read all my previous posts up to this point. I've been continuing to work on my ideas for apps and neural network applications. My goal is to finish up this specialization, that is, complete courses 3, 4, and 5 by 18 March. I also want to continue working on my build ideas in parallel. I have been having good discussions with connections in the indu...
Read post
Deep learning specialization, course 2 [Post #26, Day 53]
I am about to continue with the graded programming assignment for Course 2, Week 2. Week 2's material covered optimization algorithms for training neural nets faster and more efficiently. We covered mini-batch gradient descent first, then focused on improving the optimization algorithm itself, learning about gradient descent with momentum (which incorporates a moving average over gradients), gradient descent with RMSprop, and the Adam optimization algorithm which incorporates both momentum and...
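As a reminder to myself of how these pieces fit together, a rough NumPy sketch of a single Adam update, combining the momentum-style and RMSprop-style moving averages; the names and defaults are mine, not the course's code:

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # RMSprop: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction (matters for the first steps)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # step scaled by the RMS of recent gradients
    return w, m, v

# One illustrative step on a single scalar parameter.
w, m, v = 0.5, 0.0, 0.0
w, m, v = adam_update(w, grad=0.2, m=m, v=v, t=1)
print(w)  # slightly less than 0.5
```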
Read post
Deep learning specialization, finished course 1 [Post #25, Day 51]
I have completed the final graded programming assignment of Week 4 of Course 1 of the Deep Learning Specialization. Woohoo! I get a certificate now. The assignments were interesting and helped me understand how to implement a neural network in Python without using OOP or PyTorch. First, we implemented a logistic regression model (Week 2 programming assignment), with a single linear unit followed by a sigmoid activation, so no hidden layer(s). The goal was to identify if a given input photo was a photo o...
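Here is a minimal sketch of the kind of logistic-regression forward/backward pass that assignment builds, assuming the usual column-per-example layout; the shapes and names are illustrative, not the assignment's exact code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    # X: (n_features, m) inputs, Y: (1, m) labels, w: (n_features, 1), b: scalar.
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                   # forward: one linear + sigmoid unit
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # binary cross-entropy
    dw = (X @ (A - Y).T) / m                                   # gradient w.r.t. weights
    db = np.sum(A - Y) / m                                     # gradient w.r.t. bias
    return dw, db, cost

rng = np.random.default_rng(0)
X = rng.normal(size=(12288, 5))          # e.g. 5 flattened 64x64x3 images
Y = rng.integers(0, 2, size=(1, 5))
w, b = np.zeros((12288, 1)), 0.0
dw, db, cost = propagate(w, b, X, Y)
print(cost)                              # ~0.693 = -log(0.5) with zero weights
```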
Read post
Deep learning specialization, started course 1 [Post #24, Day 48]
I started the DeepLearning.AI Deep Learning Specialization program, led by Andrew Ng, two days ago. I am doing Course 1 – Neural Networks and Deep Learning – now. I have just started Week 3 of this course. Each week takes about one day; however, I haven't paid the Coursera fee yet, so I don't have access to the graded assignments. These are the courses making up the Deep Learning Specialization program: Neural networks and deep learning Introduction to deep learning Neural networks basics Sha...
Read post
Z2H Video 4, exercise E01 [Post #23, Day 46]
E01: I did not get around to seeing what happens when you initialize all weights and biases to zero. Try this and train the neural net. You might think either that 1) the network trains just fine or 2) the network doesn't train at all, but actually it is 3) the network trains but only partially, and achieves a pretty bad final performance. Inspect the gradients and activations to figure out what is happening and why the network is only partially training, and what part is being trained exactly. ...
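A small experiment along these lines, using a toy two-layer MLP with made-up shapes rather than the actual makemore network, shows which part can still train:

```python
import torch

g = torch.Generator().manual_seed(42)
X = torch.randn(32, 10, generator=g)            # fake batch of inputs
Y = torch.randint(0, 27, (32,), generator=g)    # fake targets, 27 classes

# Every parameter initialized to zero.
W1 = torch.zeros(10, 64, requires_grad=True)
b1 = torch.zeros(64, requires_grad=True)
W2 = torch.zeros(64, 27, requires_grad=True)
b2 = torch.zeros(27, requires_grad=True)

h = torch.tanh(X @ W1 + b1)                     # all zeros, because W1 and b1 are zero
logits = h @ W2 + b2                            # all zeros -> uniform softmax
loss = torch.nn.functional.cross_entropy(logits, Y)
loss.backward()

for name, p in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, "grad is all zero:", bool((p.grad == 0).all()))
# Only b2 gets a nonzero gradient: with h = 0, the gradients for W2, W1, and b1 all vanish,
# so the network can only learn the output bias (roughly the character prior).
```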
Read post
Z2H Video 4, round 2, finished watching [Post #22, Day 45]
First, just revisiting this exercise question from the previous video, based on new insights given in this Video 4: E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)? I was on the right track before; here are my cleaned-up answers: (1) What is the loss you'd get...
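A quick sanity check of part (1), assuming the makemore vocabulary of 27 characters:

```python
import torch

vocab_size = 27  # 26 letters plus the '.' token in makemore
# If the predicted distribution is perfectly uniform, the correct character always
# gets probability 1/27, so the expected negative log likelihood is -log(1/27).
uniform_loss = -torch.log(torch.tensor(1.0 / vocab_size))
print(uniform_loss.item())  # ~3.2958
# For (2), pushing the initial logits toward zero (e.g. by scaling down the output
# layer's weights and zeroing its bias) moves the starting loss toward this value.
```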
Read post
Research paper assistant [Post #21, Day 44]
I was struggling through reading the Bengio et al., 2003 paper today (and by struggling I mean I was unable to follow much of it), when I decided to upload it to ChatGPT and try to work through it sequentially. It worked really well for a while: going sentence by sentence, with ChatGPT explaining each one with examples, really increased my understanding. But I could only go until I ran out of free prompts, and it started hallucinating, I guess, listing sentences that weren't actuall...
Read post
Z2H Video 7, finished watching [Post #20, Day 43]
I have finished watching Video 7 and am feeling discouraged. There was a lot I didn't fully understand. And so far we have only covered the pre-training stage of LLMs, which produces a document completer. There are still several more steps required to build an actual assistant like ChatGPT. I did learn that the "attention" in Attention Is All You Need is basically the ability for tokens to communicate with one another. Here are some notes I took down in my Jupyter Notebook: each...
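Here is roughly the single-head self-attention from the video as I understand it, reconstructed from memory, so treat it as a sketch of the "tokens communicating" idea rather than the video's exact code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 1, 8, 16                 # batch, time (tokens), channels
head_size = 16
x = torch.randn(B, T, C)

key = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)
k, q, v = key(x), query(x), value(x)

# Affinities between tokens: how much each token attends to every other token.
wei = q @ k.transpose(-2, -1) / head_size ** 0.5        # (B, T, T)
# Causal mask so a token only communicates with the tokens before it.
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float("-inf"))
wei = F.softmax(wei, dim=-1)
out = wei @ v                                           # weighted sum of values: the "communication"
print(out.shape)                                        # torch.Size([1, 8, 16])
```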
Read post
Z2H Video 7, jumping ahead [Post #19, Day 39]
I have jumped ahead a bit and started watching Video 7, Let's build GPT: from scratch, in code, spelled out. I was feeling a bit bogged down slogging my way back through the earlier videos and wanted something new and refreshing. I had actually watched this once before, at Yvonne's house, but I understand so much more of the detail now that I have worked through the makemore videos. It's a lot of fun working through this one now! I think it was a good idea to bring some freshness in again. I watched a Googl...
Read post
Z2H Video 3, round 2, finished watching [Post #18, Day 38]
I have finished working my way through Video 3 the second time. I understood most of what was presented. I am still not solid on my understanding of what the embedding vectors are and how the number of dimensions affects them and the performance of the neural network model. Now time for some exercises. E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2. The parameters I tried adjusting were: block size, i.e., the context length; number of embedding vector dime...
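My current mental model of the embedding table, as a tiny sketch with made-up indices: each character index picks out one row (one embedding vector), and the context's vectors get concatenated before the hidden layer:

```python
import torch

vocab_size, n_embd, block_size = 27, 10, 3
g = torch.Generator().manual_seed(2147483647)
C = torch.randn((vocab_size, n_embd), generator=g)   # embedding table: one 10-dim vector per character

# A batch of contexts: each row is block_size character indices (values are arbitrary here).
Xb = torch.tensor([[0, 5, 13],
                   [5, 13, 13]])
emb = C[Xb]                             # (2, 3, 10): one embedding vector per context character
emb_flat = emb.view(emb.shape[0], -1)   # (2, 30): concatenated, ready for the hidden layer
print(emb.shape, emb_flat.shape)
```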
Read post
Z2H Video 2, exercises complete [Post #17, Day 36]
Coming back to my work after a nice weekend visiting family near Kingston, New York. E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; Did it improve over a bigram model? I trained a trigram language model using the counting method and evaluated the loss. The loss for the bigram model was 2.45; the loss for my trigram model is 2.21. The trigram model improved over the bigram mode...
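For the record, the shape of my counting approach as a sketch. The word list below is a stand-in for the full names.txt data set, so the printed loss will not match the 2.21 above:

```python
import torch

words = ["emma", "olivia", "ava"]            # stand-in for the full names.txt list
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
V = len(stoi)

# Count every trigram (ch1, ch2) -> ch3, with '.' marking the start and end of a word.
N = torch.zeros((V, V, V), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2, c3 in zip(cs, cs[1:], cs[2:]):
        N[stoi[c1], stoi[c2], stoi[c3]] += 1

P = (N + 1).float()                          # add-one smoothing so no probability is exactly zero
P = P / P.sum(dim=2, keepdim=True)           # normalize over the predicted third character

# Average negative log likelihood over the training trigrams.
log_likelihood, n = 0.0, 0
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2, c3 in zip(cs, cs[1:], cs[2:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2], stoi[c3]])
        n += 1
print(f"loss = {(-log_likelihood / n).item():.4f}")
```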
Read post
Z2H Video 2, exercise E01 [Post #16, Day 33]
I have worked my way through building and "training" my trigram language model using the counting method. Here are some fun names output from my model (I searched through the data set of names and some of these like Samiyah, Kaley, Aviyah, and Glen are actually in the data set, hmmm, is that a problem?): Ce Bra Jalius Rochityharlonimittain Luwak Ka Da Samiyah Javer Gotai Moriellavoji Preda Kaley Maside En Aviyah Folspihiliven Tahlas Kashruban Glen Qualitatively speaking, they seem a little...
Read post
Z2H Video 2, round 2, finished watching [Post #15, Day 32]
Just finished watching Video 2, now attempting the exercises. E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; Did it improve over a bigram model? This is not an easy one! I started by trying the neural net approach; I am going to switch to the explicit counting method now. I have also been watching a new video that Andrej just published yesterday: Deep Dive into LLMs like ChatGPT...
Read post
Z2H Video 2, round 2 [Post #14, Day 30]
PyTorch is a deep learning neural network framework. Part 1: bigram language modeling, explicit (statistical) approach Intro to makemore, "makemore takes one text file as input, where each line is assumed to be one training thing, and generates more things like it" Names data set, quick analysis of data set Divide up all the bigrams in the names data set and keep counts of them in a dictionary PyTorch tensor to store bigram counts instead (27 x 27 tensor) Summary Part 2: bigram language mo...
Read post
Z2H Video 1, exercises complete [Post #13, Day 29]
I was able to slog my way through the exercises in Google Colab. Section 1 was relatively straightforward; once I got the hang of the Google Colab sheet and what was being asked, I was able to compute partial derivatives (e.g., the partial derivative of f with respect to a) with some help from WolframAlpha. I used the partial derivatives df/da, df/db, and df/dc to compute the analytical gradient for given inputs a, b, and c. Next, I repeated the process manually, using the "nudge by h" method....
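A small sketch of the idea, using a made-up function f(a, b, c) rather than the one from the exercise, comparing the analytical partial derivatives with the nudge-by-h estimates:

```python
def f(a, b, c):
    return a * b + c ** 2

a, b, c = 2.0, -3.0, 4.0
h = 1e-6

# Analytical partial derivatives, worked out by hand: df/da = b, df/db = a, df/dc = 2c.
analytic = (b, a, 2 * c)

# Numerical estimates: nudge one input by h and measure the change in f.
numeric = (
    (f(a + h, b, c) - f(a, b, c)) / h,
    (f(a, b + h, c) - f(a, b, c)) / h,
    (f(a, b, c + h) - f(a, b, c)) / h,
)
for name, an, nu in zip("abc", analytic, numeric):
    print(f"df/d{name}: analytic {an:.6f}, numeric {nu:.6f}")
```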
Read post
Z2H Video 1, round 2, lots of clarity [Post #12, Day 27]
I'm about two thirds of the way through my second pass of Video 1 and I am understanding a lot of the details. After my first pass I especially wanted to go back through the manual backpropagation segment, as I didn't fully understand it; now I feel that I do. I was able to work through the numbers by hand, drawing the computation graphs out on gridded graph paper and working out the gradients myself, and it was fun! Here's the progress of the video as I have it organized in my mind: Intro to m...
Read post
Z2H Video 5, back to the start [Post #11, Day 26]
I started to watch Z2H Video 5, and Andrej recommended working out the exercises for myself before he reveals the solutions in the video. I felt I wasn't prepared for this, so it seemed like a good time to circle back to the first video and solidify more details in my mind. I'm stepping through Video 1 again now and can already tell I am understanding a lot more of the fine details than I did the first time. Intuitive explanation of the chain rule from Wikipedia: Intuitively, the chain rule st...
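Written out, the chain rule the video leans on (the speeds analogy is the Wikipedia one as I remember it):

```latex
% If z depends on y and y depends on x, then
\frac{dz}{dx} \;=\; \frac{dz}{dy} \cdot \frac{dy}{dx}
% Intuition: if the car goes twice as fast as the bicycle and the bicycle goes
% four times as fast as the walking man, the car goes 2 x 4 = 8 times as fast as the man.
```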
Read post
Z2H Video 4, finished watching [Post #10, Day 24]
I have just finished watching Video 4 in the Z2H series. I talked about good initialization in my previous post. This video also introduced the concept of batch normalization: batch normalization layers sprinkled throughout a neural network help to stabilize it. We finished the video by "PyTorch-ifying" the code we worked on in the first part of the video. I need to review this material for sure, as a lot didn't sink in for me on this initial pass. I am wondering what the best way forward is from here, shou...
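My working picture of what a batch normalization layer does at training time (ignoring the running statistics used at inference), as a short sketch:

```python
import torch

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features) pre-activations for one layer.
    mean = x.mean(dim=0, keepdim=True)                 # per-feature mean over the batch
    var = x.var(dim=0, keepdim=True, unbiased=False)   # per-feature variance over the batch
    xhat = (x - mean) / torch.sqrt(var + eps)          # normalize to ~zero mean, unit variance
    return gamma * xhat + beta                         # learnable scale and shift

x = torch.randn(32, 100) * 5 + 3       # a badly scaled batch of pre-activations
gamma = torch.ones(1, 100)
beta = torch.zeros(1, 100)
y = batchnorm_forward(x, gamma, beta)
print(y.mean().item(), y.std().item())  # roughly 0 and 1
```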
Read post
Getting back, Z2H Video 3, building makemore part 2 [Post #9, Day 23]
I had a bit of a hiatus, traveling to Vermont for the past 10 days. I studied a bit but not a whole lot, so I am getting back into it now. It was tough this morning getting back into the mindset and remembering everything I've done previously; that's where this blog can be helpful! I can re-read my past entries and bring myself up to speed again. I just finished Video 3 in the Z2H series. Andrej introduced some new concepts, and there are more knobs I am aware of now for fine-tuning a neural network....
Read post
Sidenote: Z2H chapter progression
I am making this note in an effort to keep everything organized in my mind. These are the chapters from the notes section of each video. Video 1 – The spelled-out intro to neural networks and backpropagation: building micrograd 00:00:00 intro 00:00:25 micrograd overview 00:08:08 derivative of a simple function with one input 00:14:12 derivative of a function with multiple inputs 00:19:09 starting the core Value object of micrograd and its visualization 00:32:10 manual backpropagation exampl...
Read post
Z2H Video 2, building makemore [Post #8, Day 10]
Yesterday I started watching Video 2 in Andrej Karpathy's Zero to Hero series, titled The spelled-out intro to language modeling: building makemore. makemore is another library created by Andrej; it makes more of the things that you give it. So far we have been building a character-level language model, that is, a model that predicts the next character in a sequence of text, one character at a time. We are building a bigram language model, which means we are working with two characte...
Read post
Z2H Video 1, finished watching [Post #7, Day 8]
I have now finished watching and working through the first video. I watched it in about 5-6 sessions; the video itself is 02:25:51, but I took much longer so I could pause and rewind many times as I was coding along. It was very good to work through; for me it was a combination of learning neural net material and Python OOP. It was a good first pass and I may rewatch the whole thing at some point, but I definitely want to rewatch the manual backpropagation work starting from 00:37:30 within the nex...
Read post
Neural Networks: Zero to Hero [Post #6, Day 7]
I have started Video 1 of 10 of Andrej Karpathy's Neural Networks: Zero to Hero playlist. Video 1 is titled The spelled-out intro to neural networks and backpropagation: building micrograd. I am about 30 minutes in so far, following along by coding in a Jupyter Notebook. This appears to be pretty much the same MLP architecture with the backpropagation algorithm that I built previously with Graham Ganssle's tutorial, but it's a different coding approach, focused on object-oriented programmi...
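To preview where the OOP approach goes, here is my own stripped-down sketch of the kind of Value object the video builds, supporting just addition, multiplication, and backpropagation:

```python
class Value:
    """A tiny autograd scalar in the spirit of micrograd (my own cut-down sketch)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order so each node's gradient is complete before it propagates further.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b, c = Value(2.0), Value(-3.0), Value(10.0)
d = a * b + c
d.backward()
print(a.grad, b.grad, c.grad)   # -3.0 2.0 1.0
```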
Read post
Tracking evolution of parameters [Post #5, Day 4]
So today I'm focusing on plotting the evolution of parameters within my MLP neural network for my N to 2N mapping application. I wanted to see how the weights and biases evolve with each epoch (and, more fine-grained, within each sample iteration inside each epoch), and how the loss evolves. One of my questions yesterday was: what is the point of plotting validation loss with each epoch? I understand it now – within each epoch, we also run a validation pass and compute the loss (the training and ...
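The basic pattern I'm using looks something like this; train_step and val_loss below are hypothetical stand-ins for my actual MLP code, just to show the record-then-plot structure:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def train_step(epoch):   # hypothetical: one pass over the training set, returns training loss
    return 1.0 / (epoch + 1) + rng.normal(0, 0.01)

def val_loss(epoch):     # hypothetical: loss on the held-out validation set for this epoch
    return 1.0 / (epoch + 1) + 0.05 + rng.normal(0, 0.01)

train_losses, val_losses = [], []
for epoch in range(50):
    train_losses.append(train_step(epoch))   # record once per epoch
    val_losses.append(val_loss(epoch))       # validation loss at the same cadence

plt.plot(train_losses, label="training loss")
plt.plot(val_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```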
Read post
Knobs [Post #4, Day 3]
Knobs I can adjust in my MLP neural network: Network architecture How to define/divide up input features Number of hidden layers (this gets more advanced, so far I am working with one hidden layer only) Number of units in each hidden layer (do they need to be the same amount for all hidden layers?) Training Number of epochs (i.e. iterations) Learning rate, so far I have been using 0.001 from the Graham Ganssle example, how do I know how to pick this? Trial and error? Activation function a...
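One way I could gather these knobs in one place and answer the learning-rate question by trial and error; train_mlp here is a hypothetical stand-in for the actual training code:

```python
knobs = {
    "n_hidden_layers": 1,
    "n_hidden_units": 10,
    "n_epochs": 1000,
    "learning_rate": 0.001,
    "activation": "sigmoid",
}

def train_mlp(**cfg):
    # Hypothetical stand-in: the real function would build and train the MLP with these
    # settings and return the final validation loss. Dummy value so the sketch runs.
    return abs(cfg["learning_rate"] - 0.01)

best = None
for lr in [0.1, 0.03, 0.01, 0.003, 0.001]:        # simple trial-and-error sweep
    val = train_mlp(**{**knobs, "learning_rate": lr})
    if best is None or val < best[1]:
        best = (lr, val)
print("best learning rate so far:", best[0])
```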
Read post
Sidenote: AI reading log
Here is where I am keeping track of all of my reading and watching of educational videos online. To-read/watch: https://karpathy.github.io/neuralnets/ https://karpathy.github.io/2015/05/21/rnn-effectiveness/ https://www.deeplearningbook.org/ Andrej's blog posts, one about Software 2.0, one about bitcoin, etc. At Andrej's instruction, read PyTorch Broadcasting semantics page https://karpathy.github.io/2015/11/14/ai/ Nando de Freitas writings Other deeplearning.ai courses like Attention in Tr...
Read post
Digging into the details of my MLP [Post #3, Day 2]
I have worked my way sequentially through the forward pass of my multilayer perceptron (MLP) neural network. It involved matrix multiplication and the use of the sigmoid function to compute the activation value for each hidden-layer node (i.e. neuron). I used the print function to check values as they flowed through my neural network, then checked the computations against hand calculations and a spreadsheet. My neural network (from Graham Ganssle's example) has seven input features (VP, VS, and rho ...
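The forward pass I stepped through looks roughly like this; the seven inputs match my network, but the hidden and output sizes here are just illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 1))       # one sample with seven input features (column vector)
W1 = rng.normal(size=(10, 7))     # hidden-layer weights (10 hidden nodes, illustrative)
b1 = np.zeros((10, 1))
W2 = rng.normal(size=(1, 10))     # output-layer weights
b2 = np.zeros((1, 1))

z1 = W1 @ x + b1                  # matrix multiplication into the hidden layer
a1 = sigmoid(z1)                  # activation value for each hidden node
z2 = W2 @ a1 + b2                 # matrix multiplication into the output layer
a2 = sigmoid(z2)                  # network output
print(a2)
```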
Read post
Building a basic neural network [Post #2, Day 1]
I followed Graham Ganssle's Neural networks tutorial to build a basic neural network (a multilayer perceptron) from scratch. I studied this during my PhD, so it's nice to come back to it now. My understanding is that the multilayer perceptron is an 'entry-level' artificial neural network, but its concepts underpin more advanced architectures like Transformers. The basics are an input layer, hidden layer(s), and an output layer. There is an activation function such as sigmoid (outputs number...
Read post
Starting [Post #1, Day 0]
My goal is to become an AI software engineer. I don't know exactly what that will entail yet. For the moment I am focusing on learning more about the Transformer architecture, LLMs, and creating AI agents using the Eliza framework. I have been inspired by (or, to use the crypto term, "pilled" by) Shaw, one of the founders of the Eliza framework. He is very motivational in his genuine desire for the world of agentic AI to advance and for anyone who wants to be involved to join in the movement and to...
Read post