2024-12-22 at 16:21 3b1b

the cost function C is the average of the per-example costs over all training examples
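A minimal sketch of "cost as an average": here `predict`, `xs`, and `ys` are assumptions (any model function plus input/label arrays), and the per-example cost is taken to be squared error, MNIST-style.

```python
import numpy as np

def cost(predict, xs, ys):
    # Per-example squared-error cost, then the mean over the whole dataset.
    per_example = [np.sum((predict(x) - y) ** 2) for x, y in zip(xs, ys)]
    return sum(per_example) / len(xs)
```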

backprop gives you the gradient of C(w1, w2, ...)
(how?)
but to calculate that exactly you'd need to run over all examples
instead, we use only a few examples at a time and compute, via backprop on that mini-batch, not the exact gradient but a stochastic estimate of it. this makes sense because it's also how humans learn: we don't retrain on 50,000 examples before adjusting our strategy/intuition (whether consciously or subconsciously). we adjust as we go, but also not after every single example; for an MNIST-type situation we want maybe 10-50+ examples before adjusting. (for situations like reasoning through physics practice problems, one example can be enough before adjusting, but that's an entirely different regime...)
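The mini-batch idea above can be sketched as a loop: shuffle the data, slice it into small batches, and step on the gradient estimated from each batch rather than the full dataset. Here `grad_on_batch` is an assumed stand-in for backprop (it returns dC/dw averaged over the batch); the names and defaults are illustrative, not from the video.

```python
import random

def sgd(w, grad_on_batch, data, batch_size=32, lr=0.1, epochs=1):
    # Stochastic gradient descent: each step uses only `batch_size`
    # examples, so the gradient is a noisy estimate of the true one.
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            w = w - lr * grad_on_batch(w, batch)
    return w
```

With a toy per-example cost C = (w - t)^2, the batch gradient is 2·(w - mean of targets in the batch), and the loop drives w toward the target mean even though each step sees only a slice of the data.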
