Backpropagation Dynamic Programming Formula from Maths

[Old note, see Cap AI/ML Book for updates]

Consider this neural network of 4 neurons:

Graph nodes:

  • Node 1 (out: h1) and Node 2 (out: h2): hidden layer
  • Node 3 (out: u3) and Node 4 (out: u4): output layer
  • Node 5 (loss node, out: e), where s3 = u3 - y3 and s4 = u4 - y4

See this article for notations: https://dspages.wordpress.com/2022/03/24/a-good-and-rather-complete-notation-for-ml-in-neuralnet/

Graph Note

There are no more edges to the right at ‘ge’: the loss node is the last node in the graph.

Calculus Note

Chain rule (y is function of u, u is function of x):
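
\frac{d}{dx}y = \frac{d}{du}y \times \frac{d}{dx}u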

Utilised below in this kind of expansion:
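
\frac{d}{d_{w8}}\left[\frac{s_4^2}{2}\right] = \frac{d}{d_{s4}}\left[\frac{s_4^2}{2}\right] \times \frac{d}{d_{w8}}s_4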

Sum rule (f1, f2 are functions of x):
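
\frac{d}{dx}[f_1 + f_2] = \frac{d}{dx}f_1 + \frac{d}{dx}f_2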

w9 and w10:

They are always 1 by design of the neural net, so that it can be optimised with a loss function.

Consider using Mean Squared Error as loss function.
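
For the two output nodes above this gives:

f_e = \frac{s_3^2 + s_4^2}{2}, \quad s_3 = u_3 - y_3, \quad s_4 = u_4 - y_4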

g10 for w10:

It doesn’t exist, because w10 is fixed at 1 and doesn’t need a gradient to change it.

g9 for w9:

It doesn’t exist, because w9 is fixed at 1 and doesn’t need a gradient to change it.

g8 for w8:

g_8 = \frac{d}{d_{w8}}f_e = \frac{d}{d_{w8}}\left[\frac{s_3^2 + s_4^2}{2}\right]

Remove s_3 (which does not depend on w_8) and apply the chain rule:

= \frac{d}{d_{w8}}\left[\frac{s_4^2}{2}\right] = \frac{d}{d_{s4}}\left[\frac{s_4^2}{2}\right] \times \frac{d}{d_{w8}}s_4 = \frac{2s_4}{2} \times \frac{d}{d_{w8}}s_4 = s_4 \times \frac{d}{d_{w8}}[u_4 - y_4]

Remove y_4, which is a constant:

= s_4 \times \frac{d}{d_{w8}}u_4 = s_4 \times \frac{d}{d_{w8}}f(d_4 + b_4) = s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[d_4 + b_4]

Remove b_4, which is not related to w_8:

= s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[h_1 w_6 + h_2 w_8]

Remove the h_1 w_6 term, which is not related to w_8:

= s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[h_2 w_8] = [(s_4 \times 1) \times t(d_4 + b_4)] \times h_2

The part in square brackets is not specific to w8; name it v4 (it is the same for g6 too) and call it the gradient intermediate value at node 4. Multiplying it by the input matching w8 (here h2) gives the gradient g8 for w8.
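
Written out:

v_4 = (s_4 \times 1) \times t(d_4 + b_4), \qquad g_8 = v_4 \times h_2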

g7, g6, g5:

They follow the same pattern as g8 (see the g6 example just below); the more interesting case is g4, considered next.
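
For example, g6 uses the same v4; only the matching input changes (h1 instead of h2):

g_6 = v_4 \times h_1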

g4 for w4:
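
Assuming that w4 feeds hidden node 2 (output h2) from an input x2, and that h2 reaches the loss node through both w7 (into node 3) and w8 (into node 4), the same chain rule and sum rule steps collapse to:

g_4 = [t(d_2 + b_2) \times (w_7 v_3 + w_8 v_4)] \times x_2 = v_2 \times x_2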

Dynamic Programming Formula:

No need for g3, g2, g1; the above is enough to find the dynamic programming formula for each neuron:
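
In terms of the functions and values defined below, and consistent with v4 and g8 above, the recursion at each neuron is:

v = t(d + b) \times \sum_{\text{right}} \left( W_{\text{toright}} \times V_{\text{right}} \right), \qquad g_w = v \times (\text{input matching } w)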

Functions:

  • f is the activation function; t is the derivative of the activation function
  • fe is the loss function; te is the derivative of the loss function

Values:

  • v is the gradient intermediate value at a neuron
  • t is the value of the derivative of the activation function at the same pre-activation value, i.e. t(d+b)
  • Wtoright from the output nodes to the loss node are always 1
  • The gradient at the loss node is the root gradient ‘ge’
  • Each output node is always connected to only one v value inside the loss node
  • Vright for each output node is ‘te’ with the variables unrelated to that output node removed.
    These Vright can also be notated as componential ‘ge’.
  • ge3 = te (with s4 removed)
  • ge4 = te (with s3 removed)
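
Concretely, with the MSE loss above, ge3 = s3 and ge4 = s4, so

v_4 = t(d_4 + b_4) \times (w_9 \times ge_4) = t(d_4 + b_4) \times s_4

which matches the g8 derivation.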

Final note:

  • Remember that inside the loss node there are N componential ‘ve’ values (the componential ‘ge’ above), one per output node, used to calculate v at the output layer, even though the loss node is just one node.
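
As a quick numeric check, here is a minimal Python sketch of the formula for this 2-2 network. The wiring of w1..w7, the tanh activation, and all the example values are assumptions for illustration; only d4 = h1*w6 + h2*w8 and the MSE loss come from the derivation above.

import numpy as np

def f(z):
    # activation function (tanh is an assumed choice, not fixed by this note)
    return np.tanh(z)

def t(z):
    # derivative of the activation function: 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

# Assumed example inputs, weights, biases and targets (hypothetical values).
x1, x2 = 0.5, -1.2
w1, w2, w3, w4 = 0.1, 0.2, -0.3, 0.4   # inputs -> hidden nodes 1, 2 (assumed wiring)
w5, w6, w7, w8 = 0.5, -0.6, 0.7, 0.8   # hidden -> output nodes 3, 4 (d4 = h1*w6 + h2*w8 as above)
b1 = b2 = b3 = b4 = 0.1
y3, y4 = 1.0, 0.0

def forward(w8_value):
    # forward pass; returns the values needed for the gradient plus the loss e
    d1, d2 = x1 * w1 + x2 * w2, x1 * w3 + x2 * w4
    h1, h2 = f(d1 + b1), f(d2 + b2)
    d3, d4 = h1 * w5 + h2 * w7, h1 * w6 + h2 * w8_value
    u3, u4 = f(d3 + b3), f(d4 + b4)
    s3, s4 = u3 - y3, u4 - y4
    e = (s3 ** 2 + s4 ** 2) / 2          # f_e, the MSE loss
    return h2, d4, s4, e

h2, d4, s4, e = forward(w8)

# Dynamic programming formula: v4 = t(d4 + b4) * (w9 * ge4) with w9 = 1 and
# ge4 = s4; then g8 = v4 * (input matching w8) = v4 * h2.
v4 = t(d4 + b4) * (1.0 * s4)
g8 = v4 * h2

# Finite-difference check of g8 against the loss itself.
eps = 1e-6
g8_numeric = (forward(w8 + eps)[3] - forward(w8 - eps)[3]) / (2 * eps)
print(g8, g8_numeric)  # the two numbers should agree closely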
