Backpropagation Dynamic Programming Formula from Maths

[Old note, see Cap AI/ML Book for updates]

Consider this neural network of 4 neurons:

Graph nodes:

  • Node 1 (out: h1) and Node 2 (out: h2): hidden layer
  • Node 3 (out: u3) and Node 4 (out: u4): output layer
  • Node 5 (loss node, out: e), where s3 = u3 - y3 and s4 = u4 - y4

See this article for notations: https://dspages.wordpress.com/2022/03/24/a-good-and-rather-complete-notation-for-ml-in-neuralnet/

Graph Note

There are no more edges to the right at ‘ge’: the loss node is the last node in the graph.

Calculus Note

Chain rule (y is function of u, u is function of x):
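
\frac{d}{dx}y = \frac{d}{du}y \times \frac{d}{dx}u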

Utilised below in this kind of expansion:
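
\frac{d}{d_{w8}}\left[\frac{s_4^2}{2}\right] = \frac{d}{d_{s4}}\left[\frac{s_4^2}{2}\right] \times \frac{d}{d_{w8}}s_4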

Sum rule (f1, f2 are functions of x):
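
\frac{d}{dx}[f_1 + f_2] = \frac{d}{dx}f_1 + \frac{d}{dx}f_2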

w9 and w10:

They are always 1 by design of the neural net, so that it can be optimised with a loss function.

Consider using Mean Squared Error as loss function.
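
For the two output nodes above this gives:

f_e = \frac{s_3^2 + s_4^2}{2}, \quad s_3 = u_3 - y_3, \quad s_4 = u_4 - y_4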

g10 for w10:

It doesn’t exist, because w10 is fixed at 1 and doesn’t need a gradient to change it.

g9 for w9:

It doesn’t exist, because w9 is fixed at 1 and doesn’t need a gradient to change it.

g8 for w8:

g_8 = \frac{d}{d_{w8}}f_e = \frac{d}{d_{w8}}\left[\frac{s_3^2 + s_4^2}{2}\right]

Remove s_3 (which does not depend on w_8) and apply the chain rule:

= \frac{d}{d_{w8}}\left[\frac{s_4^2}{2}\right] = \frac{d}{d_{s4}}\left[\frac{s_4^2}{2}\right] \times \frac{d}{d_{w8}}s_4 = \frac{2s_4}{2} \times \frac{d}{d_{w8}}s_4 = s_4 \times \frac{d}{d_{w8}}[u_4 - y_4]

Remove y_4, which is a constant:

= s_4 \times \frac{d}{d_{w8}}u_4 = s_4 \times \frac{d}{d_{w8}}f(d_4 + b_4) = s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[d_4 + b_4]

Remove b_4, which is not related to w_8:

= s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[h_1 w_6 + h_2 w_8]

Remove the h_1 w_6 term, which is not related to w_8:

= s_4 \times t(d_4 + b_4) \times \frac{d}{d_{w8}}[h_2 w_8] = [(s_4 \times 1) \times t(d_4 + b_4)] \times h_2

The part in square brackets is not specific to w8; name it v4 (it is the same for g6 too) and call it the gradient intermediate value at node 4. Multiplying it by the input matching w8 (here h2) gives the gradient g8 for w8.
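
Written out:

v_4 = (s_4 \times 1) \times t(d_4 + b_4), \qquad g_8 = v_4 \times h_2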

g7, g6, g5:

They follow the same pattern as g8 (see the g6 example just below); the more interesting case is g4, considered next.
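
For example, g6 uses the same v4; only the matching input changes (h1 instead of h2):

g_6 = v_4 \times h_1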

g4 for w4:
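
Assuming that w4 feeds hidden node 2 (output h2) from an input x2, and that h2 reaches the loss node through both w7 (into node 3) and w8 (into node 4), the same chain rule and sum rule steps collapse to:

g_4 = [t(d_2 + b_2) \times (w_7 v_3 + w_8 v_4)] \times x_2 = v_2 \times x_2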

Dynamic Programming Formula:

No need for g3, g2, g1; the above is enough to find the dynamic programming formula for each neuron:
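
In terms of the functions and values defined below, and consistent with v4 and g8 above, the recursion at each neuron is:

v = t(d + b) \times \sum_{\text{right}} \left( W_{\text{toright}} \times V_{\text{right}} \right), \qquad g_w = v \times (\text{input matching } w)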

Functions:

  • f is the activation function; t is the derivative of the activation function
  • fe is the loss function; te is the derivative of the loss function

Values:

  • v is the gradient intermediate value at a neuron
  • t is the value of the derivative of the activation function at the same pre-activation value, i.e. t(d+b)
  • Wtoright from the output nodes to the loss node are always 1
  • The gradient at the loss node is the root gradient ‘ge’
  • Each output node is always connected to only one v value inside the loss node
  • Vright for each output node is ‘te’ with the variables unrelated to that output node removed.
    These Vright can also be notated as componential ‘ge’.
  • ge3 = te (with s4 removed)
  • ge4 = te (with s3 removed)
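
Concretely, with the MSE loss above, ge3 = s3 and ge4 = s4, so

v_4 = t(d_4 + b_4) \times (w_9 \times ge_4) = t(d_4 + b_4) \times s_4

which matches the g8 derivation.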

Final note:

  • Remember that inside the loss node there are N componential ‘ve’ values (the componential ‘ge’ above), one per output node, used to calculate v at the output layer, even though the loss node is just one node.
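
As a quick numeric check, here is a minimal Python sketch of the formula for this 2-2 network. The wiring of w1..w7, the tanh activation, and all the example values are assumptions for illustration; only d4 = h1*w6 + h2*w8 and the MSE loss come from the derivation above.

import numpy as np

def f(z):
    # activation function (tanh is an assumed choice, not fixed by this note)
    return np.tanh(z)

def t(z):
    # derivative of the activation function: 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

# Assumed example inputs, weights, biases and targets (hypothetical values).
x1, x2 = 0.5, -1.2
w1, w2, w3, w4 = 0.1, 0.2, -0.3, 0.4   # inputs -> hidden nodes 1, 2 (assumed wiring)
w5, w6, w7, w8 = 0.5, -0.6, 0.7, 0.8   # hidden -> output nodes 3, 4 (d4 = h1*w6 + h2*w8 as above)
b1 = b2 = b3 = b4 = 0.1
y3, y4 = 1.0, 0.0

def forward(w8_value):
    # forward pass; returns the values needed for the gradient plus the loss e
    d1, d2 = x1 * w1 + x2 * w2, x1 * w3 + x2 * w4
    h1, h2 = f(d1 + b1), f(d2 + b2)
    d3, d4 = h1 * w5 + h2 * w7, h1 * w6 + h2 * w8_value
    u3, u4 = f(d3 + b3), f(d4 + b4)
    s3, s4 = u3 - y3, u4 - y4
    e = (s3 ** 2 + s4 ** 2) / 2          # f_e, the MSE loss
    return h2, d4, s4, e

h2, d4, s4, e = forward(w8)

# Dynamic programming formula: v4 = t(d4 + b4) * (w9 * ge4) with w9 = 1 and
# ge4 = s4; then g8 = v4 * (input matching w8) = v4 * h2.
v4 = t(d4 + b4) * (1.0 * s4)
g8 = v4 * h2

# Finite-difference check of g8 against the loss itself.
eps = 1e-6
g8_numeric = (forward(w8 + eps)[3] - forward(w8 - eps)[3]) / (2 * eps)
print(g8, g8_numeric)  # the two numbers should agree closely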
