Rijul Rajesh
Cross Entropy Derivatives, Part 5: Optimizing bias with backpropagation

In the previous article, we calculated the derivatives of the cross-entropy loss. In this article, we begin optimizing the bias terms using backpropagation.

We start by setting the bias b_3 to an initial value. In this case, we choose

b_3 = -2

To verify that backpropagation is actually improving the model, we first compute the total cross entropy over the training data for this value of b_3.

We use the following bias values:

b_1 = 1.6, b_2 = 0.7, b_4 = 0, b_5 = 1

and we keep

b_3 = -2


Forward pass computation

For a single input example, we compute the intermediate values as follows, using the weights from the script later in this article.

Upper node:

upper = petal × (-2.5) + sepal × 0.6 + b_1, then y_upper = ReLU(upper)

Bottom node:

bottom = petal × (-1.5) + sepal × 0.4 + b_2, then y_bottom = ReLU(bottom)

Raw output values

Each output node combines the two ReLU-activated node values:

raw_setosa = y_upper × (-0.1) + y_bottom × 1.5 + b_3
raw_versicolor = y_upper × 2.4 + y_bottom × (-5.2) + b_4
raw_virginica = y_upper × (-2.2) + y_bottom × 3.7 + b_5

Softmax probabilities

Applying the softmax to the three raw values gives the predicted probabilities:

p_species = e^(raw_species) / (e^(raw_setosa) + e^(raw_versicolor) + e^(raw_virginica))
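
Continuing the same example, a minimal sketch of the output layer and the softmax, using the output-layer weights and the bias values above:

import numpy as np

# Output-layer raw values for the first example (y_upper ≈ 1.752, y_bottom ≈ 0.808).
y_upper, y_bottom = 1.752, 0.808
b3, b4, b5 = -2, 0, 1

raw_setosa = y_upper * -0.1 + y_bottom * 1.5 + b3    # ≈ -0.963
raw_versi  = y_upper * 2.4  + y_bottom * -5.2 + b4   # ≈  0.003
raw_virg   = y_upper * -2.2 + y_bottom * 3.7 + b5    # ≈  0.135

raws = np.array([raw_setosa, raw_versi, raw_virg])
probs = np.exp(raws) / np.sum(np.exp(raws))
print(np.round(probs, 2))  # ≈ [0.15 0.4  0.45], so p(Setosa) ≈ 0.15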

Cross-entropy loss

For each training example, the cross entropy is -log(p), where p is the predicted probability of the true species:

Petal   Sepal   Species      p (prob. of true species)   Cross Entropy
0.04    0.42    Setosa       0.15                         1.89
1.00    0.54    Virginica    0.71                         0.35
0.50    0.37    Versicolor   0.65                         0.43
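
Each cross-entropy value is just -log of the probability assigned to the true species, so the table can be checked directly:

import numpy as np

# -log(p) for the true species in each row of the table above;
# small differences from the table come from rounding p to two decimals.
p_true = np.array([0.15, 0.71, 0.65])
ce = -np.log(p_true)
print(np.round(ce, 2))     # ≈ [1.9  0.34 0.43]
print(round(ce.sum(), 2))  # ≈ 2.67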

The total cross entropy when b_3 = -2 is the sum of the three per-example values:

Total CE = 1.89 + 0.35 + 0.43 = 2.67


Visualizing the loss curve

We can visualize how the total cross entropy changes with different values of b_3 by plotting b_3 on the x-axis and the total cross entropy on the y-axis. If we evaluate many values of b_3, we obtain a smooth pink curve with a clear minimum.

We can visualize this with a short Python script.

import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

def softmax(raws):
    exp_vals = np.exp(raws)
    return exp_vals / np.sum(exp_vals)

# Fixed biases (b3 is the one we vary below)
b1 = 1.6
b2 = 0.7
b4 = 0
b5 = 1

# Sample training data: (petal, sepal, true_class)
data = [
    (0.04, 0.42, 0),  # Setosa
    (1.00, 0.54, 2),  # Virginica
    (0.50, 0.37, 1),  # Versicolor
]

def total_cross_entropy(b3):
    """Total cross entropy over the training data for a given b3."""
    total_ce = 0.0

    for petal, sepal, target in data:
        # Hidden layer: two nodes with fixed weights
        upper = petal * -2.5 + sepal * 0.6 + b1
        bottom = petal * -1.5 + sepal * 0.4 + b2

        # Output layer: raw (pre-softmax) value for each species
        raw_setosa = relu(upper) * -0.1 + relu(bottom) * 1.5 + b3
        raw_versi = relu(upper) * 2.4 + relu(bottom) * -5.2 + b4
        raw_virg = relu(upper) * -2.2 + relu(bottom) * 3.7 + b5

        probs = softmax([raw_setosa, raw_versi, raw_virg])
        total_ce += -np.log(probs[target])  # cross entropy for this example

    return total_ce

b3_values = np.linspace(-6, 4, 200)
losses = [total_cross_entropy(b3) for b3 in b3_values]

plt.plot(b3_values, losses, color="pink")
plt.xlabel("b3")
plt.ylabel("Total Cross Entropy")
plt.title("Cross Entropy vs b3")
plt.show()

Running this script produces the pink curve, where the lowest point corresponds to the value of b_3 that minimizes the total cross entropy.
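
If we want the grid point where the curve bottoms out, a one-line check reusing the arrays from the script above does it:

# Grid point with the lowest total cross entropy.
best_b3 = b3_values[np.argmin(losses)]
print(best_b3, min(losses))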

In the next part of the article, we will use backpropagation to move b_3 toward this minimum and update it step by step.

Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
