2022

An Empirical Study of Neural Network Training Dynamics

In the paper Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications, the authors propose several metrics that quantify how well-represented individual examples are in the underlying data distribution.

After reading the paper, I started wondering: When humans learn, we typically begin with easy materials and questions, gradually progressing to more difficult topics. Does neural network learning follow a similar pattern? Do networks learn easy or well-represented examples first and move on to more complex ones later?

To address this question, I trained a simple fully connected neural network with a single hidden layer of 8 units on the MNIST dataset. To make the observations more reliable and less noisy, I added an evaluation step after each training iteration. During this step, the network processed the entire training set, performed backpropagation, and recorded the prediction and the gradient of the second fully connected layer (an 8 × 10 matrix) for every example, without updating the network parameters.
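The sketch below illustrates this setup, assuming PyTorch and torchvision; the class and function names (e.g. TinyMLP, record_eval_pass) are illustrative rather than taken from the original experiment, and per-example gradients are obtained here simply by backpropagating one example at a time, which may differ from how the original code collected them.

```python
# Minimal sketch of the described setup, assuming PyTorch / torchvision.
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class TinyMLP(nn.Module):
    """784 -> 8 -> 10 network: a single hidden layer with 8 units."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 8)
        # fc2.weight has shape (10, 8), i.e. the 8 x 10 matrix (transposed)
        # whose per-example gradients are recorded below.
        self.fc2 = nn.Linear(8, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))


def record_eval_pass(model, dataset):
    """Run the whole training set through the model, backpropagate each example,
    and record its prediction and fc2 gradient WITHOUT updating parameters."""
    loader = DataLoader(dataset, batch_size=1, shuffle=False)
    preds, grads = [], []
    for x, y in loader:
        model.zero_grad()
        logits = model(x)
        loss = F.cross_entropy(logits, y)
        loss.backward()  # gradients are computed...
        preds.append(logits.argmax(dim=1).item())
        grads.append(model.fc2.weight.grad.detach().clone())  # ...and stored
        # No optimizer.step() here, so the parameters stay unchanged.
    model.zero_grad()
    return preds, grads


if __name__ == "__main__":
    train_set = datasets.MNIST(
        "data", train=True, download=True, transform=transforms.ToTensor()
    )
    model = TinyMLP()
    preds, grads = record_eval_pass(model, train_set)
```

Backpropagating one example at a time is slow but keeps the sketch simple; a vectorized per-example gradient (e.g. via the outer product of hidden activations and output errors) would be faster and is a reasonable alternative.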