python – NaN loss when training regression network

Regression with neural networks is hard to get working because the output is unbounded, so you are especially prone to the exploding gradients problem (the likely cause of the NaNs).

Historically, one key solution to exploding gradients was to reduce the learning rate, but with per-parameter adaptive learning rate algorithms like Adam, you rarely need to hand-tune the learning rate to get good performance. There is very little reason to use SGD with momentum anymore unless you're a neural network fiend and know how to tune the learning rate schedule.

Here are some things you could potentially try:

  1. Normalize your outputs by quantile normalizing or z-scoring. To be rigorous, compute this transformation on the training data, not on the entire dataset. For example, with quantile normalization, if an example is in the 60th percentile of the training set, it gets a value of 0.6. (You can also shift the quantile-normalized values down by 0.5 so that the 0th percentile is -0.5 and the 100th percentile is +0.5.)

  2. Add regularization, either by increasing the dropout rate or by adding L1 and L2 penalties to the weights. L1 regularization is analogous to feature selection, and since you said that reducing the number of features to 5 gives good performance, L1 may help as well.

  3. If these still don't help, reduce the size of your network. This is not always the best idea since it can harm performance, but in your case you have a large number of first-layer neurons (1024) relative to input features (35), so it may help.

  4. Increase the batch size from 32 to 128. 128 is fairly standard and could potentially increase the stability of the optimization.
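The target normalization in item 1 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names (`fit_zscore`, `zscore`, `quantile_normalize`) are hypothetical, and in a real pipeline you would likely use vectorized array operations instead. The point it shows is that the statistics come from the training targets only and are then reused for any other split.

```python
import bisect
import statistics


def fit_zscore(train_targets):
    """Return (mean, std) computed on the training targets only."""
    mean = statistics.fmean(train_targets)
    std = statistics.pstdev(train_targets)
    if std == 0:
        raise ValueError("training targets are constant; z-scoring is undefined")
    return mean, std


def zscore(values, mean, std):
    """Apply a previously fitted z-score transform to any split."""
    return [(v - mean) / std for v in values]


def quantile_normalize(values, train_targets, shift=0.5):
    """Map each value to its empirical quantile in the training set.

    With shift=0.5 the result lies in [-0.5, +0.5], as described in item 1.
    """
    sorted_train = sorted(train_targets)
    n = len(sorted_train)
    # bisect_right counts how many training targets are <= v
    return [bisect.bisect_right(sorted_train, v) / n - shift for v in values]
```

To get predictions back on the original scale after z-scoring, invert the transform with `prediction * std + mean`.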

The answer by 1 is quite good. However, all of its fixes seem to address the issue indirectly rather than directly. I would recommend using gradient clipping, which caps any gradient whose magnitude exceeds a chosen threshold.

In Keras you can pass clipnorm=1 to the optimizer to clip all gradients with a norm above 1.
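To make clear what clipping by norm does, here is a minimal dependency-free sketch. The helper `clip_by_norm` is hypothetical and written for illustration; it is not Keras's implementation, which operates on tensors inside the optimizer.

```python
import math


def clip_by_norm(grad, max_norm=1.0):
    """Rescale grad so that its L2 norm is at most max_norm.

    Gradients that are already within the limit are returned unchanged.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# In Keras the same behaviour is requested through the optimizer, e.g.
#   keras.optimizers.Adam(clipnorm=1.0)
```

Clipping preserves the direction of the gradient and only shrinks its length, so a single huge update cannot throw the weights far away.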

I faced the same problem before. I searched and found this question and its answers. All the tricks mentioned above are important for training a deep neural network. I tried them all, but still got NaN.

I also found another question about the same problem. I quote its author's summary below:

I wanted to point this out so that it's archived for others who may
experience this problem in future. I was running into my loss function
suddenly returning a NaN after it got so far into the training process.
I checked the ReLUs, the optimizer, the loss function, my dropout in
accordance with the ReLUs, the size of my network and the shape of the
network. I was still getting loss that eventually turned into a NaN
and I was getting quite frustrated.

Then it dawned on me. I may have some bad input. It turns out, one of
the images that I was handing to my CNN (and doing mean normalization
on) was nothing but 0s. I wasn't checking for this case when I
subtracted the mean and normalized by the std deviation and thus I
ended up with an exemplar matrix which was nothing but NaNs. Once I
fixed my normalization function, my network now trains perfectly.
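The failure described in the quote comes from dividing by a standard deviation of zero. The quoted author does not show the fix, so the following is only one possible sketch; the function name `standardize` and the `eps` threshold are assumptions made for illustration.

```python
import statistics


def standardize(pixels, eps=1e-8):
    """Subtract the mean and divide by the std, guarding against constant input.

    A constant input (for example an all-zero image) has std == 0, so the
    usual formula would compute 0/0. Here such inputs map to all zeros.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std < eps:  # eps is a hypothetical threshold, tune it for your data
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]
```

Array libraries such as NumPy typically return NaN for 0/0 and emit only a warning, which is why a single bad sample can silently poison the loss.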

I agree with the above viewpoint: your network is sensitive to its input. In my case, I use the log of a density estimate as an input. Its absolute value can be very large, which may result in NaN after several gradient steps. I think an input check is necessary. First, make sure the input does not include -inf or inf, or numbers that are extremely large in absolute value.
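An input check of this kind could look like the sketch below. The function name `find_bad_values` and the `max_abs` threshold are hypothetical; what counts as "extremely large" depends on your data.

```python
import math


def find_bad_values(rows, max_abs=1e6):
    """Return (row, col, value) for each entry that is NaN, infinite,
    or larger than max_abs in absolute value."""
    bad = []
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if not math.isfinite(v) or abs(v) > max_abs:
                bad.append((i, j, v))
    return bad
```

Running such a check once over the whole dataset before training is cheap, and it reports the exact samples to fix or drop.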
