@nrailgun 2016-10-29T12:37:45.000000Z

Softmax loss and NaN

Machine Learning

During training, the interpreter emitted the warning `RuntimeWarning: divide by zero encountered in log`. The cause turned out to lie in the computation itself.

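$J = -1\{ y = j \} \log{ \frac {e^{\theta_j x}} {\sum_{k=1}^{K} e^{\theta_k x}} }$
is equivalent to (dividing the numerator and the denominator by the same number, to avoid overflow in the computation)

$J = -1\{ y = j \} \log{ \frac {e^{\theta_j x - \max_i(\theta_i x)}} {\sum_{k=1}^{K} e^{\theta_k x - \max_i(\theta_i x)}} }$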
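As a quick numerical check of this equivalence, here is a minimal sketch. The function names `softmax_naive` and `softmax_shifted` and the sample scores are made up for this illustration and are not part of the original code. It compares the two forms on small scores and shows the unshifted form breaking down on large ones.

```python
import numpy as np

def softmax_naive(scores):
    # Direct form: exp can overflow to inf for large scores.
    e = np.exp(scores)
    return e / np.sum(e)

def softmax_shifted(scores):
    # Shifted form: the largest exponent is exactly 0, so exp cannot overflow.
    e = np.exp(scores - np.max(scores))
    return e / np.sum(e)

small = np.array([1.0, 2.0, 3.0])       # illustrative scores
large = np.array([1000.0, 1001.0, 1002.0])

# On small scores both forms agree.
print(np.allclose(softmax_naive(small), softmax_shifted(small)))

# On large scores the naive form computes inf / inf, which is nan.
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))

# The shifted form still returns a valid probability vector.
print(softmax_shifted(large))
```

Since `large` is just `small` plus a constant, the shifted form returns the same probabilities for both inputs.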
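When the scores in x span a very wide range, x - np.max(x) can be a huge negative value for some entries, and exponentiating it underflows to $0$; taking the log of that zero probability is what triggers the warning. (Without the subtraction, the exponential of a large positive score would instead overflow to $\infty$.)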
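The warning can be reproduced with a small sketch. The score values below are invented for illustration; in float64, a gap of more than roughly 745 between the true class's score and the row maximum is enough to make the exponential underflow to zero.

```python
import warnings
import numpy as np

# Hypothetical scores for one sample with 2 classes; the true class is index 0.
x = np.array([[0.0, 1000.0]])
y = np.array([0])

shifted = x - np.max(x, axis=1, keepdims=True)   # [[-1000., 0.]]
probs = np.exp(shifted)                          # exp(-1000) underflows to 0.0
probs /= np.sum(probs, axis=1, keepdims=True)    # [[0., 1.]]

# Record warnings so the message can be inspected instead of just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss = -np.log(probs[np.arange(1), y])       # log(0) is -inf, so loss is inf

print(probs)
print(loss)
print([str(w.message) for w in caught])
```

The underflow itself is silent under NumPy's default error settings; only the later `np.log(0)` produces a warning, which is why the message points at `log` rather than `exp`.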

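So during training you still need to keep an eye on the parameters, to avoid the various overflow and underflow problems that can arise in the computation.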

import numpy as np

def softmax_loss(x, y):
  """
  Computes the loss and gradient for softmax classification.
  Inputs:
  - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class
    for the ith input.
  - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
    0 <= y[i] < C
  Returns a tuple of:
  - loss: Scalar giving the loss
  - dx: Gradient of the loss with respect to x
  """
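  # Shift each row so its largest score is 0; exp then cannot overflow.
  probs = np.exp(x - np.max(x, axis=1, keepdims=True))
  # Normalize each row into class probabilities.
  probs /= np.sum(probs, axis=1, keepdims=True)
  N = x.shape[0]
  # Mean negative log-probability of the correct class. If that probability
  # underflowed to 0, np.log emits the divide-by-zero warning and loss is inf.
  loss = -np.sum(np.log(probs[np.arange(N), y])) / N
  # Gradient of the loss w.r.t. the scores: (probs - one_hot(y)) / N.
  dx = probs.copy()
  dx[np.arange(N), y] -= 1
  dx /= N
  return loss, dx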
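One common way to sidestep `log(0)` altogether is to compute log-probabilities directly with the log-sum-exp identity, rather than taking the log of already-normalized probabilities. The sketch below is an alternative under that assumption, not the code from the original post; the name `softmax_loss_stable` and the test scores are hypothetical.

```python
import numpy as np

def softmax_loss_stable(x, y):
    """Same inputs and outputs as softmax_loss, but works in log space."""
    N = x.shape[0]
    shifted = x - np.max(x, axis=1, keepdims=True)
    # log softmax = shifted - log(sum(exp(shifted))). The sum is >= 1 because
    # the largest shifted score is 0, so the log is always finite.
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    loss = -np.sum(log_probs[np.arange(N), y]) / N
    # The gradient has the same form as before: (probs - one_hot(y)) / N.
    dx = np.exp(log_probs)
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx

# Scores for which the probability of the true class underflows to 0.
x = np.array([[0.0, 1000.0]])
y = np.array([0])
loss, dx = softmax_loss_stable(x, y)
print(loss)  # finite: 1000.0
print(dx)
```

The loss stays finite here because the log is only ever applied to a sum that is at least 1, never to an individual probability that may have underflowed. A very large finite loss is still a sign that the scores or parameters have grown too far, so the advice about watching the parameters continues to apply.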
