1. Overview of Loss Functions

PyTorch's loss functions are implemented in torch.nn.functional, and torch.nn provides class wrappers around them.

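Both forms support autograd. The torch.nn classes are nn.Module wrappers that store options such as reduction and weight and then call the corresponding torch.nn.functional function, so in practice the class form is usually preferred over calling torch.nn.functional directly.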
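As a minimal sketch of how the two forms relate, the snippet below computes the same loss once through the class and once through the functional call. The tensors are illustrative values, not taken from any real model.

```python
import torch
from torch import nn
import torch.nn.functional as F

y = torch.tensor([1.1, 1.2, 1.3])
y_hat = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)

# Class form: options such as `reduction` are stored on the module.
criterion = nn.MSELoss(reduction='mean')
loss_module = criterion(y_hat, y)

# Functional form: the same options are passed on every call.
loss_functional = F.mse_loss(y_hat, y, reduction='mean')

# Both results belong to the autograd graph and agree numerically.
loss_module.backward()
print(torch.allclose(loss_module, loss_functional))  # True
print(y_hat.grad is not None)                        # True
```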

PyTorch provides about 18 loss functions (the exact number varies by version). Six of them are used most often:

Regression loss functions:

torch.nn.L1Loss
torch.nn.MSELoss

Classification loss functions:

torch.nn.BCELoss
torch.nn.BCEWithLogitsLoss
torch.nn.CrossEntropyLoss
torch.nn.NLLLoss

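A loss function measures the difference between a single prediction of the model and the corresponding true value:

$$\text{Loss} = f(\hat{y}, y)$$

Two related concepts are worth distinguishing:

The cost function is the average of the loss over N predictions:

$$\text{Cost} = \frac{1}{N} \sum_{i=1}^{N} f(\hat{y}_i, y_i)$$

The objective function is the function that is ultimately optimized, typically the cost plus a regularization term:

$$\text{Obj} = \text{Cost} + \text{Regularization}$$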
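The three notions can be sketched in plain Python. The squared-error loss, the toy data, the weights, and the regularization strength `lam` are all assumptions chosen for illustration.

```python
# Hypothetical toy data: predictions, targets, and model weights.
y_hat = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]
weights = [0.5, -1.0]
lam = 0.01  # assumed regularization strength


def loss(pred, true):
    """Loss of a single sample (squared error, chosen for illustration)."""
    return (pred - true) ** 2


# Cost: average of the per-sample losses.
cost = sum(loss(p, t) for p, t in zip(y_hat, y)) / len(y)

# Objective: cost plus an (assumed) L2 regularization term.
objective = cost + lam * sum(w ** 2 for w in weights)

print(cost, objective)
```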

2. Regression Loss Functions

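Regression models are commonly evaluated with two metrics: MAE (mean absolute error) and MSE (mean squared error).

torch.nn.L1Loss(reduction='mean') implements the MAE loss:

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|$$

torch.nn.MSELoss(reduction='mean') implements the MSE loss:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$

The reduction argument of both classes controls what happens to the per-element losses after they are computed. It takes one of three values: none returns the per-element losses unchanged, sum adds them up, and mean averages them. The default is mean.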

>>> import torch
>>> from torch import nn

>>> y = torch.tensor([1.1, 1.2, 1.3])
>>> y_hat = torch.tensor([1., 1., 1.])

>>> criterion_none = nn.L1Loss(reduction='none') # no reduction: per-element losses
>>> criterion_none(y_hat, y)
tensor([0.1000, 0.2000, 0.3000])

>>> criterion_mean = nn.L1Loss(reduction='mean') # average
>>> criterion_mean(y_hat, y)
tensor(0.2000)

>>> criterion_sum = nn.L1Loss(reduction='sum') # sum
>>> criterion_sum(y_hat, y)
tensor(0.6000)
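For comparison, the same data can be run through nn.MSELoss. This is a small sketch reusing the values above; the numbers in the comments are approximate because of float32 rounding.

```python
import torch
from torch import nn

y = torch.tensor([1.1, 1.2, 1.3])
y_hat = torch.tensor([1.0, 1.0, 1.0])

# Per-element squared errors, then their mean and their sum.
mse_none = nn.MSELoss(reduction='none')(y_hat, y)
mse_mean = nn.MSELoss(reduction='mean')(y_hat, y)
mse_sum = nn.MSELoss(reduction='sum')(y_hat, y)

print(mse_none)  # approximately [0.01, 0.04, 0.09]
print(mse_mean)  # approximately 0.0467
print(mse_sum)   # approximately 0.14
```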

3. Classification Loss Functions

3.1 Cross Entropy

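Self-information is the negative logarithm of the probability of an event:

$$I(x) = -\log p(x)$$

Information entropy describes the uncertainty of a random variable. It is the expected self-information under its distribution:

$$H(P) = -\sum_{i} P(x_i) \log P(x_i)$$

A certain event has entropy 0; the more uncertain the outcome, the larger the entropy.

Cross entropy measures how much effort it takes to remove the uncertainty of a system whose true distribution is P when using a strategy built on a different distribution Q:

$$H(P, Q) = -\sum_{i} P(x_i) \log Q(x_i)$$

Relative entropy, also called the K-L divergence, describes how far the predicted distribution deviates from the true one:

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)}$$

Expanding the logarithm in the K-L divergence shows that the cross entropy can also be written as

$$H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$$

That is, cross entropy is the sum of information entropy and relative entropy. Here P is the true distribution and Q is the predicted distribution. Because H(P) is fixed by the data, minimizing H(P, Q) is equivalent to minimizing D_KL(P ‖ Q).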
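The relationship between the three quantities can be checked numerically. The sketch below uses only the standard library; the distributions P and Q are made-up values for illustration.

```python
import math

P = [0.7, 0.2, 0.1]  # assumed "true" distribution
Q = [0.5, 0.3, 0.2]  # assumed predicted distribution


def entropy(p):
    """H(P) = -sum P(x) log P(x); terms with P(x) = 0 contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum P(x) log Q(x)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(P || Q) = sum P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


h_p = entropy(P)
h_pq = cross_entropy(P, Q)
kl = kl_divergence(P, Q)

# Cross entropy equals entropy plus K-L divergence.
print(math.isclose(h_pq, h_p + kl))  # True
```

Since `h_p` does not depend on Q, any change in Q that lowers `h_pq` lowers `kl` by the same amount.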

3.2 Classification Loss Functions

Below are the most commonly used classification loss functions.

torch.nn.BCELoss(weight=None, reduction='mean')

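This class implements binary cross entropy.

Note that the input values (the predicted probabilities, not the class labels) must lie in [0, 1]; otherwise an error is raised.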

>>> inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
>>> target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

>>> criterion = nn.BCELoss()
>>> criterion(inputs, target)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
...
RuntimeError: all elements of input should be between 0 and 1

In practice, the raw scores are usually passed through F.sigmoid (equivalently, torch.sigmoid) first.
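A minimal sketch of that fix, reusing the data from the failing example: after the sigmoid every value lies strictly between 0 and 1, so nn.BCELoss accepts it. The manual line writes out the usual binary cross-entropy formula as a cross-check.

```python
import torch
from torch import nn

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

# Squash the raw scores into (0, 1) before computing the loss.
probs = torch.sigmoid(inputs)
loss = nn.BCELoss()(probs, target)

# The same value written out: -[t * log(p) + (1 - t) * log(1 - p)], averaged.
manual = -(target * probs.log() + (1 - target) * (1 - probs).log()).mean()
print(torch.allclose(loss, manual))  # True
```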

torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)

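This is equivalent to F.sigmoid followed by torch.nn.BCELoss: the sigmoid is applied internally, so there is no need to call it by hand. Fusing the two steps is also more numerically stable than applying them separately.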
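The equivalence can be checked directly. This sketch feeds the same raw scores to both routes; the data is illustrative.

```python
import torch
from torch import nn

logits = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

# Route 1: raw scores go straight into BCEWithLogitsLoss.
loss_fused = nn.BCEWithLogitsLoss()(logits, target)

# Route 2: sigmoid by hand, then BCELoss.
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), target)

print(torch.allclose(loss_fused, loss_two_step))  # True
```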

torch.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')

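NLLLoss stands for "negative log likelihood loss". Despite the name, it does not take a logarithm itself: it expects log-probabilities as input, picks out the entry of the target class, and negates it. In other words, it supplies only the minus sign of the negative log likelihood.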
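A small sketch of this behaviour, using made-up probabilities: the loss of each sample is the negated log-probability at the target index.

```python
import torch
from torch import nn

# Illustrative log-probabilities for 3 samples and 2 classes.
log_probs = torch.log(torch.tensor([[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]))
target = torch.tensor([0, 1, 1])

loss = nn.NLLLoss(reduction='none')(log_probs, target)

# Manual version: pick the target entry of each row and flip the sign.
manual = -log_probs[torch.arange(len(target)), target]
print(torch.allclose(loss, manual))  # True
```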

torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')

This class combines nn.LogSoftmax and nn.NLLLoss, so it takes raw, unnormalized scores (logits) as input.
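The combination can be verified with a short sketch; the scores and targets are illustrative.

```python
import torch
from torch import nn

inputs = torch.tensor([[1, 1], [1, 2], [3, 3]], dtype=torch.float)
target = torch.tensor([0, 0, 1])

# One step: raw scores into CrossEntropyLoss.
ce = nn.CrossEntropyLoss(reduction='none')(inputs, target)

# Two steps: log-softmax over the class dimension, then NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(inputs)
two_step = nn.NLLLoss(reduction='none')(log_probs, target)

print(torch.allclose(ce, two_step))  # True
```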

torch.nn.KLDivLoss(reduction='mean')

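This class computes the relative entropy (K-L divergence) introduced above. Note that it expects its input as log-probabilities and its target as probabilities, and that reduction='batchmean', rather than 'mean', matches the mathematical definition of the K-L divergence.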
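As a hedged sketch of the calling convention, the snippet below compares nn.KLDivLoss with the formula written out by hand. The distributions `p` and `q` are made-up values.

```python
import torch
from torch import nn

p = torch.tensor([[0.7, 0.2, 0.1]])  # assumed true distribution (the target)
q = torch.tensor([[0.5, 0.3, 0.2]])  # assumed predicted distribution

# The input is log Q, the target is P; 'batchmean' divides the sum by the batch size.
kl_torch = nn.KLDivLoss(reduction='batchmean')(q.log(), p)

# Manual version: sum of P * log(P / Q), averaged over the batch.
kl_manual = (p * (p / q).log()).sum() / p.shape[0]
print(torch.allclose(kl_torch, kl_manual, atol=1e-6))  # True
```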

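These classes take similar arguments. Besides the reduction argument described above, several of them accept weight; for NLLLoss and CrossEntropyLoss it gives the weight of each class. The example below shows how cross entropy and weight work. For a single sample with scores x and target class c, the cross entropy is

$$\text{loss}(x, c) = -x_c + \log \sum_{j} \exp(x_j)$$

We first define some data and work through the first sample by hand with numpy: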

import numpy as np
import torch
from torch import nn

inputs = torch.tensor([[1, 1], [1, 2], [3, 3]], dtype=torch.float)
target = torch.tensor([0, 0, 1], dtype=torch.long)

idx = 0                                    # index of the first sample

input_ = inputs.detach().numpy()[idx]      # [1., 1.]
target_ = target.numpy()[idx]              # 0

# First term: score of the target class
x_class = input_[target_]

# Second term: log of the sum of exp(score) over all classes
sigma_exp_x = np.sum(np.exp(input_))
log_sigma_exp_x = np.log(sigma_exp_x)

# Loss of this sample
loss_1 = -x_class + log_sigma_exp_x

The result is

>>> print("Loss of the first sample: ", loss_1)
Loss of the first sample:  0.6931473

Now we compute the same thing with PyTorch:

>>> criterion_ce = nn.CrossEntropyLoss(reduction='none')
>>> criterion_ce(inputs, target)
tensor([0.6931, 1.3133, 0.6931])

The first value matches the manual result. Now let's look at weight:

>>> weight = torch.tensor([0.1, 0.9], dtype=torch.float)
>>> criterion_ce = nn.CrossEntropyLoss(weight=weight, reduction='none')
>>> criterion_ce(inputs, target)
tensor([0.0693, 0.1313, 0.6238])

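Comparing with the unweighted result, each value has been multiplied by the weight of its target class (0.1 for class 0 and 0.9 for class 1). With reduction='sum', the weighted losses are simply added up. With reduction='mean', the weighted sum is divided by the sum of the weights of the target classes (here 0.1 + 0.1 + 0.9 = 1.1), not by the number of samples.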
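The reduction behaviour with weights can be checked with a short sketch that reuses the data from this section.

```python
import torch
from torch import nn

inputs = torch.tensor([[1, 1], [1, 2], [3, 3]], dtype=torch.float)
target = torch.tensor([0, 0, 1], dtype=torch.long)
weight = torch.tensor([0.1, 0.9], dtype=torch.float)

per_sample = nn.CrossEntropyLoss(weight=weight, reduction='none')(inputs, target)
loss_sum = nn.CrossEntropyLoss(weight=weight, reduction='sum')(inputs, target)
loss_mean = nn.CrossEntropyLoss(weight=weight, reduction='mean')(inputs, target)

# 'sum' adds up the weighted per-sample losses.
print(torch.allclose(loss_sum, per_sample.sum()))  # True

# 'mean' divides by the summed weights of the target classes (1.1 here).
weight_total = weight[target].sum()
print(torch.allclose(loss_mean, per_sample.sum() / weight_total))  # True
```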

3.3 Summary

F.sigmoid (activation function) + nn.BCELoss (loss function) = torch.nn.BCEWithLogitsLoss (loss function)
nn.LogSoftmax (activation function) + nn.NLLLoss (loss function) = torch.nn.CrossEntropyLoss (loss function)