用C++实现softmax函数(面试经验)

背景

今天面试字节算法岗时被问到的问题，让我用C++实现一个softmax函数。softmax是逻辑回归在多分类问题上的推广。大概的公式如下：
$\{x_1, x_2,\cdots, x_n\}\\ softmax(x_t)=\frac{e^{x_t}}{\sum_{i=1}^{n}e^{x_i}}$
即判断该变量在总体变量中的占比。

第一次实现

实现

我们用vector来封装输入和输出，简单的按公式复现。

vector<double> softmax(vector<double> input)
{
	double total=0;
	for(auto x:input)
	{
		total+=exp(x);
	}
	vector<double> result;
	for(auto x:input)
	{
		result.push_back(exp(x)/total);
	}
	return result;
}

测试

test 1

测试用例1: {1, 2, 3, 4, 5}
测试输出1: {0.0116562, 0.0316849, 0.0861285, 0.234122, 0.636409}

经过简单测试是正常的。
在这里插入图片描述

test 2

但是这时面试官提出了一个问题，即如果有较大输入变量时会怎么样？

测试用例2: {1, 2, 3, 4, 5, 1000}
测试输出2: {0, 0, 0, 0, 0, nan}

由于 $e^{1000}$ 已经溢出了双精度浮点(double)所能表示的范围，所以变成了NaN(not a number)。
在这里插入图片描述

第二次实现(改进)

改进原理

我们注意观察softmax的公式:
$\{x_1, x_2,\cdots, x_n\}\\ softmax(x_t)=\frac{e^{x_t}}{\sum_{i=1}^{n}e^{x_i}}$
如果我们给上下同时乘以一个很小的数，最后答案的值是不变的。
那我们可以给每一个输入 $x_i$ 都减去一个值 $a$ ，防止爆精度。
大致表示如下:
$\frac{e^{x_t}}{\sum_{i=1}^{n}e^{x_i}}= \frac{e^{x_t}\cdot e^{-a}}{e^{-a}\cdot \sum_{i=1}^{n}e^{x_i}}= \frac{e^{x_t}\cdot e^{-a}}{ \sum_{i=1}^{n}e^{x_i}\cdot e^{-a}}= \frac{e^{x_t-a}}{ \sum_{i=1}^{n}e^{x_i-a}}$
那我们如何取这个 $a$ 的值呢？直接取输入中最大的那个即 $max(x_i)$ 就好啦，这样所有的 $e^{x_i-a}$ 的值都不会超过 $e^0=1$ ，更不可能爆精度了。

实现

vector<double> softmax(vector<double> input)
{
	double total=0;
	double MAX=input[0];
	for(auto x:input)
	{
		MAX=max(x,MAX);
	}
	for(auto x:input)
	{
		total+=exp(x-MAX);
	}
	vector<double> result;
	for(auto x:input)
	{
		result.push_back(exp(x-MAX)/total);
	}
	return result;
}

测试

test 1

测试用例1: {1, 2, 3, 4, 5, 1000}
测试输出1: {0, 0, 0, 0, 0, 1}

test 2

测试用例1: {0, 19260817, 19260817}
测试输出1: {0, 0.5, 0.5}

在这里插入图片描述

我们发现结果正常了。

完整代码

#include <iostream>
#include <vector>
#include <math.h>
using namespace std;

vector<double> softmax(vector<double> input)
{
	double total=0;
	double MAX=input[0];
	for(auto x:input)
	{
		MAX=max(x,MAX);
	}
	for(auto x:input)
	{
		total+=exp(x-MAX);
	}
	vector<double> result;
	for(auto x:input)
	{
		result.push_back(exp(x-MAX)/total);
	}
	return result;
}

int main(int argc, char *argv[])
{
	int n;
	cin>>n;
	vector<double> input;
	while(n--)
	{
		double x;
		cin>>x;
		input.push_back(x);
	}
	for(auto y:softmax(input))
	{
		cout<<y<<' ';
	}
}