Smooth Support Vector Regression - Python Implementation
Date: 2020-03-24
- Algorithm characteristics:
①. All sample points should fall inside the tube as far as possible; ②. maximize the margin; ③. minimize the tube residuals. - Algorithm derivation:
Part Ⅰ
Introduce the smoothing technique; see https://www.cnblogs.com/xxhbdk/p/12275567.html for details.
Define:
\begin{equation*}
\begin{split}
\left| x \right|_{\epsilon} &= \max\{0, \left| x \right| - \epsilon\} \\
&=\max\{0, x - \epsilon\} + \max\{0, -x - \epsilon\} \\
&=(x-\epsilon)_+ + (-x-\epsilon)_+
\end{split}
\end{equation*}
Since $(x-\epsilon)_+ \cdot (-x-\epsilon)_+ = 0$ (at most one of the two terms can be positive for any $x$), it follows that
\begin{equation*}
\left|x\right|_{\epsilon}^2 = (x-\epsilon)_+^2 + (-x - \epsilon)_+^2
\end{equation*}
Define:
\begin{equation*}
p_{\epsilon}^2(x, \beta) = (p(x-\epsilon, \beta))^2 + (p(-x-\epsilon, \beta))^2
\end{equation*}
where $p(x, \beta) = \frac{1}{\beta}\ln(\mathrm{e}^{\beta x} + 1)$ and $\beta$ is the smoothing parameter.
We now use $p_{\epsilon}^2(x, \beta)$ to approximate $\left|x\right|_{\epsilon}^2$. (Figure: comparison of $p_{\epsilon}^2(x, \beta)$ with $\left|x\right|_{\epsilon}^2$.)
The first and second derivatives of $p_{\epsilon}^2(x, \beta)$ are given below:
\begin{align*}
\frac{\mathrm{d}p_{\epsilon}^2(x, \beta)}{\mathrm{d}x} &= 2[p(x-\epsilon, \beta)s(x-\epsilon, \beta) - p(-x-\epsilon, \beta)s(-x-\epsilon, \beta)] \\
\frac{\mathrm{d}^2p_{\epsilon}^2(x, \beta)}{\mathrm{d}x^2} &= 2[s^2(x-\epsilon, \beta) + p(x-\epsilon, \beta)\delta(x-\epsilon, \beta) + s^2(-x-\epsilon, \beta) + p(-x-\epsilon, \beta)\delta(-x-\epsilon, \beta)]
\end{align*}
where $s(x, \beta) = 1 / (\mathrm{e}^{-\beta x} + 1)$ and $\delta(x, \beta) = \beta \mathrm{e}^{\beta x} / (\mathrm{e}^{\beta x} + 1)^2$.
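To make the smoothing concrete, here is a minimal numpy sketch (not part of the original implementation; the $\beta$ and $\epsilon$ values are illustrative) that evaluates $p_{\epsilon}^2(x, \beta)$ against $\left|x\right|_{\epsilon}^2$ on a grid and shows the gap shrinking as $\beta$ grows:

import numpy

eps = 0.5

def plus(x):                                # plus function (x)_+
    return numpy.maximum(x, 0.0)

def p(x, beta):                             # smooth plus: log(1 + e^{beta*x}) / beta
    return numpy.log(numpy.exp(beta * x) + 1.0) / beta

def abs_eps_sq(x, eps):                     # |x|_eps^2 = (x - eps)_+^2 + (-x - eps)_+^2
    return plus(x - eps) ** 2 + plus(-x - eps) ** 2

def p_eps_sq(x, eps, beta):                 # smooth approximation p_eps^2(x, beta)
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

xs = numpy.linspace(-3, 3, 601)
for beta in (1.0, 10.0, 100.0):
    gap = numpy.max(numpy.abs(p_eps_sq(xs, eps, beta) - abs_eps_sq(xs, eps)))
    print("beta = {:6.1f}, max gap = {:.6f}".format(beta, gap))
# the gap shrinks as beta grows, which is why the derivation below takes beta -> infinity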
Part Ⅱ
The SVR linear regression equation is:
\begin{equation}
h(x) = W^\mathrm{T}x + b
\label{eq_1}
\end{equation}
where $W$ is the normal vector of the hyperplane and $b$ is the bias of the hyperplane.
This post measures the fitting residuals with the $L_2$ norm, which gives the primal SVR optimization problem:
\begin{equation}
\begin{split}
\min\quad &\frac{1}{2}\|W\|_2^2 + \frac{c}{2}(\|\xi\|_2^2 + \|\xi^{\ast}\|_2^2) \\
\mathrm{s.t.}\quad & \bar{Y} - (A^{\mathrm{T}}W + \mathbf{1}b) \leq \mathbf{1}\epsilon + \xi \\
& \bar{Y} - (A^{\mathrm{T}}W + \mathbf{1}b) \geq -\mathbf{1}\epsilon - \xi^{\ast}
\end{split}
\label{eq_2}
\end{equation}
where $A = [x^{(1)}, x^{(2)}, \cdots, x^{(n)}]$, $\bar{Y} = [\bar{y}^{(1)}, \bar{y}^{(2)}, \cdots, \bar{y}^{(n)}]^{\mathrm{T}}$, $\epsilon$ is the tube parameter, $\xi$ is the column vector of upper-tube residuals over all samples, $-\xi^{\ast}$ is the column vector of lower-tube residuals over all samples, and $c$ is the weight parameter.
From the optimization problem (\ref{eq_2}), the tube residuals $\xi$ and $\xi^{\ast}$ can be expressed equivalently with the plus function:
\begin{equation}
\begin{cases}
\xi = (\bar{Y} - (A^{\mathrm{T}}W + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \\
\xi^{\ast} = (- \bar{Y} + (A^{\mathrm{T}}W + \mathbf{1}b) - \mathbf{1}\epsilon)_+
\end{cases}
\label{eq_3}
\end{equation}
The constrained optimization problem can therefore be converted into the following unconstrained one:
\begin{equation}
\min\quad \frac{1}{2}\|W\|_2^2 + \frac{c}{2}(\| (\bar{Y} - (A^{\mathrm{T}}W + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2 + \| (- \bar{Y} + (A^{\mathrm{T}}W + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2)
\label{eq_4}
\end{equation}
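As a quick numerical illustration of (\ref{eq_3}) and (\ref{eq_4}), the toy sketch below (hypothetical random data, not from this post) computes the tube residuals via the plus function and evaluates the unconstrained objective:

import numpy

numpy.random.seed(0)
n, d, eps, c = 5, 2, 0.5, 10.0
A = numpy.random.randn(d, n)              # columns are samples, matching the text
Ybar = numpy.random.randn(n, 1)
W = numpy.random.randn(d, 1)
b = 0.1

def plus(v):
    return numpy.maximum(v, 0.0)

r = Ybar - (A.T @ W + b)                  # signed residuals \bar{Y} - (A^T W + 1 b)
xi = plus(r - eps)                        # upper-tube residuals, eq. (\ref{eq_3})
xi_star = plus(-r - eps)                  # lower-tube residuals, eq. (\ref{eq_3})

# unconstrained objective of eq. (\ref{eq_4})
J = 0.5 * (W.T @ W)[0, 0] + 0.5 * c * ((xi ** 2).sum() + (xi_star ** 2).sum())
print(J)

Note that for each sample at most one of xi and xi_star is nonzero, mirroring $(x-\epsilon)_+ \cdot (-x-\epsilon)_+ = 0$ from Part Ⅰ.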
Likewise, the stationarity condition with respect to the primal variable $W$ in the KKT conditions of problem (\ref{eq_2}) gives:
\begin{equation}
W = A(\alpha - \alpha^{\ast}) = A\alpha^{\prime}
\label{eq_5}
\end{equation}
where $\alpha$ and $\alpha^{\ast}$ are the dual variables. Substituting this change of variables into (\ref{eq_4}) yields:
\begin{equation}
\min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}A^{\mathrm{T}}A\alpha^{\prime} + \frac{c}{2}(\| (\bar{Y} - (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2 + \| (- \bar{Y} + (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2)
\label{eq_6}
\end{equation}
Because of the weight parameter $c$, the above optimization problem is equivalent to the following multi-objective optimization problem:
\begin{equation}
\begin{cases}
\min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}A^{\mathrm{T}}A\alpha^{\prime} \\
\min\quad \frac{1}{2}(\| (\bar{Y} - (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2 + \| (- \bar{Y} + (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2)
\end{cases}
\label{eq_7}
\end{equation}
Since $A^{\mathrm{T}}A\succeq 0$, we have the following equivalence:
\begin{equation}
\min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}A^{\mathrm{T}}A\alpha^{\prime} \quad\Rightarrow\quad \min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}\alpha^{\prime}
\label{eq_8}
\end{equation}
Applying the smoothing technique from Part Ⅰ gives the further equivalence:
\begin{equation}
\min\quad \frac{1}{2}(\| (\bar{Y} - (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2 + \| (- \bar{Y} + (A^{\mathrm{T}}A\alpha^{\prime} + \mathbf{1}b) - \mathbf{1}\epsilon)_+ \|_2^2) \quad\Rightarrow\quad \min\quad \frac{1}{2}\sum_ip_{\epsilon}^2(\bar{y}^{(i)} - x^{(i)\mathrm{T}}A\alpha^{\prime} - b, \beta), \quad \text{where } \beta\rightarrow\infty
\label{eq_9}
\end{equation}
Combining (\ref{eq_8}) and (\ref{eq_9}), optimization problem (\ref{eq_6}) is equivalent to:
\begin{equation}
\min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}\alpha^{\prime} + \frac{c}{2}\sum_ip_{\epsilon}^2(\bar{y}^{(i)} - x^{(i)\mathrm{T}}A\alpha^{\prime} - b, \beta), \quad \text{where } \beta\rightarrow\infty
\label{eq_10}
\end{equation}
To strengthen the model's regression capability, we introduce the kernel trick and obtain the final optimization problem:
\begin{equation}
\min\quad \frac{1}{2}\alpha^{\prime\mathrm{T}}\alpha^{\prime} + \frac{c}{2}\sum_ip_{\epsilon}^2(\bar{y}^{(i)} - K(x^{(i)\mathrm{T}}, A)\alpha^{\prime} - b, \beta), \quad \text{where } \beta\rightarrow\infty
\label{eq_11}
\end{equation}
where $K$ denotes the kernel matrix, whose entries $K_{ij}=k(x^{(i)}, x^{(j)})$ are determined by the kernel function. Common kernel functions include:
①. Polynomial kernel: $k(x^{(i)}, x^{(j)}) = (x^{(i)}\cdot x^{(j)})^n$
②. Gaussian kernel: $k(x^{(i)}, x^{(j)}) = \mathrm{e}^{-\mu\|x^{(i)} - x^{(j)}\|_2^2}$
Combining (\ref{eq_1}), (\ref{eq_5}), and the kernel trick gives the final regression equation:
\begin{equation}
h(x) = \alpha^{\prime\mathrm{T}}K(A^{\mathrm{T}}, x) + b
\label{eq_12}
\end{equation}
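To ground the kernel trick, here is a small self-contained sketch (toy data; $\mu$, alpha, and b are illustrative placeholders, not the post's settings) that builds the Gaussian kernel matrix $K_{ij}=k(x^{(i)}, x^{(j)})$ and evaluates the regression equation (\ref{eq_12}) at a new point:

import numpy

numpy.random.seed(0)
n, d, mu = 6, 2, 1.0
A = numpy.random.randn(d, n)                        # columns are training samples

def gaussian(u, v, mu):                             # k(u, v) = exp(-mu * ||u - v||^2)
    return numpy.exp(-mu * numpy.sum((u - v) ** 2))

# kernel matrix K(A^T, A)
K = numpy.array([[gaussian(A[:, i], A[:, j], mu) for j in range(n)]
                 for i in range(n)])

# h(x) = alpha'^T K(A^T, x) + b at a new point x
alpha, b = numpy.random.randn(n, 1), 0.2
x = numpy.array([0.3, -1.2])
KAx = numpy.array([[gaussian(A[:, i], x, mu)] for i in range(n)])   # K(A^T, x)
h = (alpha.T @ KAx)[0, 0] + b
print(K.shape, h)

In the trained model, alpha and b would come from minimizing (\ref{eq_11}); here they are random placeholders.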
Part Ⅲ
Below are the auxiliary quantities needed by the optimizer. Denote the objective function of (\ref{eq_11}) by $J$ and write:
\begin{equation}
J = J_1 + J_2 \quad\quad\text{where}\begin{cases}
J_1 = \frac{1}{2}\alpha^{\prime\mathrm{T}}\alpha^{\prime} \\
J_2 = \frac{c}{2}\sum\limits_ip_{\epsilon}^2(\bar{y}^{(i)} - K(x^{(i)\mathrm{T}}, A)\alpha^{\prime} - b, \beta)
\end{cases}
\label{eq_13}
\end{equation}
Terms related to $J_1$:
\begin{gather*}
\frac{\partial J_1}{\partial \alpha^{\prime}} = \alpha^{\prime} \quad
\frac{\partial J_1}{\partial b} = 0 \\
\frac{\partial^2 J_1}{\partial \alpha^{\prime 2}} = I \quad
\frac{\partial^2 J_1}{\partial \alpha^{\prime} \partial b} = 0 \quad
\frac{\partial^2 J_1}{\partial b \partial \alpha^{\prime}} = 0 \quad
\frac{\partial^2 J_1}{\partial b^2} = 0
\end{gather*}
\begin{gather*}
\nabla J_1 = \begin{bmatrix}\frac{\partial J_1}{\partial \alpha^{\prime}} \\ \frac{\partial J_1}{\partial b} \end{bmatrix} \\ \\
\nabla^2 J_1 = \begin{bmatrix} \frac{\partial^2 J_1}{\partial \alpha^{\prime 2}} & \frac{\partial^2 J_1}{\partial \alpha^{\prime} \partial b} \\ \frac{\partial^2 J_1}{\partial b \partial \alpha^{\prime}} & \frac{\partial^2 J_1}{\partial b^2} \end{bmatrix}
\end{gather*}
Terms related to $J_2$:
\begin{gather*}
z_i = \bar{y}^{(i)} - K(x^{(i)\mathrm{T}}, A)\alpha^{\prime} - b \\
\frac{\partial J_2}{\partial \alpha^{\prime}} = -c\sum_i[p(z_i-\epsilon, \beta)s(z_i-\epsilon, \beta) - p(-z_i - \epsilon, \beta)s(-z_i-\epsilon, \beta)]K^{\mathrm{T}}(x^{(i)\mathrm{T}}, A) \\
\frac{\partial J_2}{\partial b} = -c\sum_i[p(z_i-\epsilon, \beta)s(z_i-\epsilon, \beta) - p(-z_i - \epsilon, \beta)s(-z_i - \epsilon, \beta)] \\
\frac{\partial^2 J_2}{\partial \alpha^{\prime 2}} = c\sum_i[s^2(z_i-\epsilon, \beta) + p(z_i-\epsilon, \beta)\delta(z_i-\epsilon, \beta) + s^2(-z_i-\epsilon, \beta) + p(-z_i-\epsilon, \beta)\delta(-z_i-\epsilon, \beta)]K^{\mathrm{T}}(x^{(i)\mathrm{T}}, A)K(x^{(i)\mathrm{T}}, A) \\
\frac{\partial^2 J_2}{\partial \alpha^{\prime} \partial b} = c\sum_i[s^2(z_i-\epsilon, \beta) + p(z_i-\epsilon, \beta)\delta(z_i-\epsilon, \beta) + s^2(-z_i-\epsilon, \beta) + p(-z_i-\epsilon, \beta)\delta(-z_i-\epsilon, \beta)]K^{\mathrm{T}}(x^{(i)\mathrm{T}}, A) \\
\frac{\partial^2 J_2}{\partial b \partial \alpha^{\prime}} = c\sum_i[s^2(z_i-\epsilon, \beta) + p(z_i-\epsilon, \beta)\delta(z_i-\epsilon, \beta) + s^2(-z_i-\epsilon, \beta) + p(-z_i-\epsilon, \beta)\delta(-z_i-\epsilon, \beta)]K(x^{(i)\mathrm{T}}, A) \\
\frac{\partial^2 J_2}{\partial b^2} = c\sum_i[s^2(z_i-\epsilon, \beta) + p(z_i-\epsilon, \beta)\delta(z_i-\epsilon, \beta) + s^2(-z_i-\epsilon, \beta) + p(-z_i-\epsilon, \beta)\delta(-z_i-\epsilon, \beta)]
\end{gather*}
\begin{gather*}
\nabla J_2 = \begin{bmatrix}\frac{\partial J_2}{\partial \alpha^{\prime}} \\ \frac{\partial J_2}{\partial b} \end{bmatrix} \\ \\
\nabla^2 J_2 = \begin{bmatrix} \frac{\partial^2 J_2}{\partial \alpha^{\prime 2}} & \frac{\partial^2 J_2}{\partial \alpha^{\prime} \partial b} \\ \frac{\partial^2 J_2}{\partial b \partial \alpha^{\prime}}& \frac{\partial^2 J_2}{\partial b^2} \end{bmatrix}
\end{gather*}
The gradient and Hessian of the objective $J$ are then:
\begin{align}
\nabla J &= \nabla J_1 + \nabla J_2 \\
\nabla^2 J &= \nabla^2 J_1 + \nabla^2 J_2
\end{align}
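Hand-derived gradients are error-prone, so a sketch like the following (a random toy problem with an overflow-safe reformulation of $p$; all names are hypothetical, not the post's code) can check the Part Ⅲ gradient formulas against central finite differences:

import numpy

numpy.random.seed(1)
n, beta, eps, c = 6, 5.0, 0.5, 10.0
Xs = numpy.random.randn(n, 2)
sq = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
K = numpy.exp(-1.0 * sq)                        # Gaussian kernel matrix K(A^T, A)
Y = numpy.random.randn(n, 1)

def p(x):                                       # log(1 + e^{beta*x}) / beta, overflow-safe
    return numpy.maximum(x, 0.0) + numpy.log1p(numpy.exp(-beta * numpy.abs(x))) / beta

def s(x):                                       # sigmoid, the derivative of p
    return 1.0 / (numpy.exp(-beta * x) + 1.0)

def J(alpha, b):                                # objective of (\ref{eq_13})
    z = Y - K @ alpha - b
    return 0.5 * (alpha.T @ alpha)[0, 0] + 0.5 * c * numpy.sum(p(z - eps) ** 2 + p(-z - eps) ** 2)

def grad(alpha, b):                             # analytic gradient from Part III
    z = Y - K @ alpha - b
    g = p(z - eps) * s(z - eps) - p(-z - eps) * s(-z - eps)
    return numpy.vstack((alpha - c * (K.T @ g), [[-c * numpy.sum(g)]]))

theta = numpy.vstack((numpy.random.randn(n, 1), [[0.3]]))   # stacked [alpha'; b]
g = grad(theta[:-1], theta[-1, 0])

h = 1e-6                                        # central finite differences
fd = numpy.zeros_like(theta)
for i in range(theta.shape[0]):
    e = numpy.zeros_like(theta)
    e[i, 0] = h
    fd[i, 0] = (J((theta + e)[:-1], (theta + e)[-1, 0])
                - J((theta - e)[:-1], (theta - e)[-1, 0])) / (2 * h)
print(numpy.max(numpy.abs(g - fd)))             # should be on the order of 1e-7 or smaller

The same pattern extends to the Hessian by differencing the analytic gradient.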
Part Ⅳ
We now demonstrate the algorithm on the following peaks function:
\begin{equation*}
f(x_1, x_2) = 3(1 - x_1)^2\mathrm{e}^{-x_1^2 - (x_2 + 1)^2} - 10(\frac{x_1}{5} - x_1^3 - x_2^5)\mathrm{e}^{-x_1^2 - x_2^2} - \frac{1}{3}\mathrm{e}^{-(x_1 + 1)^2 - x_2^2}, \quad \text{where } x_1\in[-3, 3], x_2\in[-3, 3]
\end{equation*}
(Figure: surface plot of the standard peaks function.) - Code implementation:
# Implementation of Smooth Support Vector Regression

import numpy
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D          # noqa: registers the "3d" projection


numpy.random.seed(0)


def peaks(x1, x2):
    term1 = 3 * (1 - x1) ** 2 * numpy.exp(-x1 ** 2 - (x2 + 1) ** 2)
    term2 = -10 * (x1 / 5 - x1 ** 3 - x2 ** 5) * numpy.exp(-x1 ** 2 - x2 ** 2)
    term3 = -numpy.exp(-(x1 + 1) ** 2 - x2 ** 2) / 3
    val = term1 + term2 + term3
    return val


# generate the regression data
X1 = numpy.linspace(-3, 3, 30)
X2 = numpy.linspace(-3, 3, 30)
X1, X2 = numpy.meshgrid(X1, X2)
X = numpy.hstack((X1.reshape((-1, 1)), X2.reshape((-1, 1))))    # sample points
Y = peaks(X1, X2).reshape((-1, 1))
Yerr = Y + numpy.random.normal(0, 0.4, size=Y.shape)            # noisy observations


class SSVR(object):

    def __init__(self, X, Y_, c=50, mu=1, epsilon=0.5, beta=10):
        '''
        X: sample set, one row per sample
        Y_: observation set, one row per observation
        '''
        self.__X = X                # sample points of the data to regress
        self.__Y_ = Y_              # observations of the data to regress
        self.__c = c                # weight of the error term
        self.__mu = mu              # gaussian kernel parameter
        self.__epsilon = epsilon    # tube parameter
        self.__beta = beta          # smoothing parameter

        self.__A = X.T

    def get_estimation(self, x, alpha, b):
        '''
        estimated value at x
        '''
        A = self.__A
        mu = self.__mu

        x = numpy.array(x).reshape((-1, 1))
        KAx = self.__get_KAx(A, x, mu)
        regVal = self.__calc_hVal(KAx, alpha, b)
        return regVal

    def get_MAE(self, alpha, b):
        '''
        mean absolute error
        '''
        X = self.__X
        Y_ = self.__Y_

        Y = numpy.array(list(self.get_estimation(x, alpha, b) for x in X)).reshape((-1, 1))
        RES = Y_ - Y
        MAE = numpy.linalg.norm(RES, ord=1) / alpha.shape[0]
        return MAE

    def get_GE(self):
        '''
        leave-one-out generalization error
        '''
        X = self.__X
        Y_ = self.__Y_

        cnt = 0
        GE = 0
        for idx in range(X.shape[0]):
            x = X[idx:idx+1, :]
            y_ = Y_[idx, 0]

            self.__X = numpy.vstack((X[:idx, :], X[idx+1:, :]))
            self.__Y_ = numpy.vstack((Y_[:idx, :], Y_[idx+1:, :]))
            self.__A = self.__X.T
            alpha, b, tab = self.optimize()
            if not tab:
                continue
            cnt += 1
            y = self.get_estimation(x, alpha, b)
            GE += (y_ - y) ** 2
        GE /= cnt

        self.__X = X
        self.__Y_ = Y_
        self.__A = X.T
        return GE

    def optimize(self, maxIter=1000, EPSILON=1.e-9):
        '''
        maxIter: maximum number of iterations
        EPSILON: convergence criterion -- converged when the gradient approaches 0
        '''
        A, Y_ = self.__A, self.__Y_
        c = self.__c
        mu = self.__mu
        epsilon = self.__epsilon
        beta = self.__beta

        alpha, b = self.__init_alpha_b((A.shape[1], 1))
        KAA = self.__get_KAA(A, mu)
        JVal = self.__calc_JVal(KAA, Y_, c, epsilon, beta, alpha, b)
        grad = self.__calc_grad(KAA, Y_, c, epsilon, beta, alpha, b)
        Hess = self.__calc_Hess(KAA, Y_, c, epsilon, beta, alpha, b)

        for i in range(maxIter):
            print("iterCnt: {:3d}, JVal: {}".format(i, JVal))
            if self.__converged1(grad, EPSILON):
                return alpha, b, True

            # Newton direction, step length chosen by the Armijo rule
            dCurr = -numpy.matmul(numpy.linalg.inv(Hess), grad)
            ALPHA = self.__calc_ALPHA_by_ArmijoRule(alpha, b, JVal, grad, dCurr, KAA, Y_, c, epsilon, beta)

            delta = ALPHA * dCurr
            alphaNew = alpha + delta[:-1, :]
            bNew = b + delta[-1, -1]
            JValNew = self.__calc_JVal(KAA, Y_, c, epsilon, beta, alphaNew, bNew)
            if self.__converged2(delta, JValNew - JVal, EPSILON):
                return alphaNew, bNew, True

            alpha, b, JVal = alphaNew, bNew, JValNew
            grad = self.__calc_grad(KAA, Y_, c, epsilon, beta, alpha, b)
            Hess = self.__calc_Hess(KAA, Y_, c, epsilon, beta, alpha, b)
        else:
            if self.__converged1(grad, EPSILON):
                return alpha, b, True
        return alpha, b, False

    def __converged2(self, delta, JValDelta, EPSILON):
        val1 = numpy.linalg.norm(delta)
        val2 = numpy.linalg.norm(JValDelta)
        if val1 <= EPSILON or val2 <= EPSILON:
            return True
        return False

    def __calc_ALPHA_by_ArmijoRule(self, alphaCurr, bCurr, JCurr, gCurr, dCurr, KAA, Y_, c, epsilon, beta, C=1.e-4, v=0.5):
        i = 0
        ALPHA = v ** i
        delta = ALPHA * dCurr
        alphaNext = alphaCurr + delta[:-1, :]
        bNext = bCurr + delta[-1, -1]
        JNext = self.__calc_JVal(KAA, Y_, c, epsilon, beta, alphaNext, bNext)
        while True:
            if JNext <= JCurr + C * ALPHA * numpy.matmul(dCurr.T, gCurr)[0, 0]:
                break
            i += 1
            ALPHA = v ** i
            delta = ALPHA * dCurr
            alphaNext = alphaCurr + delta[:-1, :]
            bNext = bCurr + delta[-1, -1]
            JNext = self.__calc_JVal(KAA, Y_, c, epsilon, beta, alphaNext, bNext)
        return ALPHA

    def __converged1(self, grad, EPSILON):
        if numpy.linalg.norm(grad) <= EPSILON:
            return True
        return False

    def __calc_Hess(self, KAA, Y_, c, epsilon, beta, alpha, b):
        Hess_J1 = self.__calc_Hess_J1(alpha)
        Hess_J2 = self.__calc_Hess_J2(KAA, Y_, c, epsilon, beta, alpha, b)
        Hess = Hess_J1 + Hess_J2
        return Hess

    def __calc_Hess_J2(self, KAA, Y_, c, epsilon, beta, alpha, b):
        Hess_J2_alpha_alpha = numpy.zeros((KAA.shape[0], KAA.shape[0]))
        Hess_J2_alpha_b = numpy.zeros((KAA.shape[0], 1))
        Hess_J2_b_alpha = numpy.zeros((1, KAA.shape[0]))
        Hess_J2_b_b = 0

        z = Y_ - numpy.matmul(KAA, alpha) - b
        term1 = z - epsilon
        term2 = -z - epsilon
        for i in range(z.shape[0]):
            term3 = self.__s(term1[i, 0], beta) ** 2 + self.__p(term1[i, 0], beta) * self.__d(term1[i, 0], beta)
            term4 = self.__s(term2[i, 0], beta) ** 2 + self.__p(term2[i, 0], beta) * self.__d(term2[i, 0], beta)
            term5 = term3 + term4
            Hess_J2_alpha_alpha += term5 * numpy.matmul(KAA[:, i:i+1], KAA[i:i+1, :])
            Hess_J2_alpha_b += term5 * KAA[:, i:i+1]
            Hess_J2_b_b += term5
        Hess_J2_alpha_alpha *= c
        Hess_J2_alpha_b *= c
        Hess_J2_b_alpha = Hess_J2_alpha_b.reshape((1, -1))
        Hess_J2_b_b *= c

        Hess_J2_upper = numpy.hstack((Hess_J2_alpha_alpha, Hess_J2_alpha_b))
        Hess_J2_lower = numpy.hstack((Hess_J2_b_alpha, [[Hess_J2_b_b]]))
        Hess_J2 = numpy.vstack((Hess_J2_upper, Hess_J2_lower))
        return Hess_J2

    def __calc_Hess_J1(self, alpha):
        I = numpy.identity(alpha.shape[0])
        term = numpy.hstack((I, numpy.zeros((I.shape[0], 1))))
        Hess_J1 = numpy.vstack((term, numpy.zeros((1, term.shape[1]))))
        return Hess_J1

    def __calc_grad(self, KAA, Y_, c, epsilon, beta, alpha, b):
        grad_J1 = self.__calc_grad_J1(alpha)
        grad_J2 = self.__calc_grad_J2(KAA, Y_, c, epsilon, beta, alpha, b)
        grad = grad_J1 + grad_J2
        return grad

    def __calc_grad_J2(self, KAA, Y_, c, epsilon, beta, alpha, b):
        grad_J2_alpha = numpy.zeros((KAA.shape[0], 1))
        grad_J2_b = 0

        z = Y_ - numpy.matmul(KAA, alpha) - b
        term1 = z - epsilon
        term2 = -z - epsilon
        for i in range(z.shape[0]):
            term3 = self.__p(term1[i, 0], beta) * self.__s(term1[i, 0], beta) - self.__p(term2[i, 0], beta) * self.__s(term2[i, 0], beta)
            grad_J2_alpha += term3 * KAA[:, i:i+1]
            grad_J2_b += term3
        grad_J2_alpha *= -c
        grad_J2_b *= -c

        grad_J2 = numpy.vstack((grad_J2_alpha, [[grad_J2_b]]))
        return grad_J2

    def __calc_grad_J1(self, alpha):
        grad_J1 = numpy.vstack((alpha, [[0]]))
        return grad_J1

    def __calc_JVal(self, KAA, Y_, c, epsilon, beta, alpha, b):
        J1 = self.__calc_J1(alpha)
        J2 = self.__calc_J2(KAA, Y_, c, epsilon, beta, alpha, b)
        JVal = J1 + J2
        return JVal

    def __calc_J2(self, KAA, Y_, c, epsilon, beta, alpha, b):
        z = Y_ - numpy.matmul(KAA, alpha) - b
        # builtin sum: numpy.sum over a generator is unreliable across numpy versions
        term2 = sum(self.__p_epsilon_square(item[0], epsilon, beta) for item in z)
        J2 = term2 * c / 2
        return J2

    def __calc_J1(self, alpha):
        J1 = numpy.sum(alpha * alpha) / 2
        return J1

    def __p(self, x, beta):
        val = numpy.log(numpy.exp(beta * x) + 1) / beta
        return val

    def __s(self, x, beta):
        val = 1 / (numpy.exp(-beta * x) + 1)
        return val

    def __d(self, x, beta):
        term = numpy.exp(beta * x)
        val = beta * term / (term + 1) ** 2
        return val

    def __p_epsilon_square(self, x, epsilon, beta):
        term1 = self.__p(x - epsilon, beta) ** 2
        term2 = self.__p(-x - epsilon, beta) ** 2
        val = term1 + term2
        return val

    def __get_KAA(self, A, mu):
        KAA = numpy.zeros((A.shape[1], A.shape[1]))
        for rowIdx in range(KAA.shape[0]):
            for colIdx in range(rowIdx + 1):
                x1 = A[:, rowIdx:rowIdx+1]
                x2 = A[:, colIdx:colIdx+1]
                val = self.__calc_gaussian(x1, x2, mu)
                KAA[rowIdx, colIdx] = KAA[colIdx, rowIdx] = val
        return KAA

    def __calc_gaussian(self, x1, x2, mu):
        val = numpy.exp(-mu * numpy.linalg.norm(x1 - x2) ** 2)
        # val = numpy.sum(x1 * x2)        # linear kernel alternative
        return val

    def __init_alpha_b(self, shape):
        '''
        initialization of alpha and b
        '''
        alpha, b = numpy.zeros(shape), 0
        return alpha, b

    def __calc_hVal(self, KAx, alpha, b):
        hVal = numpy.matmul(alpha.T, KAx)[0, 0] + b
        return hVal

    def __get_KAx(self, A, x, mu):
        KAx = numpy.zeros((A.shape[1], 1))
        for rowIdx in range(KAx.shape[0]):
            x1 = A[:, rowIdx:rowIdx+1]
            val = self.__calc_gaussian(x1, x, mu)
            KAx[rowIdx, 0] = val
        return KAx


class PeaksPlot(object):

    def peaks_plot(self, X, Y_, ssvrObj, alpha, b):
        surfX1 = numpy.linspace(numpy.min(X[:, 0]), numpy.max(X[:, 0]), 50)
        surfX2 = numpy.linspace(numpy.min(X[:, 1]), numpy.max(X[:, 1]), 50)
        surfX1, surfX2 = numpy.meshgrid(surfX1, surfX2)
        surfY_ = peaks(surfX1, surfX2)

        surfY = numpy.zeros(surfX1.shape)
        for rowIdx in range(surfY.shape[0]):
            for colIdx in range(surfY.shape[1]):
                surfY[rowIdx, colIdx] = ssvrObj.get_estimation((surfX1[rowIdx, colIdx], surfX2[rowIdx, colIdx]), alpha, b)

        fig = plt.figure(figsize=(10, 3))
        ax1 = plt.subplot(1, 2, 1, projection="3d")
        ax2 = plt.subplot(1, 2, 2, projection="3d")

        norm = plt.Normalize(surfY_.min(), surfY_.max())
        colors = plt.cm.rainbow(norm(surfY_))
        surf = ax1.plot_surface(surfX1, surfX2, surfY_, facecolors=colors, shade=True, alpha=0.5)
        surf.set_facecolor("white")
        ax1.scatter3D(X[:, 0:1], X[:, 1:2], Y_, s=1, c="r")
        ax1.set(xlabel="$x_1$", ylabel="$x_2$", zlabel="$f(x_1, x_2)$", xlim=(-3, 3), ylim=(-3, 3), zlim=(-8, 8))
        ax1.set(title="Original Function")
        ax1.view_init(0, 0)

        norm2 = plt.Normalize(surfY.min(), surfY.max())
        colors2 = plt.cm.rainbow(norm2(surfY))
        surf2 = ax2.plot_surface(surfX1, surfX2, surfY, facecolors=colors2, shade=True, alpha=0.5)
        surf2.set_facecolor("white")
        ax2.scatter3D(X[:, 0:1], X[:, 1:2], Y_, s=1, c="r")
        ax2.set(xlabel="$x_1$", ylabel="$x_2$", zlabel="$f(x_1, x_2)$", xlim=(-3, 3), ylim=(-3, 3), zlim=(-8, 8))
        ax2.set(title="Estimated Function")
        ax2.view_init(0, 0)

        fig.tight_layout()
        fig.savefig("peaks_plot.png", dpi=100)


if __name__ == "__main__":
    ssvrObj = SSVR(X, Yerr, c=50, mu=1, epsilon=0.5)
    alpha, b, tab = ssvrObj.optimize()

    ppObj = PeaksPlot()
    ppObj.peaks_plot(X, Yerr, ssvrObj, alpha, b)
- Results:
(Figure: left panel, sample points with the original function surface; right panel, sample points with the estimated function surface.) - Usage tips:
①. Maximizing the margin thickens the tube's projection, which further encourages the regression samples to fall inside it;
②. In essence SSVR still solves the primal problem; here $\alpha^{\prime}$ enters only as a change of variables and need not satisfy the dual feasibility conditions of the KKT system;
③. Whether the data set should be normalized must be judged from the actual situation. - Reference:
Yuh-Jye Lee, Wen-Feng Hsieh, and Chien-Ming Huang. "$\epsilon$-SSVR: A Smooth Support Vector Machine for $\epsilon$-Insensitive Regression", IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, (2005), 678-685.
Original post: https://www.cnblogs.com/xxhbdk/p/12425701.html