High-dimensional data | Data visualization in R with t-SNE

Date: 2022-07-22

Visualizing high-dimensional data with the t-SNE algorithm

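t-SNE (t-distributed stochastic neighbor embedding) is a nonlinear dimensionality reduction algorithm from machine learning, introduced in 2008. Like PCA, it is well suited to reducing high-dimensional data to two or three dimensions. The difference is that PCA is a linear method and cannot capture complex nonlinear (for example polynomial) relationships between variables, whereas t-SNE uncovers the structure of the data by preserving neighbor relationships, modeling similarities in the low-dimensional embedding with a Student t distribution.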
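For comparison, the linear baseline mentioned above can be sketched with base R's prcomp(). This is a minimal illustration on the built-in iris data, not part of the original walkthrough; the object names pca_out and pca_scores are chosen here for illustration only.

```r
# Linear dimensionality reduction with PCA (base R, no extra packages needed)
pca_out <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)

# Scores on the first two principal components: one row per flower
pca_scores <- data.frame(pca_out$x[, 1:2], Species = iris$Species)
head(pca_scores)

# Proportion of the total variance captured by each component
summary(pca_out)$importance["Proportion of Variance", ]
```

Each principal component is a fixed linear combination of the four measurements, which is exactly the restriction that t-SNE drops.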

01

Raw data

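The raw data is the built-in iris data frame. It holds measurements of sepal length and width and petal length and width for 50 flowers from each of three iris species (setosa, versicolor and virginica), giving 150 rows and 5 variables.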
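A quick way to look at the data before reducing it is sketched below. It uses only base R functions; nothing here comes from the original post beyond the iris data set itself.

```r
# iris ships with base R (package datasets), so nothing has to be downloaded
dim(iris)            # number of rows and columns
str(iris)            # four numeric measurements plus the Species factor
head(iris)           # first six rows
table(iris$Species)  # number of flowers per species
```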

02

Dimensionality reduction

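> library(Rtsne)
> iris_unique<-unique(iris)
# Remove duplicate rows.
> set.seed(42)
# Fix the random seed so the result is reproducible.
> iris1<-as.matrix(iris_unique[,1:4])
# Take columns 1 to 4 and convert them to a matrix.
> tsne_out<-Rtsne(iris1)
# Rtsne() is a wrapper around the C++ implementation of Barnes-Hut
# t-distributed stochastic neighbor embedding.
# Setting theta=0.0 computes exact t-SNE instead of the approximation.
# Rtsne() does all of the dimensionality reduction.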
> tsne_out
$N
[1] 149
$Y
             [,1]      [,2]
  [1,] -15.794362 -6.776711
  [2,] -18.120432 -6.231470
  [3,] -18.261085 -7.311696
  [4,] -18.520943 -7.087130
  [5,] -15.778549 -7.221474
  [6,] -14.008673 -7.269801
  [7,] -17.893455 -7.813756
  [8,] -16.467086 -6.810327
  [9,] -19.221721 -6.978521
 [10,] -17.803392 -6.466310
 [11,] -14.445038 -6.594416
 [12,] -17.100410 -7.344027
 [13,] -18.414497 -6.464928
 [14,] -19.408668 -7.464577
 [15,] -13.275994 -6.701498
 [16,] -13.149006 -7.122149
 [17,] -13.890901 -6.929687
 [18,] -15.771478 -6.789918
 [19,] -13.700649 -6.432417
 [20,] -14.852568 -7.405885
 [21,] -15.093413 -5.868684
 [22,] -15.141230 -7.327606
 [23,] -17.943436 -8.557303
 [24,] -16.259062 -5.898315
 [25,] -16.997972 -7.907640
 [26,] -17.825508 -5.984987
 [27,] -16.383265 -6.823896
 [28,] -15.438452 -6.609476
 [29,] -15.772932 -6.342601
 [30,] -17.844418 -7.233536
 [31,] -17.907978 -6.804983
 [32,] -15.149819 -5.939007
 [33,] -13.898201 -7.644515
 [34,] -13.421065 -7.250096
 [35,] -17.747953 -6.510444
 [36,] -17.257790 -6.137830
 [37,] -14.564020 -6.003143
 [38,] -16.193628 -7.512172
 [39,] -19.165844 -7.239552
 [40,] -16.147220 -6.566834
 [41,] -16.104934 -7.127514
 [42,] -19.608322 -6.371928
 [43,] -18.891659 -7.666698
 [44,] -15.795457 -7.824185
 [45,] -14.644010 -8.070843
 [46,] -18.380163 -6.474333
 [47,] -14.788982 -7.480182
 [48,] -18.361689 -7.377014
 [49,] -14.677585 -6.777023
 [50,] -16.850026 -6.588789
 [51,]   8.536314  1.331853
 [52,]   7.160654  2.222958
 [53,]   8.845708  1.642313
 [54,]   2.600714  2.678237
 [55,]   7.608379  2.089299
 [56,]   4.581609  3.175618
 [57,]   7.227995  2.925564
 [58,]   1.259600  2.278412
 [59,]   7.658976  1.699396
 [60,]   2.519998  3.124515
 [61,]   1.356245  2.444995
 [62,]   4.855057  2.662532
 [63,]   3.194511  1.282723
 [64,]   6.573616  2.933744
 [65,]   2.654406  1.830627
 [66,]   7.646466  1.447916
 [67,]   4.740423  3.566360
 [68,]   3.545194  2.042330
 [69,]   5.763106  1.039076
 [70,]   2.780636  2.251854
 [71,]   7.123596  4.273298
 [72,]   4.397361  1.856958
 [73,]   8.586749  3.297060
 [74,]   6.268760  2.630621
 [75,]   6.569283  1.767931
 [76,]   7.376409  1.575341
 [77,]   8.456851  1.726707
 [78,]   9.372167  2.305879
 [79,]   5.871143  2.931851
 [80,]   2.242621  1.746196
 [81,]   2.373290  2.304550
 [82,]   2.174285  2.161862
 [83,]   3.296379  2.057194
 [84,]   8.559505  4.180703
 [85,]   4.351523  3.790885
 [86,]   6.254539  3.630555
 [87,]   8.222644  1.732763
 [88,]   5.693089  1.089601
 [89,]   3.861217  2.855112
 [90,]   2.883691  2.648004
 [91,]   3.519009  3.300865
 [92,]   6.352661  2.839296
 [93,]   3.341238  2.093332
 [94,]   1.310473  2.266729
 [95,]   3.544698  2.882386
 [96,]   4.147667  2.748773
 [97,]   4.098212  2.762276
 [98,]   5.786155  2.165470
 [99,]   1.204406  2.147812
[100,]   3.726652  2.550643
[101,]  12.868885  6.006482
[102,]   8.275202  4.992602
[103,]  13.818242  4.538085
[104,]  11.395916  3.733291
[105,]  12.611549  4.518063
[106,]  15.167288  4.580621
[107,]   3.256931  4.343911
[108,]  14.657778  4.221338
[109,]  12.606503  3.429156
[110,]  14.293539  5.431041
[111,]  10.913678  4.848214
[112,]  10.620091  3.985814
[113,]  12.455521  4.586778
[114,]   8.165304  5.279075
[115,]   8.537106  5.643321
[116,]  11.603757  5.419617
[117,]  11.522072  3.952109
[118,]  15.250025  5.271586
[119,]  15.472077  4.396878
[120,]   8.917063  4.108094
[121,]  13.199716  5.045357
[122,]   7.799535  5.271964
[123,]  15.305558  4.422065
[124,]   8.684753  3.689291
[125,]  12.898437  4.948861
[126,]  14.177381  4.339674
[127,]   8.090627  3.742130
[128,]   7.812402  4.082107
[129,]  11.927399  4.059812
[130,]  14.012217  3.959194
[131,]  14.560938  4.206181
[132,]  15.236393  5.268181
[133,]  12.010499  4.206740
[134,]   8.927838  3.385948
[135,]  10.179020  3.447383
[136,]  14.868648  4.698355
[137,]  12.240744  5.918564
[138,]  11.418754  3.982833
[139,]   7.492590  4.141120
[140,]  12.415297  4.801756
[141,]  12.650368  5.203968
[142,]  11.761565  5.179988
[143,]  13.361707  5.127742
[144,]  12.955958  5.557022
[145,]  11.713405  5.048448
[146,]   9.150078  4.010551
[147,]  11.016824  4.559839
[148,]  11.759284  5.917160
[149,]   8.022603  4.638540
$costs
  [1] -6.569291e-05 -1.184407e-04  6.668903e-05 -4.284391e-04  1.331527e-05 -1.659447e-04
  [7]  4.178618e-04  4.000721e-04 -1.471709e-04 -4.866961e-04  1.583424e-04  3.891458e-04
 [13] -2.748767e-04  1.038017e-05 -1.223165e-04 -2.175199e-04 -3.052060e-04 -1.153663e-04
 [19] -5.807774e-06 -1.318655e-05  1.086358e-04 -1.056506e-04  1.042230e-03  1.020565e-03
 [25]  8.145448e-05  8.433778e-05  4.999846e-05  8.485467e-05  1.840803e-04 -3.276016e-04
 [31] -6.551815e-04  1.557659e-04 -4.736972e-06 -4.735442e-05 -2.520546e-04  3.108438e-04
 [37]  1.260465e-04  1.972264e-04 -9.611292e-05  2.441147e-04 -1.997326e-04 -7.527818e-05
 [43]  1.671165e-04  5.584101e-04  4.285945e-04 -2.968739e-04 -1.330250e-05 -1.149360e-04
 [49]  8.084874e-05  6.710403e-04  6.919283e-04  9.766942e-04  1.960186e-03 -1.673100e-04
 [55]  1.498811e-03  1.518804e-03  1.191833e-03  3.496391e-04  8.389463e-04  7.130234e-04
 [61] -5.238504e-05  1.329515e-03  1.051728e-03  2.541084e-03 -1.976349e-04  5.621641e-04
 [67]  1.889274e-03  1.434286e-05  4.027971e-03 -1.166053e-04  1.719625e-03  1.638894e-03
 [73]  2.473756e-03  1.286735e-03  1.833584e-03  8.951562e-04  6.807228e-04  4.984784e-03
 [79]  2.823550e-03  1.089342e-04 -6.601256e-06  6.403744e-05  1.991636e-04  1.613454e-03
 [85]  1.260980e-03  9.531492e-04  1.349265e-03  2.888901e-03 -1.345512e-04  2.987155e-06
 [91]  5.545848e-04  2.071556e-03  3.427912e-04  3.104291e-04  5.055473e-04  2.853883e-05
 [97]  5.868188e-04  2.521142e-03  5.766867e-04  3.513576e-04  3.311204e-04  1.479315e-03
[103]  1.033390e-03  1.737451e-03  5.759593e-04  2.587490e-04  1.289787e-03  4.355705e-04
[109]  1.261498e-03  4.912577e-04  3.050430e-03  3.313013e-03  9.174032e-04  1.021039e-03
[115]  1.433732e-03  1.278868e-03  1.815971e-03  2.059448e-04  5.057746e-05  1.561998e-03
[121]  5.342262e-04  1.744473e-03  2.214039e-04  2.083725e-03  3.547608e-04  5.196140e-04
[127]  1.990998e-03  2.346313e-03  1.098786e-03  8.136133e-04  5.001043e-04  1.644533e-04
[133]  6.233415e-04  2.139292e-03  8.210858e-04 -8.386480e-05  7.858205e-04  1.427453e-03
[139]  2.148709e-03  6.094865e-04  1.929748e-04 -8.357979e-05  6.223272e-04  3.127318e-04
[145]  2.927624e-04  1.391081e-03  3.127062e-03  1.176773e-03  1.645225e-03
$itercosts
 [1] 43.7514985 44.7873147 44.8116650 44.3887944 45.7282669  0.3704256  0.1252816  0.1237133
 [9]  0.1217102  0.1200852  0.1187576  0.1161445  0.1173155  0.1144428  0.1127897  0.1122483
[17]  0.1129056  0.1116092  0.1111795  0.1105687
$origD
[1] 4
$perplexity
[1] 30
$theta
[1] 0.5
$max_iter
[1] 1000
$stop_lying_iter
[1] 250
$mom_switch_iter
[1] 250
$momentum
[1] 0.5
$final_momentum
[1] 0.8
$eta
[1] 200
$exaggeration_factor
[1] 12
> data<-data.frame(tsne_out$Y,iris_unique$Species)
> data
            X1        X2 iris_unique.Species
1   -15.794362 -6.776711              setosa
2   -18.120432 -6.231470              setosa
3   -18.261085 -7.311696              setosa
4   -18.520943 -7.087130              setosa
5   -15.778549 -7.221474              setosa
6   -14.008673 -7.269801              setosa
# ... rows 7 to 149 omitted: the coordinates repeat tsne_out$Y; rows 1-50 are setosa, 51-100 versicolor, 101-149 virginica
> colnames(data)<-c("Y1","Y2","Species")
> data
            Y1        Y2    Species
1   -15.794362 -6.776711     setosa
2   -18.120432 -6.231470     setosa
3   -18.261085 -7.311696     setosa
4   -18.520943 -7.087130     setosa
5   -15.778549 -7.221474     setosa
6   -14.008673 -7.269801     setosa
# ... rows 7 to 149 omitted: same values as before, only the column names changed
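The run above uses the default theta of 0.5, which is the Barnes-Hut approximation. The sketch below shows the exact variant mentioned in the comments, with theta set to 0.0. It assumes the Rtsne package is installed; the object name tsne_exact is introduced here for illustration, and the coordinates it produces will not match the approximate run.

```r
library(Rtsne)  # assumes the Rtsne package is installed

iris_unique <- unique(iris)              # Rtsne() rejects duplicate rows by default
iris1 <- as.matrix(iris_unique[, 1:4])   # numeric measurements only

set.seed(42)                             # fix the random initialization
tsne_exact <- Rtsne(iris1, dims = 2, perplexity = 30, theta = 0.0)

dim(tsne_exact$Y)               # one 2-D coordinate pair per retained flower
tsne_exact$theta                # confirms which theta was used
tail(tsne_exact$itercosts, 1)   # last recorded total cost
```

Exact t-SNE compares every pair of points, so it is affordable for 149 rows but becomes slow on large data sets, which is where the Barnes-Hut approximation is useful.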

03

Plotting with ggplot2

> library(ggplot2)
> ggplot(data, aes(Y1, Y2, fill = Species)) +
+   geom_point(size = 5.5, colour = "black", alpha = 0.6, shape = 21) +
+   scale_fill_manual(values = c("#00AFBB", "#E7B800", "blue"))

Summary

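Rtsne(): given the distance matrix D between the input objects (by default the Euclidean distance between two objects), it computes the similarity scores p_ij in the original space. In this walkthrough the input is passed as a numeric matrix, so the measurement columns are converted with as.matrix() and the Species factor is left out.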
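The similarity scores can be written out as follows. This is the standard formulation from the original t-SNE paper, not something specific to Rtsne; the symbols d_ij (distance between objects i and j), sigma_i (per-point bandwidth), y_i (embedded coordinates) and N (number of objects) are notation introduced here.

```latex
% Conditional similarity of j to i in the original space
p_{j|i} = \frac{\exp\!\left(-d_{ij}^{2} / 2\sigma_i^{2}\right)}
               {\sum_{k \neq i} \exp\!\left(-d_{ik}^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarity in the embedding, using a Student t kernel with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}

% Cost minimized over the embedded coordinates y_1, ..., y_N
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Each sigma_i is chosen so that the distribution p_{.|i} has the requested perplexity, which is the role of the perplexity value (30) reported in the output.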

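Limitations of t-SNE: if the original data has very high dimensionality, it cannot be mapped faithfully into a two- or three-dimensional space. In addition, distances in a t-SNE plot have no direct meaning, because the embedding is fitted to probability distributions over neighbors rather than to the distances themselves.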
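One way to see that embedded distances should not be read literally is to rerun t-SNE with a different seed or perplexity and compare the distance between the same two flowers. The sketch below assumes the Rtsne package is installed; run_tsne and pair_dist are hypothetical helpers written for this illustration, and rows 1 and 120 are an arbitrary pair.

```r
library(Rtsne)  # assumes the Rtsne package is installed

iris1 <- as.matrix(unique(iris)[, 1:4])

# Hypothetical helper: run t-SNE once for a given seed and perplexity
run_tsne <- function(seed, perplexity) {
  set.seed(seed)
  Rtsne(iris1, perplexity = perplexity)$Y
}

y_a <- run_tsne(seed = 1, perplexity = 30)
y_b <- run_tsne(seed = 2, perplexity = 30)
y_c <- run_tsne(seed = 1, perplexity = 10)

# Hypothetical helper: Euclidean distance between rows i and j of an embedding
pair_dist <- function(y, i, j) sqrt(sum((y[i, ] - y[j, ])^2))

# Distance between the same two flowers in three different embeddings
c(run_a = pair_dist(y_a, 1, 120),
  run_b = pair_dist(y_b, 1, 120),
  run_c = pair_dist(y_c, 1, 120))
```

If the three numbers differ noticeably, that is the expected behavior: which points end up as neighbors tends to be stable, while absolute positions and distances between clusters depend on the run.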