[Deep Learning Applications] · Speed Comparison of Mainstream Deep Learning Hardware (CPU, GPU, TPU)
Date: 2022-06-24
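Personal homepage: http://www.yansongsong.cn
We train a CNN to classify the Cifar10 dataset and run the same code on several mainstream deep learning hardware platforms to collect comparable training speeds.
Speed comparison of mainstream deep learning hardware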
(Colab TPU) speed: 382 s/epoch
(i5 8250u) speed: 320 s/epoch
(i7 9700k) speed: 36 s/epoch
(GPU MX150) speed: 36 s/epoch
(Colab GPU) speed: 16 s/epoch
(GPU GTX 1060) speed: 9 s/epoch
(GPU GTX1080ti) speed: 4 s/epoch
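Compared with the CPU of an ordinary laptop (i5 8250u), an entry-level graphics card (GPU MX150) is roughly 8 to 9 times faster (320 s vs. 36 s per epoch), and a high-performance card (GPU GTX1080ti) is 80 times faster (320 s vs. 4 s per epoch). Using multiple GPUs should be faster still, so if you train models regularly, a GPU is the recommended choice.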
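The ratios quoted above follow directly from the measured seconds per epoch. As a minimal sketch, the snippet below recomputes every device's speedup relative to the i5 8250u baseline; the function name `speedup_table` and the dictionary layout are illustrative choices, not part of the original benchmark.

```python
# Seconds per epoch, copied from the comparison list in this article.
seconds_per_epoch = {
    "Colab TPU": 382,
    "i5 8250u": 320,
    "i7 9700k": 36,
    "GPU MX150": 36,
    "Colab GPU": 16,
    "GPU GTX 1060": 9,
    "GPU GTX1080ti": 4,
}

BASELINE = "i5 8250u"  # the laptop CPU used as the reference point


def speedup_table(times, baseline):
    """Return {device: baseline_time / device_time}, largest speedup first."""
    base = times[baseline]
    ratios = {name: base / t for name, t in times.items()}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


speedups = speedup_table(seconds_per_epoch, BASELINE)
for name, ratio in speedups.items():
    print(f"{name:<14} {seconds_per_epoch[name]:>4} s/epoch  {ratio:5.1f}x")
```

A ratio below 1.0 means the device was slower than the baseline in this particular test.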
You are welcome to run the code below on your own machine and compare the speed. My computer's CPU takes 320 s/epoch.
Code
from tensorflow import keras
from tensorflow.keras.datasets import cifar10
import numpy as np
batch_size = 100
num_classes = 10
epochs = 10
# Load the data
(x_train, train_labels), (x_test, test_labels) = cifar10.load_data()
print(x_train.shape)
train_images = x_train.reshape([-1,32,32,3]) / 255.0
test_images = x_test.reshape([-1,32,32,3]) / 255.0
model = keras.Sequential([
    # (-1,32,32,3) -> (-1,32,32,32)
    keras.layers.Conv2D(input_shape=(32, 32, 3), filters=32, kernel_size=3, strides=1, padding='same'),
    # (-1,32,32,32) -> (-1,32,32,32)
    keras.layers.Conv2D(filters=32, kernel_size=3, strides=1, padding='same'),
    # (-1,32,32,32) -> (-1,16,16,32)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (-1,16,16,32) -> (-1,16,16,64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (-1,16,16,64) -> (-1,16,16,64)
    keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same'),
    # (-1,16,16,64) -> (-1,8,8,64)
    keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same'),
    # (-1,8,8,64) -> (-1,8,8,128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (-1,8,8,128) -> (-1,8,8,128)
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same'),
    # (-1,8,8,128) -> (-1,8*8*128)
    keras.layers.Flatten(),
    # Dropout leaves the shape unchanged: (-1,8*8*128)
    keras.layers.Dropout(0.3),
    # (-1,8*8*128) -> (-1,128)
    keras.layers.Dense(128, activation="relu"),
    # (-1,128) -> (-1,10)
    keras.layers.Dense(10, activation="softmax")
])
print(model.summary())
model.compile(optimizer="adam",
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs, validation_data=(test_images[:1000], test_labels[:1000]))
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(np.argmax(model.predict(test_images[:20]),1),test_labels[:20])
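The Keras progress bar rounds epoch time to whole seconds, which makes fast devices hard to compare. One possible way to get finer numbers is to time each epoch yourself. The sketch below uses only the standard library; `time_epochs` and `summarize` are hypothetical helper names, and the `time.sleep` workload is a stand-in so the snippet runs on its own.

```python
import time


def time_epochs(run_one_epoch, epochs):
    """Call run_one_epoch() `epochs` times and return each call's duration in seconds."""
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_one_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (first, mean_of_rest); the first epoch may include one-time setup cost."""
    rest = durations[1:] or durations
    return durations[0], sum(rest) / len(rest)


if __name__ == "__main__":
    # Stand-in workload. With the training script above, one could pass instead:
    #   lambda: model.fit(train_images, train_labels,
    #                     batch_size=batch_size, epochs=1, verbose=0)
    durations = time_epochs(lambda: time.sleep(0.01), epochs=3)
    first, steady = summarize(durations)
    print(f"first epoch: {first:.3f}s, later epochs: {steady:.3f}s")
```

Reporting the first epoch separately is a design choice: device initialization tends to land in epoch 1, so the later epochs are usually the fairer number to compare across machines.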
Output (GPU GTX 1080 Ti)
python demo.py
Using TensorFlow backend.
(50000, 32, 32, 3)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 32) 896
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 32) 9248
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 64) 18496
_________________________________________________________________
conv2d_3 (Conv2D) (None, 16, 16, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 128) 73856
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 128) 147584
_________________________________________________________________
flatten (Flatten) (None, 8192) 0
_________________________________________________________________
dropout (Dropout) (None, 8192) 0
_________________________________________________________________
dense (Dense) (None, 128) 1048704
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 1,337,002
Trainable params: 1,337,002
Non-trainable params: 0
_________________________________________________________________
None
Train on 50000 samples, validate on 1000 samples
Epoch 1/10
2019-03-15 17:07:34.477745: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-15 17:07:34.552699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-15 17:07:34.553036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.68GiB
2019-03-15 17:07:34.553049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-03-15 17:07:34.737306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-15 17:07:34.737335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-03-15 17:07:34.737340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-03-15 17:07:34.737468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10327 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
50000/50000 [==============================] - 5s 103us/step - loss: 1.3343 - acc: 0.5256 - val_loss: 1.0300 - val_acc: 0.6450
Epoch 2/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.9668 - acc: 0.6660 - val_loss: 0.8930 - val_acc: 0.6820
Epoch 3/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.8349 - acc: 0.7097 - val_loss: 0.8486 - val_acc: 0.7130
Epoch 4/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.7496 - acc: 0.7412 - val_loss: 0.8823 - val_acc: 0.7040
Epoch 5/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6805 - acc: 0.7643 - val_loss: 0.8710 - val_acc: 0.7060
Epoch 6/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.6256 - acc: 0.7833 - val_loss: 0.9150 - val_acc: 0.7020
Epoch 7/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.5715 - acc: 0.8000 - val_loss: 0.8586 - val_acc: 0.7140
Epoch 8/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.5312 - acc: 0.8143 - val_loss: 0.9455 - val_acc: 0.7030
Epoch 9/10
50000/50000 [==============================] - 4s 77us/step - loss: 0.4878 - acc: 0.8287 - val_loss: 1.0063 - val_acc: 0.7360
Epoch 10/10
50000/50000 [==============================] - 4s 76us/step - loss: 0.4474 - acc: 0.8438 - val_loss: 1.0609 - val_acc: 0.7030
10000/10000 [==============================] - 1s 54us/step
[3 8 8 0 6 6 1 6 3 1 4 9 4 7 9 8 5 5 8 6]
[[3]
[8]
[8]
[0]
[6]
[6]
[1]
[6]
[3]
[1]
[0]
[9]
[5]
[7]
[9]
[8]
[5]
[7]
[8]
[6]]
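The per-epoch lines in the log can be cross-checked against the 4 s/epoch figure in the comparison. Assuming the "us/step" value in this Keras version is the time per training sample (the progress bar counts 50000/50000 samples), the conversion is a single multiplication, sketched below with a hypothetical helper `epoch_seconds`.

```python
SAMPLES_PER_EPOCH = 50_000  # "Train on 50000 samples" in the log


def epoch_seconds(microseconds_per_sample, samples=SAMPLES_PER_EPOCH):
    """Convert the per-sample time from the progress bar into seconds per epoch."""
    return microseconds_per_sample * samples / 1_000_000


print(epoch_seconds(103))  # epoch 1 in the log (shown as 5s)
print(epoch_seconds(76))   # most later epochs in the log (shown as 4s)
```

Under that assumption, 76 us per sample corresponds to 3.8 s per epoch, which the progress bar displays as 4s.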