A New Journey: Deep Learning

Overview

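Sooner or later I have to walk into this good night, gently or forcefully; one way or another this step has to be taken. I heard that the Keras package for R can do simple deep learning, which would spare me from diving back into the deep sea of Python for now, so it seemed worth a look. None of that turned out to be useful: after testing, the R setup proved too complicated, so I will just learn Python properly. If anyone gets it working in R, please teach me.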

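The Keras site for R is at https://keras.rstudio.com/ and Keras itself is a Python deep learning framework that makes it easy to define and train almost any kind of deep learning model.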

Code

Installation and setup

# Install TensorFlow
# Requires the latest pip
pip3 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple

# Current stable release for CPU and GPU
pip3 install tensorflow -i https://pypi.tuna.tsinghua.edu.cn/simple

# Install Keras
pip3 install keras -i https://pypi.tuna.tsinghua.edu.cn/simple
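
Before going further, it can help to confirm that both packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration and is not part of Keras or TensorFlow.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Report whether the two packages installed above are importable
for name in ("tensorflow", "keras"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, the most likely cause is that `pip3` and `python3` point at different environments.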

Recognizing handwritten digits from the MNIST dataset

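The MNIST database contains 60,000 training images and 10,000 test images. Each image is a 28 x 28 grayscale picture of a handwritten digit, and every image has a corresponding label.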

Preparing the data

# Import the MNIST dataset bundled with keras
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
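# This gives the samples and labels for the training set and the test set
# The images are encoded as NumPy arrays, and the labels are an array of digits
# Inspect the structure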
train_images.shape
# (60000, 28, 28)
# Check the number of labels: 60,000
len(train_labels)
# 60000
train_labels
# array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

Building the model

# Import the model classes
from keras import models
# Import the neural network layers
from keras import layers
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
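
To get a feel for the size of this two-layer network, the number of trainable parameters can be worked out by hand. This is a small sketch, assuming each `Dense` layer holds one weight per input-unit pair plus one bias per unit; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_param_count(28 * 28, 512)  # 784 inputs -> 512 units
output = dense_param_count(512, 10)       # 512 inputs -> 10 classes

print(hidden, output, hidden + output)
# 401920 5130 407050
```

Calling `network.summary()` on the model above should report the same per-layer counts.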

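# Compile the model
# Optimizer: the mechanism that updates the network based on the training data and the loss function
# Loss function: how the network measures its performance on the training data, and so how it steers itself in the right direction
# Metrics: what to monitor during training and testing; here only accuracy
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])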
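
To make the loss less abstract, here is a minimal pure-Python sketch of what a softmax output and a categorical cross-entropy loss compute for a single sample. It illustrates the formulas only and is not the Keras implementation (which, among other things, works on batches and guards against taking the log of zero); the function names and the example scores are invented for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)


probs = softmax([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
target = [1.0, 0.0, 0.0]           # the true class is class 0
loss = categorical_crossentropy(target, probs)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is the direction the optimizer pushes the weights.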

# Preprocess the images
# The raw data has shape (60000, 28, 28); convert it to (60000, 28 * 28)
# This flattens each image into a single vector
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
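
The same flatten-and-scale step can be tried on a tiny stand-in array to see exactly what it does. This sketch assumes NumPy is available; `fake_images` is random data made up for illustration, not real MNIST pixels.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 images of 28 x 28 uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into one row of 784 values
flat = fake_images.reshape((fake_images.shape[0], 28 * 28))

# Convert to float32 and rescale from [0, 255] to [0, 1]
scaled = flat.astype("float32") / 255

print(flat.shape)    # (4, 784)
print(scaled.dtype)  # float32
```

Using `fake_images.shape[0]` instead of a hard-coded 60000 keeps the code working if the number of images changes.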

# Prepare the labels (one-hot encode them)
from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
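
For readers wondering what `to_categorical` produces, the sketch below builds the same kind of one-hot matrix by hand with NumPy. The helper `one_hot` is hypothetical and only mirrors the idea; it is not the Keras source.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


sample = one_hot([5, 0, 4])  # the first three training labels shown earlier
print(sample.shape)          # (3, 10)
print(sample[0])             # 1.0 at index 5, zeros elsewhere
```

This encoding is what the `categorical_crossentropy` loss expects. As an alternative, Keras also offers a `sparse_categorical_crossentropy` loss that accepts the integer labels directly.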

# Fit the model
network.fit(train_images, train_labels, epochs=5, batch_size=128)
# As training runs, the model reports two quantities on the training data: the loss and the accuracy
# The final training accuracy is 0.9894
# Epoch 1/5
# 60000/60000 [==============================] - 2s 25us/step - loss: 0.2538 - accuracy: 0.9267
# Epoch 2/5
# 60000/60000 [==============================] - 1s 23us/step - loss: 0.1030 - accuracy: 0.9690
# Epoch 3/5
# 60000/60000 [==============================] - 1s 23us/step - loss: 0.0677 - accuracy: 0.9795
# Epoch 4/5
# 60000/60000 [==============================] - 1s 24us/step - loss: 0.0497 - accuracy: 0.9845
# Epoch 5/5
# 60000/60000 [==============================] - 1s 23us/step - loss: 0.0361 - accuracy: 0.9894
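
The `epochs` and `batch_size` arguments determine how many weight updates happen during training. The arithmetic is sketched below; `steps_per_epoch` is a hypothetical helper name chosen for illustration.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


steps = steps_per_epoch(60000, 128)
last_batch = 60000 - (steps - 1) * 128  # the final batch is smaller
total_updates = steps * 5               # 5 epochs

print(steps, last_batch, total_updates)
# 469 96 2345
```

Depending on the installed Keras version, the progress bar may count batches (469/469) rather than samples (60000/60000) as in the log above.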

# Evaluate on the test set
test_loss, test_acc = network.evaluate(test_images, test_labels)

print('test_acc:', test_acc)
# The final test accuracy is 0.9764, slightly below the 0.9894 training accuracy, a sign of mild overfitting
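
The accuracy metric itself is simple: take the most probable class for each sample and compare it with the true class. Below is a minimal NumPy sketch of that calculation. The probabilities and labels are made-up toy values, and `accuracy` is a hypothetical helper rather than the Keras metric.

```python
import numpy as np


def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose most probable class matches the true class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))


# Hypothetical softmax outputs for 4 samples and 3 classes
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
toy_labels = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1],
                       [0, 1, 0]])

print(accuracy(toy_probs, toy_labels))  # 0.75
```

With the trained model, the same idea applies to `network.predict(test_images)`, which returns one row of 10 class probabilities per image.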

Closing remarks

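I originally wanted to do this analysis in R, but I could not get the environment working, so be it; things in this world seldom go as we wish. The code in this post is a first attempt on Linux with Python 3, so many details are left unexplained. Also, without a good editing platform like R Markdown, the results are shown as comments. Once I have Jupyter Notebook set up, this should read much better.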