Your Guide to DL with MLSQL Stack (3)
This is the third article in the Your Guide with MLSQL Stack series. We hope this series shows you how the MLSQL stack helps people do AI jobs.
As we have seen in the previous posts, the MLSQL stack gives you the power to use both the built-in algorithms and Python ML frameworks. The ability to use Python ML frameworks means you are totally free to use deep learning tools like PyTorch and TensorFlow. This time, however, we will show you how to use the built-in DL framework, BigDL, to accomplish an image classification task.
Requirements
This guide requires MLSQL Stack 1.3.0-SNAPSHOT. You can set up the MLSQL stack with the following links. We recommend deploying the MLSQL stack locally.
If you meet any problems when deploying, please let me know, and feel free to file an issue at this link.
Project Structure
I have created a project named store1, with a directory called image_classify that contains all the MLSQL scripts we discuss today. It looks like this:
[image: project directory structure]
We will teach you how to build the project step by step.
Upload Image
First, download the cifar10 raw images from this URL: https://github.com/allwefantasy/spark-deep-learning-toy/releases/download/v0.01/cifar.tgz, then gunzip it so that you end up with a tar file.
Though the MLSQL Console supports directory uploading, the huge number of files in this directory will crash the uploading component in the web page (we hope to fix this issue in the future). For now, the workaround is to package the directory as a single tar file before uploading.
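The repackaging step can be sketched from the command line. This is a sketch that builds a tiny stand-in archive in place of the real cifar.tgz download; replace the first part with the actual file fetched from the GitHub release:

```shell
set -e
# Stand-in for the real download: build a tiny cifar.tgz
# (replace this part with the actual downloaded file).
mkdir -p cifar/train
touch cifar/train/38189_frog.png
tar czf cifar.tgz cifar
rm -r cifar
# The uploader wants a plain tar, so strip only the gzip layer:
gunzip cifar.tgz        # produces cifar.tar in place
tar tf cifar.tar        # inspect the archive without extracting it
```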
[image: uploading the tar file in MLSQL Console]
Then save the uploaded tar file to your home directory:
-- download cifar data from https://github.com/allwefantasy/spark-deep-learning-toy/releases/download/v0.01/cifar.tgz
!fs -mkdir -p /tmp/cifar;
!saveUploadFileToHome /cifar.tar /tmp/cifar;
The console will show a real-time log indicating that the system is extracting the images.
[image: extraction log]
This may take a while because there are almost 60,000 pictures.
Set up some paths
We create an env.mlsql script which contains the path-related variables:
set basePath="/tmp/cifar";
set labelMappingPath = "${basePath}/si";
set trainDataPath = "${basePath}/cifar_train_data";
set testDataPath = "${basePath}/cifar_test_data";
set modelPath = "${basePath}/bigdl";
The other scripts will include this script to get all these paths.
Resize the pictures
We want to resize the images to 28x28. You can achieve this with the ET ImageLoaderExt; here is how we use it:
include store1.`alg.image_classify.env.mlsql`;
-- {} or {number} is used as parameter holder.
set imageResize='''
run command as ImageLoaderExt.`/tmp/cifar/cifar/{}` where
code="
def apply(params:Map[String,String]) = {
Resize(28, 28) ->
MatToTensor() -> ImageFrameToSample()
}
"
as {}
''';
-- train should be quoted because it's a keyword.
!imageResize "train" data;
!imageResize test testData;
In the above code, because we need to resize both the train and test datasets, we wrap the resize code as a command to avoid duplication, then use this command to process the two datasets separately.
Extract label
For example, when we see the following path, we know that this picture contains a frog, so we should extract frog from the path.
/tmp/cifar/cifar/train/38189_frog.png
Again, we wrap the SQL as a command and process the train and test data separately.
set extractLabel='''
-- convert image path to number label
select split(split(imageName,"_")[1],"\.")[0] as labelStr,features from {} as {}
''';
!extractLabel data newdata;
!extractLabel testData newTestData;
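The nested split in the SQL above mirrors this logic. A Python sketch of the same extraction (the file name is just the example from the path shown earlier):

```python
def extract_label(image_name: str) -> str:
    """Pull the class name out of a file name like '38189_frog.png'."""
    # split on "_" -> ["38189", "frog.png"]; then drop the extension
    return image_name.split("_")[1].split(".")[0]

print(extract_label("38189_frog.png"))  # frog
```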
We will convert the label to a number and then add 1 (because BigDL requires labels to start from 1 instead of 0).
set numericLabel='''
train {0} as StringIndex.`/tmp/cifar/si` where inputCol="labelStr" and outputCol="labelIndex" as newdata1;
predict {0} as StringIndex.`/tmp/cifar/si` as newdata2;
select (cast(labelIndex as int) + 1) as label,features from newdata2 as {1}
''';
!numericLabel newdata trainData;
!numericLabel newTestData testData;
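What the StringIndex step does is assign each distinct string a 0-based index, after which we shift everything up by one for BigDL. A plain-Python sketch of that shift (the alphabetical ordering here is only illustrative; the real indexer chooses its own ordering, typically by frequency):

```python
# Build a 0-based string -> index mapping, then shift to 1-based for BigDL.
labels = ["frog", "cat", "frog", "ship"]
# Illustrative ordering; the real StringIndex may order labels differently.
mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
one_based = [mapping[name] + 1 for name in labels]
print(mapping)     # {'cat': 0, 'frog': 1, 'ship': 2}
print(one_based)   # [2, 1, 2, 3]
```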
Save what we have so far
We save all this data so that we can reuse the processed data later without executing the previous steps repeatedly:
save overwrite trainData as parquet.`${trainDataPath}`;
save overwrite testData as parquet.`${testDataPath}`;
Train the images with DL
We create a new script file named classify_train.mlsql. We should load the data first and convert the label to an array:
include store1.`alg.image_classify.env.mlsql`;
load parquet.`${trainDataPath}` as tmpTrainData;
load parquet.`${testDataPath}` as tmpTestData;
select array(cast(label as float)) as label,features from tmpTrainData as trainData;
select array(cast(label as float)) as label,features from tmpTestData as testData;
Finally, we use our algorithm to train them:
train trainData as BigDLClassifyExt.`${modelPath}` where
disableSparkLog = "true"
and fitParam.0.featureSize="[3,28,28]"
and fitParam.0.classNum="10"
and fitParam.0.maxEpoch="300"
-- print evaluate message
and fitParam.0.evaluate.trigger.everyEpoch="true"
and fitParam.0.evaluate.batchSize="1000"
and fitParam.0.evaluate.table="testData"
and fitParam.0.evaluate.methods="Loss,Top1Accuracy"
-- for unbalanced class
-- and fitParam.0.criterion.classWeight="[......]"
and fitParam.0.code='''
def apply(params:Map[String,String])={
val model = Sequential()
model.add(Reshape(Array(3, 28, 28), inputShape = Shape(28, 28, 3)))
model.add(Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5"))
model.add(MaxPooling2D())
model.add(Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5"))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation = "tanh").setName("fc1"))
model.add(Dense(params("classNum").toInt, activation = "softmax").setName("fc2"))
}
'''
;
In the code block, we use Keras-style code to build our model, and we tell the system some extra information, e.g. how many classes there are and what the feature size is.
If the training stage takes too long, you can decrease fitParam.0.maxEpoch to a smaller value.
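It can help to check that featureSize="[3,28,28]" actually fits the layer stack above. A quick Python sketch that traces the spatial size through the two "valid" 5x5 convolutions and 2x2 poolings, ending at the flattened vector the first Dense layer sees:

```python
def conv_out(size: int, kernel: int) -> int:
    """Output side length of a 'valid' (no padding) convolution."""
    return size - kernel + 1

def pool_out(size: int, window: int = 2) -> int:
    """Output side length of a non-overlapping max pooling."""
    return size // window

side = 28                            # input is 3 x 28 x 28
side = pool_out(conv_out(side, 5))   # conv1_5x5 -> 24, pool -> 12
side = pool_out(conv_out(side, 5))   # conv2_5x5 -> 8,  pool -> 4
flattened = 12 * side * side         # 12 feature maps after conv2
print(side, flattened)               # 4 192
```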
The console will print messages while training:
[image: training log]
and finally the validation result:
[image: validation result]
Use the model command to check the model's training history:
!model history /tmp/cifar/bigdl;
Here is the result:
[image: model training history]
Register the model as a function
Since we have built our model, let us now learn how to predict images. First, we load some data:
include store1.`alg.image_classify.env.mlsql`;
load parquet.`${trainDataPath}` as tmpTrainData;
load parquet.`${testDataPath}` as tmpTestData;
select array(cast(label as float)) as label,features from tmpTrainData as trainData;
select array(cast(label as float)) as label,features from tmpTestData as testData;
Now, we can register the model as a function:
register BigDLClassifyExt.`${modelPath}` as cifarPredict;
Finally, we can use the function to predict a new picture:
select
vec_argmax(cifarPredict(vec_dense(to_array_double(features)))) as predict_label,
label from testData limit 10
as output;
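The registered cifarPredict function returns a probability vector with one entry per class, and vec_argmax just picks the index with the highest probability. A Python sketch of that last step (the probability values are made up for illustration):

```python
# Made-up probability vector for one image, one entry per class.
probs = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
# argmax: index of the largest probability
predict_label = max(range(len(probs)), key=lambda i: probs[i])
print(predict_label)  # 3
```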
Of course, you can also predict a whole table:
predict testData as BigDLClassifyExt.`${modelPath}` as predictdata;
Why BigDL
GPUs are very expensive, and normally our companies already have lots of CPUs; if we can make full use of these CPUs, we will save a lot of money.