Implementing the Lightweight ShuffleNet Network in Keras: A Tutorial
ShuffleNet is a highly computation-efficient CNN architecture from Megvii, designed specifically for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The architecture uses two new operations, group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection show that under a 40 MFLOPs computation budget ShuffleNet outperforms other architectures, e.g., achieving a top-1 error 7.8% (absolute) lower than the recent MobileNet on the ImageNet classification task. On an ARM-based mobile device, ShuffleNet achieves an actual ~13x speedup over AlexNet while maintaining comparable accuracy.
Paper: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Github:https://github.com/zjn-ai/ShuffleNet-keras
Network Architecture
Group Convolution
Group convolution actually dates back to AlexNet, where it was used to split the model across two GPUs because of limited GPU memory. In short, group convolution splits the input feature map evenly into several equally sized groups along the channel axis (in the figure below, the left side is the input feature map and the right side is the result of the split), applies an ordinary convolution to each group, and finally concatenates the output feature maps back together along the channel axis.
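As a quick sanity check of the parameter savings (a plain-Python sketch added here, not part of the original post): a standard k×k convolution has C_in·C_out·k·k weights, while splitting the channels into g groups needs only 1/g of that.

```python
# Parameter count of a (possibly grouped) 2D convolution:
# a standard conv has c_in * c_out * k * k weights, while a g-way group
# conv has g * (c_in/g) * (c_out/g) * k * k, i.e. a g-fold reduction.
def conv_params(c_in, c_out, k, groups=1):
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_params(256, 256, 3)            # ordinary 3x3 convolution
grouped = conv_params(256, 256, 3, groups=8)   # 8-way group convolution

print(standard, grouped, standard // grouped)  # → 589824 73728 8
```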
Many frameworks now support group convolution natively, but TensorFlow still does not (as of the version used here), so we have to write it ourselves; the result is inevitably less efficient than a framework-native implementation. The group convolution layer follows exactly the idea described above; the code is as follows.
def _group_conv(x, filters, kernel, stride, groups):
    """
    Group convolution
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height
        groups: Integer, number of groups to split the channels into
    # Returns
        Output tensor
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(x)[channel_axis]
    # both channel counts must be divisible by the number of groups
    assert in_channels % groups == 0 and filters % groups == 0
    # number of input channels per group
    nb_ig = in_channels // groups
    # number of output channels per group
    nb_og = filters // groups
    gc_list = []
    for i in range(groups):
        # bind i as a default argument so each Lambda keeps its own slice index
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
        gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
                              padding='same', use_bias=False)(x_group))
    return Concatenate(axis=channel_axis)(gc_list)
Channel Shuffle
Channel shuffle is the key contribution of this paper. Although group convolution greatly reduces computation and parameters, it also restricts information flow between channel groups, which inevitably hurts model accuracy. The authors therefore propose channel shuffle to strengthen cross-group information flow without adding any parameters or computation, as shown in the figure below.
The channel shuffle layer has a clever implementation (adapted from an existing implementation). The snippet below illustrates the idea: d is the sequence of channel indices, and x is the channel order after shuffling.
d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
x = np.reshape(d, (3, 3))
x = np.transpose(x, [1, 0])  # transpose
x = np.reshape(x, (9,))      # flatten
# [0 1 2 3 4 5 6 7 8] -> [0 3 6 1 4 7 2 5 8]
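The same reshape-transpose-reshape trick generalizes to a full NHWC feature map and arbitrary group counts. Here is a plain-NumPy sketch of that generalization (an illustration added here, not part of the original post):

```python
import numpy as np

def shuffle_channels_nhwc(x, groups):
    """NumPy equivalent of the channel shuffle: reshape the channel axis
    into (groups, channels_per_group), swap those two axes, then flatten."""
    n, h, w, c = x.shape
    assert c % groups == 0
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)
    return x.reshape(n, h, w, c)

# 9 channels split into 3 groups; channel i holds the constant value i
x = np.arange(9).reshape(1, 1, 1, 9)
shuffled = shuffle_channels_nhwc(x, 3)
print(shuffled.ravel())  # → [0 3 6 1 4 7 2 5 8]
```

The result matches the 1D demo above: channel 0 of each group is emitted first, then channel 1 of each group, and so on.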
Implementing it with the Keras backend:
def _channel_shuffle(x, groups):
    """
    Channel shuffle layer
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups to split the channels into
    # Returns
        Shuffled tensor
    """
    if K.image_data_format() == 'channels_last':
        height, width, in_channels = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, height, width, groups, channels_per_group]
        dim = (0, 1, 2, 4, 3)
        later_shape = [-1, height, width, in_channels]
    else:
        in_channels, height, width = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, groups, channels_per_group, height, width]
        dim = (0, 2, 1, 3, 4)
        later_shape = [-1, in_channels, height, width]
    # reshape -> transpose -> reshape performs the shuffle
    x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
    x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
    x = Lambda(lambda z: K.reshape(z, later_shape))(x)
    return x
ShuffleNet Unit
The main building block of ShuffleNet. In the figure below, (a) is the basic unit built on depthwise separable convolution, (b) is the unit used with stride 1, and (c) is the unit used with stride 2.
ShuffleNet Architecture
Note that for Stage 2, the authors do not apply group convolution to the first 1×1 convolution because the number of input channels is relatively small.
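To make the channel bookkeeping of the stride-2 unit concrete, here is a small arithmetic sketch (added for illustration): in a stride-2 unit, the convolution branch outputs filters - in_channels channels, so that concatenating the average-pooled shortcut restores the full output width.

```python
# Channel bookkeeping for a stride-2 ShuffleNet unit, mirroring the
# _shufflenet_unit code below: the conv branch emits filters - in_channels
# channels, and the avg-pooled shortcut contributes in_channels more.
def stride2_unit_channels(in_channels, filters, bottleneck_ratio=0.25):
    bottleneck = int(filters * bottleneck_ratio)  # first 1x1 (group) conv
    conv_branch = filters - in_channels           # last 1x1 group conv
    shortcut = in_channels                        # avg-pool branch
    return bottleneck, conv_branch, conv_branch + shortcut

# First unit of Stage 2 (g=8 configuration used later):
# 24 input channels after the max pool, 384 output channels
print(stride2_unit_channels(24, 384))  # → (96, 360, 384)
```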
Environment
Python 3.6
TensorFlow 1.13.1
Keras 2.2.4
Implementation
Supports both `channels_first` and `channels_last` data formats.
# -*- coding: utf-8 -*-
"""
Created on Thu Apr 25 18:26:41 2019
@author: zjn
"""
import numpy as np
from keras.callbacks import LearningRateScheduler
from keras.models import Model
from keras.layers import Input, Conv2D, Dropout, Dense, GlobalAveragePooling2D, Concatenate, AveragePooling2D
from keras.layers import Activation, BatchNormalization, add, Reshape, ReLU, DepthwiseConv2D, MaxPooling2D, Lambda
from keras.utils.vis_utils import plot_model
from keras import backend as K
from keras.optimizers import SGD
def _group_conv(x, filters, kernel, stride, groups):
    """
    Group convolution
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height
        groups: Integer, number of groups to split the channels into
    # Returns
        Output tensor
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(x)[channel_axis]
    # both channel counts must be divisible by the number of groups
    assert in_channels % groups == 0 and filters % groups == 0
    # number of input channels per group
    nb_ig = in_channels // groups
    # number of output channels per group
    nb_og = filters // groups
    gc_list = []
    for i in range(groups):
        # bind i as a default argument so each Lambda keeps its own slice index
        if channel_axis == -1:
            x_group = Lambda(lambda z, i=i: z[:, :, :, i * nb_ig: (i + 1) * nb_ig])(x)
        else:
            x_group = Lambda(lambda z, i=i: z[:, i * nb_ig: (i + 1) * nb_ig, :, :])(x)
        gc_list.append(Conv2D(filters=nb_og, kernel_size=kernel, strides=stride,
                              padding='same', use_bias=False)(x_group))
    return Concatenate(axis=channel_axis)(gc_list)
def _channel_shuffle(x, groups):
    """
    Channel shuffle layer
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        groups: Integer, number of groups to split the channels into
    # Returns
        Shuffled tensor
    """
    if K.image_data_format() == 'channels_last':
        height, width, in_channels = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, height, width, groups, channels_per_group]
        dim = (0, 1, 2, 4, 3)
        later_shape = [-1, height, width, in_channels]
    else:
        in_channels, height, width = K.int_shape(x)[1:]
        channels_per_group = in_channels // groups
        pre_shape = [-1, groups, channels_per_group, height, width]
        dim = (0, 2, 1, 3, 4)
        later_shape = [-1, in_channels, height, width]
    # reshape -> transpose -> reshape performs the shuffle
    x = Lambda(lambda z: K.reshape(z, pre_shape))(x)
    x = Lambda(lambda z: K.permute_dimensions(z, dim))(x)
    x = Lambda(lambda z: K.reshape(z, later_shape))(x)
    return x
def _shufflenet_unit(inputs, filters, kernel, stride, groups, stage, bottleneck_ratio=0.25):
    """
    ShuffleNet unit
    # Arguments
        inputs: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the depthwise convolution window
        stride: An integer or tuple/list of 2 integers, specifying the
            strides of the depthwise convolution along the width and height
        groups: Integer, number of groups to split the channels into
        stage: Integer, stage number of ShuffleNet
        bottleneck_ratio: Float, ratio of bottleneck channels to output channels
    # Returns
        Output tensor
    # Note
        For Stage 2, the authors do not apply group convolution on the first
        pointwise layer because the number of input channels is relatively small.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    in_channels = K.int_shape(inputs)[channel_axis]
    bottleneck_channels = int(filters * bottleneck_ratio)
    if stage == 2:
        # plain 1x1 pointwise convolution (no groups) in Stage 2
        x = Conv2D(filters=bottleneck_channels, kernel_size=(1, 1), strides=1,
                   padding='same', use_bias=False)(inputs)
    else:
        x = _group_conv(inputs, bottleneck_channels, (1, 1), 1, groups)
    x = BatchNormalization(axis=channel_axis)(x)
    x = ReLU()(x)
    x = _channel_shuffle(x, groups)
    x = DepthwiseConv2D(kernel_size=kernel, strides=stride, depth_multiplier=1,
                        padding='same', use_bias=False)(x)
    x = BatchNormalization(axis=channel_axis)(x)
    if stride == 2:
        # stride-2 unit: concatenate with the average-pooled shortcut, so the
        # conv branch only needs to produce filters - in_channels channels
        x = _group_conv(x, filters - in_channels, (1, 1), 1, groups)
        x = BatchNormalization(axis=channel_axis)(x)
        avg = AveragePooling2D(pool_size=(3, 3), strides=2, padding='same')(inputs)
        x = Concatenate(axis=channel_axis)([x, avg])
    else:
        # stride-1 unit: element-wise residual addition
        x = _group_conv(x, filters, (1, 1), 1, groups)
        x = BatchNormalization(axis=channel_axis)(x)
        x = add([x, inputs])
    return x
def _stage(x, filters, kernel, groups, repeat, stage):
    """
    Stage of ShuffleNet
    # Arguments
        x: Tensor, input tensor with `channels_last` or `channels_first` data format
        filters: Integer, number of output channels
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the depthwise convolution window
        groups: Integer, number of groups to split the channels into
        repeat: Integer, total number of shuffle units in the stage
        stage: Integer, stage number of ShuffleNet
    # Returns
        Output tensor
    """
    # the first unit of each stage downsamples with stride 2
    x = _shufflenet_unit(x, filters, kernel, 2, groups, stage)
    # the remaining units keep the resolution with stride 1
    for _ in range(1, repeat):
        x = _shufflenet_unit(x, filters, kernel, 1, groups, stage)
    return x
def ShuffleNet(input_shape, classes):
    """
    ShuffleNet architecture (g=8)
    # Arguments
        input_shape: Tuple/list of 3 integers, shape of the input tensor
        classes: Integer, number of classes to predict
    # Returns
        A Keras model
    """
    inputs = Input(shape=input_shape)
    x = Conv2D(24, (3, 3), strides=2, padding='same', use_bias=True, activation='relu')(inputs)
    x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)
    x = _stage(x, filters=384, kernel=(3, 3), groups=8, repeat=4, stage=2)
    x = _stage(x, filters=768, kernel=(3, 3), groups=8, repeat=8, stage=3)
    x = _stage(x, filters=1536, kernel=(3, 3), groups=8, repeat=4, stage=4)
    x = GlobalAveragePooling2D()(x)
    x = Dense(classes)(x)
    predicts = Activation('softmax')(x)
    model = Model(inputs, predicts)
    return model
if __name__ == '__main__':
    model = ShuffleNet((224, 224, 3), 1000)
    # plot_model(model, to_file='ShuffleNet.png', show_shapes=True)
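As a plain-arithmetic check of the architecture above (added for illustration): every stride-2 layer halves the spatial resolution ('same' padding rounds up), so a 224×224 input is reduced to 7×7 before global average pooling.

```python
# Trace the feature-map size through ShuffleNet (g=8) for a 224x224 input.
def halve(s):
    # 'same' padding with stride 2 -> ceil(s / 2)
    return (s + 1) // 2

size, channels = 224, 3
size, channels = halve(size), 24      # 3x3 conv, stride 2     -> 112
size = halve(size)                    # 3x3 max pool, stride 2 -> 56
size, channels = halve(size), 384     # Stage 2                -> 28
size, channels = halve(size), 768     # Stage 3                -> 14
size, channels = halve(size), 1536    # Stage 4                -> 7
print(size, channels)  # → 7 1536  (then global average pooling -> 1536-dim)
```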
That concludes this tutorial on implementing the lightweight ShuffleNet network in Keras. I hope it serves as a useful reference.