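Building an LR Model with a Neural Network Mindset (Answers for Week 2 of the DL Open Course)
LR review
LR computation graph and derivatives
Algorithm structure
Design a simple algorithm that decides whether an image shows a cat.
We build a logistic regression (LR) model with a neural network mindset. The figure below explains why LR is in fact a very simple neural network.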
[Figure: logistic regression drawn as a simple neural network (image failed to upload)]
Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}$$
The cost is then computed by averaging the loss over all training examples:
$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}$$
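As a small worked example of equations (3) and (6), the sketch below evaluates the per-example loss and the cost for three made-up activations and labels. The numbers are illustrative assumptions and do not come from the cat dataset.

```python
import numpy as np

# Hypothetical activations a^(i) and labels y^(i) for m = 3 examples.
a = np.array([0.9, 0.2, 0.6])
y = np.array([1.0, 0.0, 1.0])

# Per-example loss, equation (3).
losses = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Cost J, equation (6): the mean of the per-example losses.
J = losses.mean()

print(losses)
print(J)
```

The confident, correct prediction (a = 0.9 for y = 1) contributes the smallest loss, while the hesitant one (a = 0.6 for y = 1) contributes the largest.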
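Building the parts of the algorithm
The main steps for building a neural network are:
- Define the model structure (for example, the number of input features)
- Initialize the model's parameters
- Loop:
  - Compute the current loss (forward propagation)
  - Compute the current gradient (backward propagation)
  - Update the parameters (gradient descent)
You usually build these three steps separately and then integrate them into a single function, which we call model().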
01
Helper functions
# GRADED FUNCTION: sigmoid
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid of z
    Arguments:
    z -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
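A quick usage check, assuming the same definition of sigmoid as above (repeated here so the snippet runs on its own). It shows that the function accepts both a scalar and a NumPy array.

```python
import numpy as np

def sigmoid(z):
    # Same formula as the graded function above.
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                 # 0.5
print(sigmoid(np.array([0, 2])))  # approximately [0.5, 0.88079708]
```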
02
Initializing parameters
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    assert w.shape == (dim, 1)
    assert isinstance(b, (float, int))
    return w, b
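A minimal usage sketch, with the function body repeated (without the asserts) so it runs standalone. Zero initialization is acceptable here because the logistic regression cost is convex, so gradient descent does not depend on a random starting point to make progress.

```python
import numpy as np

def initialize_with_zeros(dim):
    # Same initialization as the graded function above.
    w = np.zeros((dim, 1))
    b = 0
    return w, b

w, b = initialize_with_zeros(2)
print(w.shape)  # (2, 1)
print(b)        # 0
```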
03
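Forward and backward propagation
Now that the parameters are initialized, you can run the forward and backward propagation steps to learn them.
Exercise: Implement a function propagate() that computes the cost function and its gradient.
Hints: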
Forward Propagation:
- You get X
- You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$
Here are the two formulas you will be using:
$$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T \tag{7}$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)}) \tag{8}$$
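A brief sketch of where these two gradient formulas come from, using only equations (1) to (3) and the standard identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$. First differentiate the loss with respect to the activation, and the activation with respect to $z^{(i)}$:

$$\frac{\partial \mathcal{L}}{\partial a^{(i)}} = -\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}, \qquad \frac{\partial a^{(i)}}{\partial z^{(i)}} = a^{(i)}(1-a^{(i)})$$

Multiplying the two (chain rule) makes the denominators cancel:

$$\frac{\partial \mathcal{L}}{\partial z^{(i)}} = -y^{(i)}(1-a^{(i)}) + (1-y^{(i)})a^{(i)} = a^{(i)} - y^{(i)}$$

Since $z^{(i)} = w^T x^{(i)} + b$, we have $\partial z^{(i)} / \partial w = x^{(i)}$ and $\partial z^{(i)} / \partial b = 1$. Averaging over the $m$ examples gives

$$\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^m x^{(i)}(a^{(i)} - y^{(i)}) = \frac{1}{m}X(A-Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^m (a^{(i)} - y^{(i)})$$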
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)  # compute activation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m  # compute cost
    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    assert dw.shape == w.shape
    assert db.dtype == float
    cost = np.squeeze(cost)
    assert cost.shape == ()
    grads = {"dw": dw, "db": db}
    return grads, cost
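One way to gain confidence in the analytic gradients is a finite-difference check. The sketch below is not part of the original assignment: the helper name cost_and_grads and the toy inputs are assumptions made for illustration, and the helper repeats the same computation as propagate() so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Same computation as propagate(), returned as a flat tuple (hypothetical helper)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Hypothetical toy inputs: 2 features, 3 examples.
w = np.array([[0.5], [-0.3]])
b = 0.1
X = np.array([[1.0, 2.0, -1.0], [3.0, 0.5, -2.0]])
Y = np.array([[1, 0, 1]])

cost, dw, db = cost_and_grads(w, b, X, Y)

# Central differences: perturb one parameter at a time by +/- eps.
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    cost_plus, _, _ = cost_and_grads(w + step, b, X, Y)
    cost_minus, _, _ = cost_and_grads(w - step, b, X, Y)
    dw_num[j, 0] = (cost_plus - cost_minus) / (2 * eps)

cost_plus, _, _ = cost_and_grads(w, b + eps, X, Y)
cost_minus, _, _ = cost_and_grads(w, b - eps, X, Y)
db_num = (cost_plus - cost_minus) / (2 * eps)

# Both differences should be tiny if the analytic gradients are right.
print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

If either printed difference is not close to zero, the usual suspects are a missing transpose in dw or a forgotten division by m.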
04
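Optimization
- The parameters have been initialized.
- You can also compute the cost function and its gradient.
- Now you need to update the parameters using gradient descent.
The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.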
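To see the update rule in isolation, here is a minimal sketch on a one-dimensional toy cost $J(\theta) = \theta^2$, whose gradient is $2\theta$. The cost function, starting point, and learning rate are illustrative assumptions, unrelated to the cat classifier.

```python
# Minimize the hypothetical cost J(theta) = theta**2, whose gradient is 2 * theta.
theta = 4.0
alpha = 0.1  # learning rate (illustrative value)

for step in range(50):
    dtheta = 2 * theta              # gradient of J at the current theta
    theta = theta - alpha * dtheta  # update rule: theta = theta - alpha * dtheta

print(theta)  # close to 0, the minimizer of J
```

Each step multiplies theta by (1 - 2 * alpha) = 0.8, so it shrinks geometrically toward the minimum. With alpha above 1.0 the same factor would exceed 1 in magnitude and the iterates would diverge, which is why the learning rate has to be chosen with care.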
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    Tips:
    You basically need to write down two steps and iterate through them:
    1) Calculate the cost and the gradient for the current parameters. Use propagate().
    2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []
    for i in range(num_iterations):
        # Cost and gradient calculation (≈ 1-4 lines of code)
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Update rule (≈ 2 lines of code)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
05
Prediction
The previous function outputs the learned w and b. We can use them to predict the labels for a dataset X by implementing the predict() function. Computing predictions takes two steps:
1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$
2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).
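The vectorized alternative mentioned in step 2 can be sketched as a boolean comparison followed by a cast. The activations below are made-up values chosen to include the boundary case 0.5.

```python
import numpy as np

# Hypothetical activations for m = 4 examples, shape (1, m).
A = np.array([[0.1, 0.5, 0.7, 0.93]])

# The mask A > 0.5 is True where the activation exceeds 0.5;
# casting to float turns True/False into 1.0/0.0.
Y_prediction = (A > 0.5).astype(float)

print(Y_prediction)  # [[0. 0. 1. 1.]]
```

Note that an activation of exactly 0.5 maps to 0, matching the rule stated in step 2.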
# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_prediction[0, i]
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
    assert Y_prediction.shape == (1, m)
    return Y_prediction
06
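Merging all the parts into a model
Now we assemble the whole model by putting all the building blocks (the functions implemented in the previous sections) together in the right order.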
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the functions you've implemented previously
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    Returns:
    d -- dictionary containing information about the model.
    """
    # Initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent (≈ 1 line of code); pass print_cost through instead of hard-coding False
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost=print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
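To try the whole pipeline without the cat images, the sketch below trains on synthetic two-feature data. Everything here is an assumption made for illustration: the data-generating rule (label 1 when the two features sum to a positive number), the random seed, and the compact helper train_logistic_regression, which folds initialization, propagation, and the update into one loop rather than calling the functions above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic_regression(X, Y, num_iterations=2000, learning_rate=0.5):
    """Compact initialize -> propagate -> update loop (hypothetical helper)."""
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    m = X.shape[1]
    for _ in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)   # forward propagation
        dw = np.dot(X, (A - Y).T) / m     # gradient with respect to w
        db = np.sum(A - Y) / m            # gradient with respect to b
        w = w - learning_rate * dw        # gradient descent update
        b = b - learning_rate * db
    return w, b

def predict(w, b, X):
    # Vectorized thresholding at 0.5.
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)

# Synthetic data (not the cat dataset): label is 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2, 200))
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(float)
X_test = rng.normal(size=(2, 50))
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(float)

w, b = train_logistic_regression(X_train, Y_train)
train_acc = 100 - np.mean(np.abs(predict(w, b, X_train) - Y_train)) * 100
test_acc = 100 - np.mean(np.abs(predict(w, b, X_test) - Y_test)) * 100
print("train accuracy: {} %".format(train_acc))
print("test accuracy: {} %".format(test_acc))
```

Because the synthetic classes are linearly separable, both accuracies should be high. On real image data such as the cat dataset, a gap between training and test accuracy is a more typical outcome.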