Building a Logistic Regression Model with a Neural Network Mindset (Deep Learning Course, Week 2 Solutions)


Logistic Regression Review

Gradients via the LR Computation Graph
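
The derivation figure from the original post is missing. As a brief recap (a standard chain-rule computation, not taken from the lost figure), for one example with $z = w^T x + b$ and $a = \sigma(z)$, differentiating the loss along the computation graph gives:

$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a), \qquad \text{so} \quad \frac{\partial \mathcal{L}}{\partial z} = a - y$$

$$\frac{\partial \mathcal{L}}{\partial w_j} = x_j\,(a - y), \qquad \frac{\partial \mathcal{L}}{\partial b} = a - y$$

These per-example derivatives are what formulas (7) and (8) below average over the training set.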

Algorithm Structure

Design a simple algorithm to classify whether an image is a cat.

We build a logistic regression model with a neural network mindset; the figure below explains why logistic regression is in fact a very simple neural network.

[Figure: logistic regression viewed as a single-neuron neural network]

Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}$$
The cost is then computed by summing over all training examples:
$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}$$
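
To make equations (1)-(3) concrete, here is a minimal numeric sketch; the values of w, b, x, and y are invented purely for illustration:

```python
import numpy as np

w = np.array([[0.5], [-0.3]])   # toy weights, shape (2, 1)
b = 0.1                         # toy bias
x = np.array([[1.0], [2.0]])    # one toy example, shape (2, 1)
y = 1                           # true label

z = np.dot(w.T, x) + b                            # equation (1)
a = 1 / (1 + np.exp(-z))                          # equation (2): sigmoid
loss = -y * np.log(a) - (1 - y) * np.log(1 - a)   # equation (3)
print(z, a, loss)   # z = 0.0, a = 0.5, loss ≈ 0.693
```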

Building the Parts of the Algorithm

The main steps for building a neural network are:

  1. Define the model structure (for example, the number of input features)
  2. Initialize the model's parameters
  3. Loop:

     - Compute the current loss (forward propagation)
     - Compute the current gradients (backward propagation)
     - Update the parameters (gradient descent)

You will often build steps 1-3 separately and then integrate them into a single function we call model().

01 Utility Functions

import numpy as np

# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Returns:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
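
A quick check on a scalar and an array:

```python
import numpy as np

print(sigmoid(0))                    # 0.5
print(sigmoid(np.array([0, 2])))     # [0.5        0.88079708]
```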

02 Initializing Parameters

# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
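
For example, with two input features:

```python
w, b = initialize_with_zeros(2)
print(w.shape, b)   # (2, 1) 0
```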

03 Forward and Backward Propagation

Now that the parameters are initialized, you can perform the forward and backward propagation steps to learn the parameters.

Exercise: Implement the function propagate(), which computes the cost function and its gradient. Hints:

Forward Propagation:

  • You get X
  • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m)})$
  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$

Here are the two formulas you will be using:

$$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T \tag{7}$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)}) \tag{8}$$
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Returns:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)                                # compute activation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m    # compute cost

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
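
As a quick sanity check (the toy values below are invented for illustration), propagate can be compared against a finite-difference estimate of $\frac{\partial J}{\partial b}$:

```python
import numpy as np

# Toy inputs, invented for illustration
w = np.array([[1.], [2.]])
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])

grads, cost = propagate(w, b, X, Y)
print(grads["dw"], grads["db"], cost)

# Finite-difference check on db: nudge b and compare the cost change
eps = 1e-7
_, cost_plus = propagate(w, b + eps, X, Y)
print((cost_plus - cost) / eps)   # should be close to grads["db"]
```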

04 Optimization

  • The parameters have been initialized.
  • You can compute the cost function and its gradient.
  • Now you need to update the parameters using gradient descent.

The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.

# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using the gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Update rule (≈ 2 lines of code)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
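
Continuing with the toy w, b, X, Y from the propagate check above (the iteration count and learning rate here are arbitrary small values, not prescribed by the assignment):

```python
params, grads, costs = optimize(w, b, X, Y, num_iterations=100,
                                learning_rate=0.009, print_cost=False)
print(params["w"], params["b"])
print(costs)   # one recorded cost per 100 iterations
```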

05 Prediction

The previous function outputs the learned w and b. We can use them to predict the labels for a dataset X by implementing the predict() function. Computing predictions takes two steps:

1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this; see the sketch after the function below).

# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_prediction[0, i]
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
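
As noted in step 2 above, the loop can also be vectorized. A minimal sketch (the name predict_vectorized is mine, not part of the assignment):

```python
import numpy as np

def predict_vectorized(w, b, X):
    # Hypothetical helper: same result as predict(), but thresholds
    # all activations at once instead of looping over examples
    A = sigmoid(np.dot(w.reshape(X.shape[0], 1).T, X) + b)
    return (A > 0.5).astype(float)
```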

06 Merging the Parts into a Model

Now you will put the whole model together, combining all the building blocks (the functions implemented in the previous sections) in the right order.

# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code); pass print_cost through
    # instead of hardcoding False, so the flag actually takes effect
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
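
A typical call might look like the following. The dataset variables (train_set_x, train_set_y, test_set_x, test_set_y) are assumptions: they are expected to have been loaded and flattened elsewhere to shape (num_px * num_px * 3, m) with labels of shape (1, m), and the hyperparameter values are just common choices for this assignment:

```python
# Hypothetical usage; dataset variables are assumed to exist already
d = model(train_set_x, train_set_y, test_set_x, test_set_y,
          num_iterations=2000, learning_rate=0.005, print_cost=True)

# Plot the learning curve from the recorded costs
import matplotlib.pyplot as plt
plt.plot(d["costs"])
plt.xlabel("iterations (per hundreds)")
plt.ylabel("cost")
plt.show()
```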