Deep Learning Notes: an RNN training pitfall: in PyTorch you must set hidden = hidden.data before each batch, otherwise backpropagation will traverse all previous timesteps

Date: 2019-02-21

In PyTorch, before training on each batch you need to set hidden = hidden.data; otherwise backpropagation will traverse all previous timesteps.
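To see why, here is a minimal sketch (the module, shapes, and hyperparameters are illustrative assumptions, not from the original post). If the detach line is removed, the second call to loss.backward() tries to walk back through the graph of the previous batch, whose intermediate buffers have already been freed, and PyTorch raises a RuntimeError:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.01)

hidden = None
for step in range(3):
    x = torch.randn(1, 10, 1)    # (batch, seq_len, input_size)
    y = torch.randn(1, 10, 8)    # dummy targets matching the RNN output shape
    out, hidden = rnn(x, hidden)
    hidden = hidden.detach()     # cut the graph here; without this line,
                                 # the next backward() would reach into the
                                 # freed graph of the previous batch
    loss = criterion(out, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()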

TensorFlow also carries new_state forward across batches, but there is no explicit detach operation. This is presumably because sess.run returns plain numpy arrays: feeding new_state back in as a value, rather than as part of the graph, truncates backpropagation at the batch boundary by default:

    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            # new_state is a numpy array at this point, so feeding it back
            # in implicitly detaches it from the previous batch's graph
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss,
                                                 model.final_state,
                                                 model.optimizer],
                                                 feed_dict=feed)

PyTorch, by contrast, uses autograd, so before each batch you have to explicitly pull the value out of the state yourself. Assigning hidden = hidden.data (equivalent to hidden.detach(), which is the recommended API) cuts the hidden state off from its history, and the backward pass stops there.
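A tiny illustration (the tensor here is a stand-in, not from the original code) that both .data and .detach() yield a tensor that is no longer attached to the autograd graph:

import torch

h = torch.randn(1, 1, 8, requires_grad=True) * 2  # result of an op, so it has a grad_fn
print(h.grad_fn is not None)   # True: still part of the graph
print(h.data.grad_fn)          # None: detached view of the same storage
print(h.detach().grad_fn)      # None: detached view (the preferred API)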

import numpy as np
import torch
import matplotlib.pyplot as plt

# train the RNN
# (assumes criterion, optimizer, and seq_length are defined in the enclosing scope)
def train(rnn, n_steps, print_every):
    
    # initialize the hidden state
    hidden = None      
    
    for batch_i, step in enumerate(range(n_steps)):
        # defining the training data 
        time_steps = np.linspace(step * np.pi, (step+1)*np.pi, seq_length + 1)
        data = np.sin(time_steps)
        data.resize((seq_length + 1, 1)) # input_size=1

        x = data[:-1]
        y = data[1:]
        
        # convert data into Tensors
        x_tensor = torch.Tensor(x).unsqueeze(0) # unsqueeze adds a batch dimension of size 1
        y_tensor = torch.Tensor(y)

        # outputs from the rnn
        prediction, hidden = rnn(x_tensor, hidden)

        ## Representing Memory ##
        # make a new variable for hidden and detach the hidden state from its history
        # this way, we don't backpropagate through the entire history
        # (hidden.detach() would do the same thing and is the preferred API)
        hidden = hidden.data

        # calculate the loss
        loss = criterion(prediction, y_tensor)
        # zero gradients
        optimizer.zero_grad()
        # perform backprop and update weights
        loss.backward()
        optimizer.step()

        # display loss and predictions
        if batch_i%print_every == 0:        
            print('Loss: ', loss.item())
            plt.plot(time_steps[1:], x, 'r.') # input
            plt.plot(time_steps[1:], prediction.data.numpy().flatten(), 'b.') # predictions
            plt.show()
    
    return rnn
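The function above relies on criterion, optimizer, and seq_length existing in the enclosing scope. One compatible setup looks like the following sketch (the module definition and hyperparameters are assumptions for illustration; the output is reshaped so prediction has the same (seq_length, 1) shape as y_tensor):

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x: (batch, seq_len, input_size) -> r_out: (batch, seq_len, hidden_dim)
        r_out, hidden = self.rnn(x, hidden)
        # flatten to (batch*seq_len, hidden_dim) so the output is (seq_len, output_size)
        out = self.fc(r_out.view(-1, self.hidden_dim))
        return out, hidden

seq_length = 20
rnn = RNN(input_size=1, output_size=1, hidden_dim=32, n_layers=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)

trained_rnn = train(rnn, n_steps=75, print_every=15)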