Gradients | DY's Personal Blog

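During backpropagation, a neural network takes the partial derivative of the loss with respect to each trainable parameter. The resulting value is called the gradient. It is multiplied by the learning rate and then used to update the parameter.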
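As a rough illustration of that update rule, here is a minimal sketch in plain Python. The loss function, learning rate, and starting value are all assumptions chosen for the example; none of them come from TensorFlow.

```python
# Hypothetical one-parameter loss: loss(w) = (w - 3)^2,
# so its derivative with respect to w is 2 * (w - 3).
def loss_gradient(w):
    return 2.0 * (w - 3.0)

learning_rate = 0.1  # assumed value, for illustration only
w = 0.0              # assumed initial parameter value

for step in range(100):
    grad = loss_gradient(w)        # gradient of the loss at the current w
    w = w - learning_rate * grad   # update: parameter minus learning rate times gradient

print(w)  # ends up very close to 3.0, the minimizer of the loss
```

Each step moves w against the gradient, scaled by the learning rate, which is the same role the gradient plays when TensorFlow updates trainable variables.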

Partial derivative with respect to a single variable

This is done with the tf.gradients function.
tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None)

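The first argument, ys, is the result of the expression to differentiate. The second argument, xs, specifies which variables in that expression to differentiate with respect to. The function returns the derivative of the first argument with respect to the second.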

import tensorflow as tf
w1=tf.Variable([[1,2]],dtype=tf.float32)
w2=tf.Variable([[3,4]],dtype=tf.float32)
y=tf.matmul(w1,tf.convert_to_tensor([[9],[10]],dtype=tf.float32))
grads=tf.gradients(y,[w1])
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print("Gradient:",sess.run(grads))

 #Gradient: [array([[ 9., 10.]], dtype=float32)]

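In the example above, y is the product of w1 and [[9],[10]], so the derivative of y with respect to w1 consists of the entries of [[9],[10]], returned in the shape of w1 as [[9., 10.]]. This is the slope of y along each component of w1.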
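The result can be checked without TensorFlow using central finite differences. The following is only a sketch: the helper name numerical_gradient and the step size eps are assumptions made for this illustration.

```python
def y(w):
    # Same computation as tf.matmul(w1, [[9], [10]]) for a 1x2 row vector w
    return w[0] * 9.0 + w[1] * 10.0

def numerical_gradient(f, w, eps=1e-3):
    # Central difference: (f(w + eps) - f(w - eps)) / (2 * eps), one component at a time
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grads.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grads

grad_w1 = numerical_gradient(y, [1.0, 2.0])
print(grad_w1)  # approximately [9.0, 10.0]
```

Because y is linear in w1, the gradient does not depend on the values stored in w1, only on the constant matrix it is multiplied by.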

Partial derivatives with respect to multiple variables

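This is where the third parameter of tf.gradients, grad_ys, comes in. grad_ys is also a list, and its length equals len(ys). Each entry has the same shape as the corresponding element of ys and acts as a weight on the gradient of that element. For each variable in xs, tf.gradients returns the sum of these weighted gradients over all elements of ys.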

import tensorflow as tf

tf.reset_default_graph()
# Create a variable of shape [2]; with no initializer given, it gets random initial values
w1 = tf.get_variable('w1', shape=[2])
w2 = tf.get_variable('w2', shape=[2])

w3 = tf.get_variable('w3', shape=[2])
w4 = tf.get_variable('w4', shape=[2])

y1 = w1 + w2+ w3
y2 = w3 + w4
# Without the grad_ys parameter
gradients= tf.gradients([y1, y2], [w1, w2, w3, w4])
# With the grad_ys parameter
gradients1 = tf.gradients([y1, y2], [w1, w2, w3, w4], grad_ys=[tf.convert_to_tensor([1.,2.]),
                                                          tf.convert_to_tensor([3.,4.])])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w1))
    print(sess.run(gradients1))
    print(sess.run(gradients))
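To see what the two gradient lists should contain, the weighting can be reproduced by hand. The sketch below is plain Python, not TensorFlow: the dictionary depends_on and the helper summed_gradients are hypothetical names introduced for this illustration, and the sketch relies on every operation being a plain addition, so each local derivative is 1.

```python
# Which variables each output depends on (every local derivative is 1, since y1 and y2 are sums)
depends_on = {"y1": ["w1", "w2", "w3"], "y2": ["w3", "w4"]}

def summed_gradients(grad_ys):
    # For each variable, add up the grad_ys weights of every output that uses it
    grads = {name: [0.0, 0.0] for name in ["w1", "w2", "w3", "w4"]}
    for y_name, weights in grad_ys.items():
        for var in depends_on[y_name]:
            grads[var] = [g + w for g, w in zip(grads[var], weights)]
    return grads

# Leaving grad_ys out is equivalent to weighting every output with ones
default = summed_gradients({"y1": [1.0, 1.0], "y2": [1.0, 1.0]})
weighted = summed_gradients({"y1": [1.0, 2.0], "y2": [3.0, 4.0]})

print(default)   # w1: [1, 1], w2: [1, 1], w3: [2, 2], w4: [1, 1]
print(weighted)  # w1: [1, 2], w2: [1, 2], w3: [4, 6], w4: [3, 4]
```

w3 appears in both y1 and y2, so its gradient is the sum of both weights: [1, 2] + [3, 4] = [4, 6].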

Stopping gradients

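For special cases where the gradient computation must be cut off during backpropagation, TensorFlow provides the tf.stop_gradient function. No gradient flows back through a node wrapped by it.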

import tensorflow as tf
tf.reset_default_graph()
w1 = tf.get_variable('w1', shape=[2])
w2 = tf.get_variable('w2', shape=[2])

w3 = tf.get_variable('w3', shape=[2])
w4 = tf.get_variable('w4', shape=[2])

y1 = w1 + w2+ w3
y2 = w3 + w4

a = w1+w2
a_stopped = tf.stop_gradient(a)
y3= a_stopped+w3

gradients = tf.gradients([y1, y2], [w1, w2, w3, w4], grad_ys=[tf.convert_to_tensor([1.,2.]),
                                                          tf.convert_to_tensor([3.,4.])])                                                      
gradients2 = tf.gradients(y3, [w1, w2, w3], grad_ys=tf.convert_to_tensor([1.,2.]))                                                          
print(gradients2) 
 
gradients3 = tf.gradients(y3, [w3], grad_ys=tf.convert_to_tensor([1.,2.])) 
                                                       
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(gradients))
    #print(sess.run(gradients2))  # Raises an error: the gradients for w1 and w2 are None because the gradient was stopped
    print(sess.run(gradients3))
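The effect of stopping a gradient can be imitated with a toy reverse-mode autodiff written in plain Python. This is a sketch for intuition only, not how TensorFlow is implemented: the Node class and the backward and stop_gradient helpers are hypothetical, they work on scalars, and they report 0.0 for a blocked variable where tf.gradients returns None.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # sequence of (parent_node, local_derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

def stop_gradient(node):
    # Same value, but no recorded parents, so backpropagation cannot pass through
    return Node(node.value)

def backward(node, upstream=1.0):
    # Accumulate the upstream gradient, then pass it on to the parents
    node.grad += upstream
    for parent, local in node.parents:
        backward(parent, upstream * local)

w1, w2, w3 = Node(0.5), Node(-1.0), Node(2.0)
a = w1 + w2
y3 = stop_gradient(a) + w3
backward(y3)

print(w1.grad, w2.grad, w3.grad)  # 0.0 0.0 1.0
```

The value of a still contributes to y3, but because the stopped node keeps no link to w1 and w2, only w3 receives a gradient.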