@Team 2019-06-08T14:04:36.000000Z

A Concise Guide to TensorFlow 2.0


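TensorFlow is the most widely used deep learning framework, but compared with PyTorch, a dynamic-graph framework, the static-graph (Graph mode) TensorFlow is undeniably harder to use. Fortunately, TensorFlow recently added eager mode, which matches PyTorch's dynamic execution. Going further, Google has now released a brand-new version, TensorFlow 2.0. It is not a routine update over 1.x but a major overhaul (even though only a preview release is available so far). In short, TensorFlow 2.0 executes eagerly by default and cleans up many previously messy modules. Version 2.0 will gradually replace 1.x, so it is worth getting started early. This article gives a concise introduction to TensorFlow 2.0 for a quick start.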


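TensorFlow's eager execution is a form of imperative programming, consistent with native Python: when you run an operation, the result is returned immediately. TensorFlow 1.x, by contrast, uses Graph mode: you first build a computation graph, then open a Session and feed in actual data before anything is executed. Eager execution is clearly more concise and makes code much easier to debug, which is also why PyTorch feels simpler to use. A simple example: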

    import tensorflow as tf

    x = tf.ones((2, 2), dtype=tf.dtypes.float32)
    y = tf.constant([[1, 2],
                     [3, 4]], dtype=tf.dtypes.float32)
    z = tf.matmul(x, y)
    print(z)
    # tf.Tensor(
    # [[4. 6.]
    #  [4. 6.]], shape=(2, 2), dtype=float32)
    print(z.numpy())
    # [[4. 6.]
    #  [4. 6.]]



    random_value = tf.random.uniform([], 0, 1)
    x = tf.reshape(tf.range(0, 4), [2, 2])
    print(random_value)
    # Plain Python control flow works on the values of eager tensors.
    if random_value.numpy() > 0.5:
        y = tf.matmul(x, x)
    else:
        y = tf.add(x, x)



    w = tf.Variable([[1.0]])
    with tf.GradientTape() as tape:
        loss = w * w + 2. * w + 5.
    grad = tape.gradient(loss, w)
    print(grad)  # => tf.Tensor([[ 4.]], shape=(1, 1), dtype=float32)
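
The value printed by the tape can be checked by hand: for loss = w² + 2w + 5 the derivative is 2w + 2, which is 4 at w = 1. Below is a minimal pure-Python sketch of that check using a central finite difference. The helper names `loss_fn`, `analytic_grad` and `numeric_grad` are illustrative and not part of TensorFlow.

```python
def loss_fn(w):
    """Same scalar loss as in the GradientTape example."""
    return w * w + 2.0 * w + 5.0


def analytic_grad(w):
    """Derivative worked out by hand: d/dw (w^2 + 2w + 5) = 2w + 2."""
    return 2.0 * w + 2.0


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference approximation of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


w0 = 1.0
print(analytic_grad(w0))
# The numerical estimate should agree with the hand-derived value.
print(abs(numeric_grad(loss_fn, w0) - analytic_grad(w0)) < 1e-4)
```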


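The eager execution introduced in TensorFlow 2.0 makes code more concise and easier to debug. In terms of performance, however, eager execution loses some ground to Graph mode. This is not hard to understand: Graph mode builds a static graph first and only then executes it, which is an advantage for distributed training, performance optimization, and production deployment. Fortunately, TensorFlow 2.0 introduces tf.function and AutoGraph to narrow the performance gap between eager execution and Graph mode; the core idea is to convert a subset of Python syntax into high-performance graph operations.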


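AutoGraph was already available in TensorFlow 1.x. It converts common Python code into Graph code that TensorFlow supports. A typical case: in TensorFlow, dynamic control flow had to be written with cumbersome ops such as tf.while_loop and tf.cond, whereas now we can write native Python for and if statements and let AutoGraph convert them into code TensorFlow supports, as in the following example: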

    def square_if_positive(x):
        if x > 0:
            x = x * x
        else:
            x = 0.0
        return x

    # eager mode
    print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)),
                                           square_if_positive(tf.constant(-9.0))))

    # graph mode
    tf_square_if_positive = tf.autograph.to_graph(square_if_positive)
    with tf.Graph().as_default():
        # The result works like a regular op: takes tensors in, returns tensors.
        # You can inspect the graph using
        # tf.compat.v1.get_default_graph().as_graph_def()
        g_out1 = tf_square_if_positive(tf.constant(9.0))
        g_out2 = tf_square_if_positive(tf.constant(-9.0))
        with tf.compat.v1.Session() as sess:
            print('Graph results: %2.2f, %2.2f\n' % (sess.run(g_out1), sess.run(g_out2)))

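The square_if_positive function above uses a native Python if statement, which is no problem under TensorFlow 2.0's eager execution. TensorFlow 1.x Graph mode does not support this, but AutoGraph can convert the function into a Graph function that behaves like a regular TensorFlow op and runs in Graph mode. (TF 2 has no Session; that is a TF 1.x feature, available through tf.compat.v1.) Note the difference between eager mode and Graph mode: the results are identical, but Graph mode is more efficient.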

    print(tf.autograph.to_code(square_if_positive))
    #################################################
    from __future__ import print_function

    def tf__square_if_positive(x):
        try:
            with ag__.function_scope('square_if_positive'):
                do_return = False
                retval_ = None
                cond = ag__.gt(x, 0)

                def if_true():
                    with ag__.function_scope('if_true'):
                        x_1, = x,
                        x_1 = x_1 * x_1
                        return x_1

                def if_false():
                    with ag__.function_scope('if_false'):
                        x = 0.0
                        return x
                x = ag__.if_stmt(cond, if_true, if_false)
                do_return = True
                retval_ = x
                return retval_
        except:
            ag__.rewrite_graph_construction_error(ag_source_map__)

    tf__square_if_positive.autograph_info__ = {}

As you can see, the code generated by AutoGraph defines two branch functions and then calls the if_stmt op, which presumably works much like tf.cond.
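
For comparison, here is a rough sketch of the same branch written by hand with tf.cond, which is the kind of code AutoGraph saves us from writing. It assumes TensorFlow 2.x. The name `square_if_positive_cond` is illustrative, and this is not AutoGraph's actual generated code.

```python
import tensorflow as tf


@tf.function
def square_if_positive_cond(x):
    # Both branches are passed as callables and must return tensors
    # with matching dtype and shape.
    return tf.cond(x > 0,
                   lambda: x * x,
                   lambda: tf.zeros_like(x))


pos = square_if_positive_cond(tf.constant(9.0))
neg = square_if_positive_cond(tf.constant(-9.0))
print(pos.numpy(), neg.numpy())
```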

    def sum_even(items):
        s = 0
        for c in items:
            if c % 2 > 0:
                continue
            s += c
        return s

    print('Eager result: %d' % sum_even(tf.constant([10, 12, 15, 20])))

    tf_sum_even = tf.autograph.to_graph(sum_even)
    with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
        print('Graph result: %d\n\n' % sess.run(tf_sum_even(tf.constant([10, 12, 15, 20]))))

AutoGraph supports most Python features, but it still has limitations; see Capabilities and Limitations for details.


    import timeit

    x = tf.constant([10, 12, 15, 20])
    print("Eager at original code:", timeit.timeit(lambda: sum_even(x), number=100))
    print("Eager at autograph code:", timeit.timeit(lambda: tf_sum_even(x), number=100))
    with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
        graph_op = tf_sum_even(tf.constant([10, 12, 15, 20]))
        sess.run(graph_op)  # warm-up call, excluded from the timing
        print("Graph at autograph code:", timeit.timeit(lambda: sess.run(graph_op), number=100))
    ##########################################
    # Eager at original code: 0.05176109499999981
    # Eager at autograph code: 0.11203173799999977
    # Graph at autograph code: 0.03418808900000059


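So in TensorFlow 2.0 we usually do not use tf.autograph directly, because it brings no speedup under eager execution (in the timings above, the AutoGraph-converted code is actually slower when run eagerly). To reach Graph-mode efficiency we rely on a more powerful tool, tf.function.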


Although eager execution is more concise, Graph mode performs better. To narrow this gap, TensorFlow 2.0 introduces tf.function. Here is the official description of tf.function:

function constructs a callable that executes a TensorFlow graph (tf.Graph) created by tracing the TensorFlow operations in func. This allows the TensorFlow runtime to apply optimizations and exploit parallelism in the computation defined by func.


    def f(x, y):
        print(x, y)
        return tf.reduce_mean(tf.multiply(x ** 2, 3) + y)

    g = tf.function(f)
    x = tf.constant([[2.0, 3.0]])
    y = tf.constant([[3.0, -2.0]])
    # `f` and `g` will return the same value, but `g` will be executed as a
    # TensorFlow graph.
    assert f(x, y).numpy() == g(x, y).numpy()
    # tf.Tensor([[2. 3.]], shape=(1, 2), dtype=float32) tf.Tensor([[ 3. -2.]], shape=(1, 2), dtype=float32)
    # Tensor("x:0", shape=(1, 2), dtype=float32) Tensor("y:0", shape=(1, 2), dtype=float32)

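As the example shows, a function wrapped by tf.function runs in Graph mode. You can think of it as a TF op that encapsulates a Graph: calling it still returns a Tensor result immediately, but internally it executes efficiently. When we print a Tensor inside the function, eager execution prints the Tensor's value, whereas Graph mode prints a symbolic Tensor handle on which numpy() cannot be called, just as in TF 1.x Graph mode.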

    import timeit

    conv_layer = tf.keras.layers.Conv2D(100, 3)

    @tf.function
    def conv_fn(image):
        return conv_layer(image)

    image = tf.zeros([1, 200, 200, 100])
    # warm up
    conv_layer(image); conv_fn(image)
    print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
    print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
    # For a plain convolution the gap is small
    # Eager conv: 0.44013839924952197
    # Function conv: 0.3700763391782858

    lstm_cell = tf.keras.layers.LSTMCell(10)

    @tf.function
    def lstm_fn(input, state):
        return lstm_cell(input, state)

    input = tf.zeros([10, 10])
    state = [tf.zeros([10, 10])] * 2
    # warm up
    lstm_cell(input, state); lstm_fn(input, state)
    print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
    print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
    # For heavier computations such as the LSTM, Graph execution is much faster
    # eager lstm: 0.025562446062237565
    # function lstm: 0.0035498656569271647

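To use tf.function flexibly you need to understand the mechanism behind it, so here is a brief look. In TF 1.x, you first create a static computation graph and then create a Session to actually run different computations: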

    import tensorflow as tf  # TF 1.x API

    x = tf.placeholder(tf.float32)
    y = tf.square(x)
    z = tf.add(x, y)
    sess = tf.Session()
    z0 = sess.run([z], feed_dict={x: 2.})  # 6.0
    z1 = sess.run([z], feed_dict={x: 2., y: 2.})  # 4.0


    def compute_z0(x):
        return tf.add(x, tf.square(x))

    def compute_z1(x, y):
        return tf.add(x, y)


    import tensorflow as tf

    @tf.function
    def compute_z1(x, y):
        return tf.add(x, y)

    @tf.function
    def compute_z0(x):
        return compute_z1(x, tf.square(x))

    z0 = compute_z0(2.)
    z1 = compute_z1(2., 2.)


    # Functions are polymorphic
    @tf.function
    def double(a):
        print("Tracing with", a)
        return a + a

    print(double(tf.constant(1)))
    print(double(tf.constant(1.1)))
    print(double(tf.constant([1, 2])))
    # Tracing with Tensor("a:0", shape=(), dtype=int32)
    # tf.Tensor(2, shape=(), dtype=int32)
    # Tracing with Tensor("a:0", shape=(), dtype=float32)
    # tf.Tensor(2.2, shape=(), dtype=float32)
    # Tracing with Tensor("a:0", shape=(2,), dtype=int32)
    # tf.Tensor([2 4], shape=(2,), dtype=int32)
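
The retracing behaviour shown above can also be observed by recording Python side effects, which run only while a graph is being traced. The following is a small sketch assuming TensorFlow 2.x. The names `trace_log` and `add_self` are illustrative.

```python
import tensorflow as tf

trace_log = []  # filled by a Python side effect, so only during tracing


@tf.function
def add_self(a):
    trace_log.append(a.dtype.name)
    return a + a


add_self(tf.constant(1))    # first int32 scalar: traces a graph
add_self(tf.constant(2))    # same dtype and shape: reuses that graph
add_self(tf.constant(1.5))  # float32 scalar: traces a second graph
print(trace_log)
```

Calls whose tensor arguments share dtype and shape reuse the graph traced earlier, so the list grows only when a new signature appears.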



    @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
    def f(x):
        return tf.add(x, 1.)

    print(f(tf.constant(1.0)))     # tf.Tensor(2.0, shape=(), dtype=float32)
    print(f(tf.constant([1.0,])))  # tf.Tensor([2.], shape=(1,), dtype=float32)
    print(f(tf.constant([1])))     # ValueError: Python inputs incompatible with input_signature
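
One practical use of input_signature is to limit retracing: with a partially unspecified shape, a single graph can serve inputs of different lengths. The following is a sketch assuming TensorFlow 2.x. The names `add_one` and `trace_log` are illustrative.

```python
import tensorflow as tf

trace_log = []


@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def add_one(x):
    trace_log.append(1)  # Python side effect: runs once per trace
    return x + 1.0


short = add_one(tf.constant([1.0]))
longer = add_one(tf.constant([1.0, 2.0, 3.0]))
# Both calls match the same signature, so only one trace is recorded.
print(len(trace_log), longer.numpy())
```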



    def sum_even(items):
        s = 0
        for c in items:
            if c % 2 > 0:
                continue
            s += c
        return s

    sum_even_autograph_on = tf.function(sum_even, autograph=True)
    sum_even_autograph_off = tf.function(sum_even, autograph=False)
    x = tf.constant([10, 12, 15, 20])
    sum_even(x)                # OK
    sum_even_autograph_on(x)   # OK
    sum_even_autograph_off(x)  # TypeError: Tensor objects are only iterable when eager execution is enabled



    class ScalarModel(object):
        def __init__(self):
            self.v = tf.Variable(0)

        @tf.function
        def increment(self, amount):
            self.v.assign_add(amount)

    model1 = ScalarModel()
    model1.increment(tf.constant(3))
    assert int(model1.v) == 3
    model1.increment(tf.constant(4))
    assert int(model1.v) == 7
    model2 = ScalarModel()  # model1 and model2 own different variables
    model2.increment(tf.constant(5))
    assert int(model2.v) == 5



    @tf.function
    def print_element(items):
        for c in items:
            tf.print(c)

    x = tf.constant([1, 5, 6, 8, 3])
    print_element(x)

That is all we will cover about tf.function here, but there are more subtleties to using it; for details see TensorFlow 2.0: Functions, not Sessions.


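TensorFlow 2.0 is fully Keras-based: if you want high-level layers, Keras is the only choice. TensorFlow 1.x offered tf.layers, tf.contrib.slim, and other high-level APIs for building models, but 2.0 supports only tf.keras.layers. Either way, this saves everyone from reinventing the wheel and means model-building code is unified, which improves code reuse (recall how wildly varied TensorFlow model code used to be). Note that the tf.nn module still exists and contains the common neural-network ops, but most people will not build models directly from them, because keras.layers covers the commonly used layers. If you do want a new layer, you can subclass tf.keras.layers.Layer directly: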

    class Linear(tf.keras.layers.Layer):
        def __init__(self, units=32, **kwargs):
            super(Linear, self).__init__(**kwargs)
            self.units = units

        def build(self, input_shape):
            self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                     initializer='random_normal',
                                     trainable=True)
            self.b = self.add_weight(shape=(self.units,),
                                     initializer='random_normal',
                                     trainable=True)

        def call(self, inputs):
            return tf.matmul(inputs, self.w) + self.b

    layer = Linear(32)
    print(layer.weights)  # [] the weights have not been created yet
    x = tf.ones((8, 16))
    y = layer(x)  # shape [8, 32]
    print(layer.weights)


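Layer is the most basic class in Keras, and it is worth understanding it thoroughly; see the source code for details. In most cases we simply reuse the layers Keras already provides, and the class we use most often to create models is keras.Model. Model inherits from Layer but offers more APIs, such as model.compile(), model.fit(), model.evaluate(), and model.predict(); anyone familiar with Keras knows these are the methods for training, evaluating, and predicting with a model. Another important point is that we can subclass Model to create a block or a whole model made up of several layers: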

    class ConvBlock(tf.keras.Model):
        """Convolutional Block consisting of (conv->bn->relu).

        Arguments:
            num_filters: number of filters passed to a convolutional layer.
            kernel_size: the size of convolution kernel
            weight_decay: weight decay
            dropout_rate: dropout rate.
        """

        def __init__(self, num_filters, kernel_size,
                     weight_decay=1e-4, dropout_rate=0.):
            super(ConvBlock, self).__init__()
            self.conv = tf.keras.layers.Conv2D(
                num_filters,
                kernel_size,
                padding="same",
                use_bias=False,
                kernel_initializer="he_normal",
                kernel_regularizer=tf.keras.regularizers.l2(weight_decay))
            self.bn = tf.keras.layers.BatchNormalization()
            self.dropout = tf.keras.layers.Dropout(dropout_rate)

        def call(self, x, training=True):
            output = self.conv(x)
            # Normalize the convolution output (the original passed `x` here,
            # which skipped the convolution entirely).
            output = self.bn(output, training=training)
            output = tf.nn.relu(output)
            output = self.dropout(output, training=training)
            return output

    model = ConvBlock(32, 3, 1e-4, 0.5)
    x = tf.ones((4, 224, 224, 3))
    y = model(x)
    print(model.layers)


    class SimpleCNN(tf.keras.Model):
        def __init__(self, num_classes):
            super(SimpleCNN, self).__init__()
            self.block1 = ConvBlock(16, 3)
            self.block2 = ConvBlock(32, 3)
            self.block3 = ConvBlock(64, 3)
            self.global_pool = tf.keras.layers.GlobalAveragePooling2D()
            self.classifier = tf.keras.layers.Dense(num_classes)

        def call(self, x, training=True):
            output = self.block1(x, training=training)
            output = self.block2(output, training=training)
            output = self.block3(output, training=training)
            output = self.global_pool(output)
            logits = self.classifier(output)
            return logits

    model = SimpleCNN(10)
    print(model.layers)
    x = tf.ones((4, 32, 32, 3))
    y = model(x)  # [4, 10]



    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        # Adds a densely-connected layer with 64 units to the model:
        layers.Dense(64, activation='relu', input_shape=(32,)),
        # Add another:
        layers.Dense(64, activation='relu'),
        # Add a softmax layer with 10 output units:
        layers.Dense(10, activation='softmax')])

Or use the Keras functional API:

    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(784,), name='img')
    x = layers.Dense(64, activation='relu')(inputs)
    x = layers.Dense(64, activation='relu')(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')



Before model training, one important piece is data loading. TensorFlow 2.0 still uses tf.data for this, but in eager mode a tf.data.Dataset becomes a Python iterable, so we can read values from it directly:

    dataset = tf.data.Dataset.range(10)
    for i, elem in enumerate(dataset):
        print(elem)  # prints 0, 1, ..., 9

This is only a simple example, but it is enough to show how tf.data changes under TensorFlow 2.0; the other tf.data techniques are the same as in TensorFlow 1.x.
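
To illustrate that the familiar tf.data transformations compose with eager iteration, here is a short sketch assuming TensorFlow 2.x. The doubling and the batch size of 4 are arbitrary choices for illustration.

```python
import tensorflow as tf

dataset = (tf.data.Dataset.range(10)
           .map(lambda x: x * 2)  # element-wise transformation
           .batch(4))             # the last batch may be smaller

# In eager mode each batch is a concrete tensor we can convert directly.
batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```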


    bce = tf.keras.losses.BinaryCrossentropy()
    loss = bce([0., 0., 1., 1.], [1., 1., 1., 0.])
    print('Loss: ', loss.numpy())  # Loss: 11.522857

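The metrics module, in turn, contains the common model evaluation metrics. Its design is the same as that of the TensorFlow 1.x metrics module: a metric is stateful, and its state is generally recorded by creating Variables. Basic usage: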

    m = tf.keras.metrics.Accuracy()
    m.update_state([1, 2, 3, 4], [0, 2, 3, 4])
    print('result: ', m.result().numpy())  # result: 0.75
    m.update_state([0, 2, 3], [1, 2, 3])
    print('result: ', m.result().numpy())  # result: 0.714
    m.reset_states()  # reset
    m.update_state([0, 2, 3], [1, 2, 3])
    print('result: ', m.result().numpy())  # result: 0.667
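
The numbers above follow from the metric accumulating counts across calls: 3 of 4 correct, then 5 of 7 after the second update. The pure-Python sketch below mimics that bookkeeping. `RunningAccuracy` is a toy stand-in written for this illustration, not a Keras class.

```python
class RunningAccuracy:
    """Toy stand-in for a stateful accuracy metric (not a Keras class)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_state(self, y_true, y_pred):
        # Accumulate counts instead of overwriting them.
        self.correct += sum(int(t == p) for t, p in zip(y_true, y_pred))
        self.total += len(y_true)

    def result(self):
        return self.correct / self.total if self.total else 0.0

    def reset_states(self):
        self.correct = 0
        self.total = 0


m = RunningAccuracy()
m.update_state([1, 2, 3, 4], [0, 2, 3, 4])
first = m.result()    # 3 correct out of 4
m.update_state([0, 2, 3], [1, 2, 3])
second = m.result()   # 5 correct out of 7
m.reset_states()
m.update_state([0, 2, 3], [1, 2, 3])
third = m.result()    # 2 correct out of 3
print(first, round(second, 3), round(third, 3))
```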


    class CategoricalTruePositives(tf.keras.metrics.Metric):
        def __init__(self, name='categorical_true_positives', **kwargs):
            super(CategoricalTruePositives, self).__init__(name=name, **kwargs)
            self.true_positives = self.add_weight(name='tp', initializer='zeros')

        def update_state(self, y_true, y_pred, sample_weight=None):
            # Take the arg-max over the class axis (the original omitted the
            # axis, which defaults to 0). Assumes y_true has shape (batch,).
            y_pred = tf.argmax(y_pred, axis=-1)
            values = tf.equal(tf.cast(y_true, 'int32'), tf.cast(y_pred, 'int32'))
            values = tf.cast(values, 'float32')
            if sample_weight is not None:
                sample_weight = tf.cast(sample_weight, 'float32')
                values = tf.multiply(values, sample_weight)
            self.true_positives.assign_add(tf.reduce_sum(values))

        def result(self):
            return self.true_positives

        def reset_states(self):
            # The state of the metric will be reset at the start of each epoch.
            self.true_positives.assign(0.)


    import numpy as np
    import tensorflow as tf

    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    # Adding a dimension to the array -> new shape == (28, 28, 1)
    train_images = train_images[..., None]
    test_images = test_images[..., None]
    # Getting the images in [0, 1] range.
    train_images = train_images / np.float32(255)
    test_images = test_images / np.float32(255)
    train_labels = train_labels.astype('int64')
    test_labels = test_labels.astype('int64')

    # dataset
    train_ds = tf.data.Dataset.from_tensor_slices(
        (train_images, train_labels)).shuffle(10000).batch(32)
    test_ds = tf.data.Dataset.from_tensor_slices(
        (test_images, test_labels)).batch(32)

    # Model
    class MyModel(tf.keras.Sequential):
        def __init__(self):
            super(MyModel, self).__init__([
                tf.keras.layers.Conv2D(32, 3, activation='relu'),
                tf.keras.layers.MaxPooling2D(),
                tf.keras.layers.Conv2D(64, 3, activation='relu'),
                tf.keras.layers.MaxPooling2D(),
                tf.keras.layers.Flatten(),
                tf.keras.layers.Dense(64, activation='relu'),
                tf.keras.layers.Dense(10, activation=None)
            ])

    model = MyModel()

    # optimizer
    initial_learning_rate = 1e-4
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate,
        decay_steps=100000,
        decay_rate=0.96,
        staircase=True)
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)

    # checkpoint
    checkpoint = tf.train.Checkpoint(step=tf.Variable(0), optimizer=optimizer, model=model)
    manager = tf.train.CheckpointManager(checkpoint, './tf_ckpts', max_to_keep=3)

    # loss function
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    # metric
    train_loss_metric = tf.keras.metrics.Mean(name='train_loss')
    train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
    test_loss_metric = tf.keras.metrics.Mean(name='test_loss')
    test_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

    # define a train step
    @tf.function
    def train_step(inputs, targets):
        with tf.GradientTape() as tape:
            predictions = model(inputs, training=True)
            loss = loss_object(targets, predictions)
            loss += sum(model.losses)  # add other losses
        # compute gradients and update variables
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        train_loss_metric(loss)
        train_acc_metric(targets, predictions)

    # define a test step
    @tf.function
    def test_step(inputs, targets):
        predictions = model(inputs, training=False)
        loss = loss_object(targets, predictions)
        test_loss_metric(loss)
        test_acc_metric(targets, predictions)

    # train loop
    epochs = 10
    for epoch in range(epochs):
        print('Start of epoch %d' % (epoch,))
        # Iterate over the batches of the dataset
        for step, (inputs, targets) in enumerate(train_ds):
            train_step(inputs, targets)
            checkpoint.step.assign_add(1)
            # log every 20 step
            if step % 20 == 0:
                manager.save()  # save checkpoint
                print('Epoch: {}, Step: {}, Train Loss: {}, Train Accuracy: {}'.format(
                    epoch, step, train_loss_metric.result().numpy(),
                    train_acc_metric.result().numpy()))
                train_loss_metric.reset_states()
                train_acc_metric.reset_states()

    # do test
    for inputs, targets in test_ds:
        test_step(inputs, targets)
    print('Test Loss: {}, Test Accuracy: {}'.format(
        test_loss_metric.result().numpy(),
        test_acc_metric.result().numpy()))
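
The script above saves checkpoints but never loads them. The following is a minimal sketch of the save/restore round trip, assuming TensorFlow 2.x. To stay self-contained it tracks a single counter variable and writes to a temporary directory rather than the model and './tf_ckpts' used above.

```python
import tempfile

import tensorflow as tf

step = tf.Variable(0, dtype=tf.int64)
checkpoint = tf.train.Checkpoint(step=step)
ckpt_dir = tempfile.mkdtemp()  # throwaway directory for this sketch
manager = tf.train.CheckpointManager(checkpoint, ckpt_dir, max_to_keep=3)

step.assign(5)
manager.save()   # writes a checkpoint in which step == 5

step.assign(0)   # simulate starting over with a fresh value
checkpoint.restore(manager.latest_checkpoint)
print(int(step.numpy()))
```

In a real training script the restore call would typically run once before the training loop, so that an interrupted run can resume from the latest saved step.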

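Small as it is, the Fashion-MNIST training script above has all the essential parts: data loading, model creation, and model training and testing. Note in particular that a single train step and a single test step are converted to Graph mode with tf.function, which speeds up training and is a recommended practice. Also, the training above uses a custom training loop, which gives a lot of freedom; the alternative is the more conventional Keras compile and fit workflow.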
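
For reference, a rough sketch of the compile and fit route, assuming TensorFlow 2.x. It uses a small dense model and random stand-in data with Fashion-MNIST shapes rather than the real dataset, so the numbers it produces are meaningless; only the API flow is the point.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10),  # logits, matching from_logits=True below
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Random stand-in data with Fashion-MNIST shapes (not the real dataset).
images = np.random.rand(64, 28, 28, 1).astype('float32')
labels = np.random.randint(0, 10, size=(64,)).astype('int64')

# fit() runs the batching, gradient step, and metric updates for us.
history = model.fit(images, labels, batch_size=32, epochs=1, verbose=0)
print(sorted(history.history.keys()))
```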

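Another feature of TensorFlow 2.0 is tf.distribute.Strategy, which supports distributed training better and with a simpler, easier-to-use interface. The most commonly used strategy is synchronous training on multiple GPUs in a single machine, which tf.distribute.MirroredStrategy supports well. This strategy creates one model replica per GPU device; the model's parameters are mirrored across all replicas as MirroredVariables, and they stay in sync across devices because every replica applies the same update. The underlying communication uses all-reduce, which aggregates tensors from multiple devices onto every device and is relatively efficient. All-reduce has several implementations; the default here is NVIDIA NCCL. Creating the strategy takes only a simple definition: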

    mirrored_strategy = tf.distribute.MirroredStrategy(
        devices=["/gpu:0", "/gpu:1"],
        cross_device_ops=tf.distribute.NcclAllReduce())
    # Training will run synchronously on GPU 0 and GPU 1


    with mirrored_strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
        optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)


    global_batch_size = 16  # illustrative value; the original does not define it

    with mirrored_strategy.scope():
        dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(1000).batch(
            global_batch_size)
        # Note that this is the global batch size
        dist_dataset = mirrored_strategy.experimental_distribute_dataset(dataset)
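
A common way to pick the global batch size is to scale a per-replica batch size by the number of replicas. The following is a sketch assuming TensorFlow 2.x. The per-replica value of 32 is hypothetical, and the strategy here is created with default arguments, so it uses whatever devices are visible and falls back to a single CPU replica when no GPU is present.

```python
import tensorflow as tf

# Default arguments: all visible GPUs, or the CPU if there are none.
strategy = tf.distribute.MirroredStrategy()

per_replica_batch_size = 32  # hypothetical value for illustration
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync
print(strategy.num_replicas_in_sync, global_batch_size)
```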

Then we define the train step and execute it with strategy.experimental_run_v2 (renamed to strategy.run in later TF 2.x releases):

    @tf.function
    def train_step(dist_inputs):
        def step_fn(inputs):
            features, labels = inputs
            with tf.GradientTape() as tape:
                logits = model(features)
                cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
                    logits=logits, labels=labels)
                loss = tf.reduce_sum(cross_entropy) * (1.0 / global_batch_size)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
            return cross_entropy

        per_example_losses = mirrored_strategy.experimental_run_v2(step_fn, args=(dist_inputs,))
        mean_loss = mirrored_strategy.reduce(tf.distribute.ReduceOp.MEAN,
                                             per_example_losses, axis=0)
        return mean_loss

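Note that we divide the loss by the global batch size, because in distributed training the gradients from all replicas are summed onto every device through all-reduce before the update is applied. Also, strategy.experimental_run_v2 returns per-replica results, so a reduce is needed to obtain the final result.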
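
The scaling rule can be checked with plain arithmetic. The sketch below uses made-up per-example losses on two replicas; because gradients are linear in the loss, the same reasoning carries over to the gradients that all-reduce sums.

```python
global_batch_size = 4
# Made-up per-example losses on two replicas, each holding half the batch.
replica_losses = [[1.0, 3.0], [2.0, 6.0]]

# Each replica scales its summed loss by the GLOBAL batch size ...
scaled = [sum(losses) / global_batch_size for losses in replica_losses]
# ... so that the all-reduce SUM across replicas equals the global mean.
all_reduced = sum(scaled)

global_mean = sum(sum(losses) for losses in replica_losses) / global_batch_size

# Scaling by the per-replica batch size instead overshoots by the number
# of replicas once the per-replica results are summed.
wrong = sum(sum(losses) / len(losses) for losses in replica_losses)
print(all_reduced, global_mean, wrong)
```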

    with mirrored_strategy.scope():
        for inputs in dist_dataset:
            print(train_step(inputs))

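Note that MirroredStrategy supports only synchronous multi-GPU training on a single machine; for the multi-machine version, use MultiWorkerMirroredStrategy. Other distributed training strategies include CentralStorageStrategy, TPUStrategy, and ParameterServerStrategy. To learn more, see the distribute_strategy guide and the distribute_strategy tutorial.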


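Here we have given a concise introduction to the core new features of TensorFlow 2.0; mastering them should be enough to get started quickly. At the time of writing Google has released only TensorFlow 2.0.0-beta0, so there may be more unexpected goodies to come. Keep going, TensorFlow coders!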


  1. TensorFlow official website.
  2. TensorFlow 2.0 docs.