@songying 2019-01-16T22:16:31.000000Z

DyNet whitepaper: The Dynamic Neural Network Toolkit

Abstract

This paper describes DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In toolkits that adopt the static declaration strategy, such as Theano, CNTK, and TensorFlow, the user first defines a computation graph and then feeds examples into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration therefore facilitates the implementation of more complicated network architectures; in addition, DyNet provides APIs for both C++ and Python. One challenge with dynamic declaration is that, since the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet uses an optimized C++ backend and a lightweight graph representation.

Introduction

Static Declaration vs Dynamic Declaration

1. Static Declaration

TensorFlow, Theano, CNTK

Programs written in the static declaration paradigm follow these two steps (a sketch of the pattern follows the list):

  • Definition of a computational architecture: the user writes a program that defines the "shape" of the computation they wish to perform. The computation is usually represented as a computation graph, a symbolic representation of a complex computation that can be both executed and differentiated using autodiff algorithms.
  • Execution of the computation: the user repeatedly feeds in examples and runs the computation graph.
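For concreteness, a minimal sketch of this define-then-run pattern in a TensorFlow 1.x-style API; the model, dimensions, and the batches iterator are illustrative assumptions, not taken from the paper:

    import tensorflow as tf  # TensorFlow 1.x-style API

    # Step 1: definition -- build a symbolic graph once, with placeholders for data.
    x = tf.placeholder(tf.float32, shape=[None, 10])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # Step 2: execution -- repeatedly feed examples into the same fixed graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_x, batch_y in batches:  # batches: an assumed data iterator
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})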

Advantages of static declaration:

  • after the computation graph is defined, it can be optimized in a number of ways so that the subsequent repeated executions of computation can be performed as quickly as possible, speeding training and test over large datasets
  • the static computation graph can be used to schedule computation across a pool of computational devices
  • the static declaration paradigm benefits the toolkit designer: less efficient algorithms for graph construction and optimization can be used since this one-time cost will be amortized across many training instances at run time.

Disadvantages of static declaration:

  • Difficulty in expressing complex flow-control logic:
  • Complexity of the computation graph implementation:
  • Difficulty in debugging:

2. Dynamic Declaration

Dynamic declaration involves only a single step: the user writes code that defines the computation graph directly. Note that there is no separate definition and execution here: the necessary computation graph is created dynamically as the loss computation is executed, and a new graph is created for every training instance.

This gives the user the following advantages:

  • define a different computation architecture for each training example or batch, allowing for the handling of variably sized or structured inputs using flow-control facilities of the host language
  • interleave definition and execution of computation, allowing for the handling of cases where the structure of computation may change depending on the results of previous computation steps

This also reduces the complexity of the computation graph implementation, since it does not need to support flow control operations or dynamically sized data.

One reason toolkits such as TensorFlow choose static graphs is that creating and optimizing computation graphs can be expensive, and by spreading this cost across many training instances, the amortized cost of even an inefficient implementation will be negligible.

DyNet's main goal is to close this gap by minimizing the computational cost of graph construction, allowing efficient dynamic computation, and removing barriers to rapid prototyping and implementation of more sophisticated applications of neural nets that are not easy to implement in the static computation paradigm.

To achieve this goal, DyNet's backend is optimized in a number of ways to remove overhead from computation graph construction and to support fast execution on both CPU and GPU.

3. Coding Paradigm

1. Coding Paradigm Overview

The process of implementing and training a model in DyNet is as follows:


  • Create a Model
  • Add the necessary Parameters and LookupParameters to the Model
  • Create a Trainer object and associate it with the Model
  • For each example:

    • Create a ComputationGraph and populate it by building Expressions representing the desired computation
    • Calculate the result of the forward computation through the graph by calling the value() or npvalue() function of the final Expression
    • If training, compute an Expression representing the loss function and use its backward() function to perform back-propagation
    • Use the Trainer to update the parameters in the Model

Unlike TensorFlow, graph creation happens for every single example.

2. High-level Example

Skipped: this section simply uses the Python API as an example to walk through the implementation and training process described above. A minimal sketch follows.
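As a minimal sketch of that loop in the DyNet Python API (the toy regression task, dimensions, and data below are illustrative assumptions, not the paper's example):

    import dynet as dy

    model = dy.Model()                        # 1. create a Model
    W = model.add_parameters((8, 2))          # 2. add Parameters to the Model
    b = model.add_parameters((8,))
    V = model.add_parameters((1, 8))
    trainer = dy.SimpleSGDTrainer(model)      # 3. create a Trainer tied to the Model

    data = [([0, 1], 1.0), ([1, 1], 0.0)]     # toy (input, target) pairs
    for epoch in range(10):
        for x_val, y_val in data:             # 4. for each example:
            dy.renew_cg()                     #    fresh ComputationGraph
            x = dy.inputVector(x_val)
            h = dy.tanh(dy.parameter(W) * x + dy.parameter(b))
            y_pred = dy.parameter(V) * h      #    Expressions build the graph
            loss = dy.squared_distance(y_pred, dy.scalarInput(y_val))
            loss.value()                      #    forward computation
            loss.backward()                   #    back-propagation
            trainer.update()                  #    update the Model's parameters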

3. Two examples of Dynamic Graph Construction

This section of the paper gives two examples: a dynamic network whose structure changes for each training example, and one where dynamic flow control is performed based on the results of earlier computation. A sketch of the second case follows.
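Because Expressions can be evaluated while the graph is still being built, ordinary Python control flow can depend on computed values, so the graph's depth differs from example to example. In this sketch, the function name, dimensions, and stopping rule are illustrative assumptions:

    import dynet as dy

    model = dy.Model()
    W = model.add_parameters((4, 4))

    def transform_until_small(x_val, max_steps=10):
        """Apply a transformation repeatedly until the output's norm is
        small; the depth of the graph depends on intermediate results."""
        dy.renew_cg()
        h = dy.inputVector(x_val)
        for _ in range(max_steps):
            h = dy.tanh(dy.parameter(W) * h)
            # Evaluating an intermediate Expression interleaves execution
            # with graph construction; its result steers control flow.
            if dy.squared_norm(h).value() < 0.1:
                break
        return h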

4. Behind the Scenes

A key feature of DyNet is its ability to create a new computation graph for every training example or minibatch. To make this efficient, DyNet uses careful memory management to store the values associated with forward and backward computation, so that the majority of time is spent on the actual computation.

1. Computation Graphs

Behind the scenes, DyNet maintains a ComputationGraph, a directed acyclic graph made up of Node objects (a toy illustration follows).
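As a toy illustration only (mirroring the description above, not DyNet's actual C++ implementation), each node records its operation and its predecessors, and the graph keeps nodes in the topological order in which they were added:

    class Node:
        def __init__(self, op, inputs):
            self.op = op          # the operation, e.g. "input", "matmul", "tanh"
            self.inputs = inputs  # references to predecessor Nodes in the DAG
            self.value = None     # filled in during the forward pass

    class ComputationGraph:
        def __init__(self):
            self.nodes = []       # topological order: inputs precede their users

        def add_node(self, op, inputs=()):
            node = Node(op, list(inputs))
            self.nodes.append(node)
            return node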

2. Efficient Graph Construction

3. Performing Computation

5. High Level Abstractions

In its Builders, DyNet provides higher-level abstractions such as RNNs, tree-structured networks, and more sophisticated softmax functions (a short sketch follows).
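For example, an RNN Builder encapsulates the parameters and unrolling logic of a recurrent network. In this sketch using the DyNet Python API, the dimensions and token sequence are illustrative assumptions:

    import dynet as dy

    model = dy.Model()
    builder = dy.LSTMBuilder(1, 10, 32, model)    # layers, input dim, hidden dim
    E = model.add_lookup_parameters((1000, 10))   # toy embedding table

    dy.renew_cg()
    state = builder.initial_state()
    for word_id in [5, 42, 7]:                    # a toy token sequence
        state = state.add_input(E[word_id])       # embedding lookup + one RNN step
    h_final = state.output()                      # hidden state after the sequence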

Efficiency Tools

DyNet improves computational efficiency through a number of features, including sparse updates, minibatching, and multi-processing across CPUs.

1. Sparse Updates

2. Minibatching
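As a brief sketch of minibatching in the DyNet Python API (indices, labels, and dimensions are illustrative assumptions): DyNet exposes batched variants of common operations so that a single graph processes a whole minibatch at once.

    import dynet as dy

    model = dy.Model()
    E = model.add_lookup_parameters((1000, 16))
    W = model.add_parameters((5, 16))

    dy.renew_cg()
    words = [3, 17, 99]                                  # one word per batch element
    labels = [0, 2, 4]                                   # gold class per batch element
    x = dy.lookup_batch(E, words)                        # a single batched lookup
    scores = dy.parameter(W) * x                         # applied across the batch
    losses = dy.pickneglogsoftmax_batch(scores, labels)  # per-element losses
    loss = dy.sum_batches(losses)                        # scalar total loss
    loss.value()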

3. Parallel Processing
