@songying 2019-01-16T22:16:31.000000Z

DyNet whitepaper: The Dynamic Neural Network Toolkit

Abstract

This paper describes DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In toolkits that adopt the static declaration strategy, such as Theano, CNTK, and TensorFlow, the user first defines a computation graph and then feeds examples into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration therefore facilitates the implementation of more complicated network architectures; in addition, DyNet provides APIs for both C++ and Python. One challenge with dynamic declaration is that, since the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet uses an optimized C++ backend and a lightweight graph representation.

Introduction

Static Declaration vs Dynamic Declaration

1. Static Declaration

TensorFlow, Theano, CNTK

Programs written in the static declaration paradigm follow these two steps (a sketch of the pattern follows the list):

  • Definition of a computational architecture: the user writes a program that defines the "shape" of the computation they wish to perform. The computation is usually represented as a computation graph, a symbolic representation of a complex computation that can be both executed and differentiated using autodiff algorithms.
  • Execution of the computation: the user repeatedly feeds in examples and runs the computation graph.
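For concreteness, a minimal sketch of this define-then-run pattern in a TensorFlow 1.x-style API; the model, dimensions, and the batches iterator are illustrative assumptions, not taken from the paper:

    import tensorflow as tf  # TensorFlow 1.x-style API

    # Step 1: definition -- build a symbolic graph once, with placeholders for data.
    x = tf.placeholder(tf.float32, shape=[None, 10])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # Step 2: execution -- repeatedly feed examples into the same fixed graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_x, batch_y in batches:  # batches: an assumed data iterator
            sess.run(train_op, feed_dict={x: batch_x, y: batch_y})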

Advantages of static declaration:

  • after the computation graph is defined, it can be optimized in a number of ways so that the subsequent repeated executions of computation can be performed as quickly as possible, speeding training and test over large datasets
  • the static computation graph can be used to schedule computation across a pool of computational devices
  • the static declaration paradigm benefits the toolkit designer: less efficient algorithms for graph construction and optimization can be used since this one-time cost will be amortized across many training instances at run time.

Disadvantages of static declaration:

  • Difficulty in expressing complex flow-control logic:
  • Complexity of the computation graph implementation:
  • Difficulty in debugging:

2. Dynamic Declaration

Dynamic declaration involves only a single step: the user writes code that defines the computation graph directly. Note that there is no separate definition and execution here: the necessary computation graph is created dynamically as the loss computation is executed, and a new graph is created for every training instance.

This gives the user the following advantages:

  • define a different computation architecture for each training example or batch, allowing for the handling of variably sized or structured inputs using flow-control facilities of the host language
  • interleave definition and execution of computation, allowing for the handling of cases where the structure of computation may change depending on the results of previous computation steps

This also reduces the complexity of the computation graph implementation, since it does not need to support flow control operations or dynamically sized data.

One reason toolkits such as TensorFlow choose static graphs is that creating and optimizing computation graphs can be expensive, and by spreading this cost across many training instances, the amortized cost of even an inefficient implementation will be negligible.

DyNet's main goal is to close this gap by minimizing the computational cost of graph construction, allowing efficient dynamic computation, and removing barriers to rapid prototyping and implementation of more sophisticated applications of neural nets that are not easy to implement in the static computation paradigm.

To achieve this goal, DyNet's backend is optimized in a number of ways to remove overhead from computation graph construction and to support fast execution on both CPU and GPU.

3. Coding Paradigm

1. Coding Paradigm Overview

The process of implementing and training a model in DyNet is as follows:


  • Create a Model
  • Add the necessary Parameters and LookupParameters to the Model
  • Create a Trainer object and associate it with the Model
  • For each example:

    • Create a ComputationGraph and populate it by building Expressions representing the desired computation
    • Calculate the result of the forward computation through the graph by calling the value() or npvalue() function of the final Expression
    • If training, compute an Expression representing the loss function and use its backward() function to perform back-propagation
    • Use the Trainer to update the parameters in the Model

Unlike TensorFlow, graph creation happens for every single example.

2. High-level Example

Skipped: this section simply uses the Python API as an example to walk through the implementation and training process described above. A minimal sketch follows.
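As a minimal sketch of that loop in the DyNet Python API (the toy regression task, dimensions, and data below are illustrative assumptions, not the paper's example):

    import dynet as dy

    model = dy.Model()                        # 1. create a Model
    W = model.add_parameters((8, 2))          # 2. add Parameters to the Model
    b = model.add_parameters((8,))
    V = model.add_parameters((1, 8))
    trainer = dy.SimpleSGDTrainer(model)      # 3. create a Trainer tied to the Model

    data = [([0, 1], 1.0), ([1, 1], 0.0)]     # toy (input, target) pairs
    for epoch in range(10):
        for x_val, y_val in data:             # 4. for each example:
            dy.renew_cg()                     #    fresh ComputationGraph
            x = dy.inputVector(x_val)
            h = dy.tanh(dy.parameter(W) * x + dy.parameter(b))
            y_pred = dy.parameter(V) * h      #    Expressions build the graph
            loss = dy.squared_distance(y_pred, dy.scalarInput(y_val))
            loss.value()                      #    forward computation
            loss.backward()                   #    back-propagation
            trainer.update()                  #    update the Model's parameters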

3. Two examples of Dynamic Graph Construction

This section of the paper gives two examples: a dynamic network whose structure changes for each training example, and one where dynamic flow control is performed based on the results of earlier computation. A sketch of the second case follows.
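Because Expressions can be evaluated while the graph is still being built, ordinary Python control flow can depend on computed values, so the graph's depth differs from example to example. In this sketch, the function name, dimensions, and stopping rule are illustrative assumptions:

    import dynet as dy

    model = dy.Model()
    W = model.add_parameters((4, 4))

    def transform_until_small(x_val, max_steps=10):
        """Apply a transformation repeatedly until the output's norm is
        small; the depth of the graph depends on intermediate results."""
        dy.renew_cg()
        h = dy.inputVector(x_val)
        for _ in range(max_steps):
            h = dy.tanh(dy.parameter(W) * h)
            # Evaluating an intermediate Expression interleaves execution
            # with graph construction; its result steers control flow.
            if dy.squared_norm(h).value() < 0.1:
                break
        return h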

4. Behind the Scenes

A key feature of DyNet is its ability to create a new computation graph for every training example or minibatch. To make this efficient, DyNet uses careful memory management to store the values associated with forward and backward computation, so that the majority of time is spent on the actual computation.

1. Computation Graphs

Behind the scenes, DyNet maintains a ComputationGraph, a directed acyclic graph made up of Node objects (a toy illustration follows).
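As a toy illustration only (mirroring the description above, not DyNet's actual C++ implementation), each node records its operation and its predecessors, and the graph keeps nodes in the topological order in which they were added:

    class Node:
        def __init__(self, op, inputs):
            self.op = op          # the operation, e.g. "input", "matmul", "tanh"
            self.inputs = inputs  # references to predecessor Nodes in the DAG
            self.value = None     # filled in during the forward pass

    class ComputationGraph:
        def __init__(self):
            self.nodes = []       # topological order: inputs precede their users

        def add_node(self, op, inputs=()):
            node = Node(op, list(inputs))
            self.nodes.append(node)
            return node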

2. Efficient Graph Construction

3. Performing Computation

5. High Level Abstractions

In its Builders, DyNet provides higher-level abstractions such as RNNs, tree-structured networks, and more sophisticated softmax functions (a short sketch follows).
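For example, an RNN Builder encapsulates the parameters and unrolling logic of a recurrent network. In this sketch using the DyNet Python API, the dimensions and token sequence are illustrative assumptions:

    import dynet as dy

    model = dy.Model()
    builder = dy.LSTMBuilder(1, 10, 32, model)    # layers, input dim, hidden dim
    E = model.add_lookup_parameters((1000, 10))   # toy embedding table

    dy.renew_cg()
    state = builder.initial_state()
    for word_id in [5, 42, 7]:                    # a toy token sequence
        state = state.add_input(E[word_id])       # embedding lookup + one RNN step
    h_final = state.output()                      # hidden state after the sequence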

Efficiency Tools

DyNet improves computational efficiency through a number of features, including sparse updates, minibatching, and multi-processing across CPUs.

1. Sparse Updates

2. Minibatching
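As a brief sketch of minibatching in the DyNet Python API (indices, labels, and dimensions are illustrative assumptions): DyNet exposes batched variants of common operations so that a single graph processes a whole minibatch at once.

    import dynet as dy

    model = dy.Model()
    E = model.add_lookup_parameters((1000, 16))
    W = model.add_parameters((5, 16))

    dy.renew_cg()
    words = [3, 17, 99]                                  # one word per batch element
    labels = [0, 2, 4]                                   # gold class per batch element
    x = dy.lookup_batch(E, words)                        # a single batched lookup
    scores = dy.parameter(W) * x                         # applied across the batch
    losses = dy.pickneglogsoftmax_batch(scores, labels)  # per-element losses
    loss = dy.sum_batches(losses)                        # scalar total loss
    loss.value()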

3. Parallel Processing
