@atry
2017-11-16T10:04:53.000000Z
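Neural Networks and Functional Programming
In the previous articles of this series, we learned about the correspondence between deep learning and functional programming, and how to create neural networks in a functional style with DeepLearning.scala. You may wonder how DeepLearning.scala provides these capabilities. In the next few articles, I will reveal the internal details of how DeepLearning.scala implements them.
In this article, we start with polymorphic functions.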
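DeepLearning.scala has a built-in matrix multiplication function, dot. It accepts two multidimensional arrays of type INDArray as parameters and returns the computational graph of a multidimensional array. For example, it can be used like this: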
val ndArray1: INDArray = ???
val ndArray2: INDArray = ???
val ndArrayLayer: INDArrayLayer = dot(ndArray1, ndArray2)
If this dot function is used to implement a fully connected layer, one of the two parameters will be a weight of type INDArrayWeight, for example:
val x: INDArray = ???
val w: INDArrayWeight = ???
val y: INDArrayLayer = dot(x, w)
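In addition, a neural network usually has several layers, and the input of every layer except the first is the output of the previous layer. In that case, one of the two parameters of dot will be the computational graph produced by another layer, of type INDArrayLayer, for example: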
val x1: INDArray = ???
val w1: INDArrayWeight = ???
val x2: INDArrayLayer = dot(x1, w1)
val w2: INDArrayWeight = ???
val y: INDArrayLayer = dot(x2, w2)
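Consequently, a single dot function that supports all of the usages above must accept several different parameter types.
Ideally, each of the two parameters should accept the three types INDArray, INDArrayLayer, and INDArrayWeight, which gives nine signatures in total: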
def dot(operand0: INDArray, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayWeight): INDArrayLayer
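Overloading this many functions would be far too redundant.
The DeepLearning type class
Our approach is to define DeepLearning, a dependently typed type class that follows the Aux pattern, and to let simulacrum generate the tedious boilerplate code: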
@simulacrum.typeclass
trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Do[Tape[Data, Delta]]
}
object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
}
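DeepLearning is a dependently typed type class in which Data and Delta denote, respectively, the value type of the computational graph and the derivative type used in backpropagation. Therefore, when a DeepLearning instance is summoned for some Differentiable, Data and Delta can be determined at compile time. For example, the built-in plugins of DeepLearning.scala provide DeepLearning.Aux[INDArray, INDArray, INDArray], DeepLearning.Aux[INDArrayLayer, INDArray, INDArray], and DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]: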
implicit def indArrayLiteralDeepLearning: DeepLearning.Aux[INDArray, INDArray, INDArray] = ???
implicit def indArrayLayerDeepLearning: DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] = ???
implicit def indArrayWeightDeepLearning: DeepLearning.Aux[INDArrayWeight, INDArray, INDArray] = ???
Summoning DeepLearning[INDArray], DeepLearning[INDArrayLayer], or DeepLearning[INDArrayWeight] then lets the compiler infer both Data and Delta as INDArray.
val summonINDArrayDeepLearning = DeepLearning[INDArray]
type INDArrayData = summonINDArrayDeepLearning.Data
type INDArrayDelta = summonINDArrayDeepLearning.Delta
val summonINDArrayLayerDeepLearning = DeepLearning[INDArrayLayer]
type INDArrayLayerData = summonINDArrayLayerDeepLearning.Data
type INDArrayLayerDelta = summonINDArrayLayerDeepLearning.Delta
val summonINDArrayWeightDeepLearning = DeepLearning[INDArrayWeight]
type INDArrayWeightData = summonINDArrayWeightDeepLearning.Data
type INDArrayWeightDelta = summonINDArrayWeightDeepLearning.Delta
In the code above, INDArrayData, INDArrayDelta, INDArrayLayerData, INDArrayLayerDelta, INDArrayWeightData, and INDArrayWeightDelta are all INDArray.
If we summon DeepLearning[DoubleLayer] instead, the following implicit value exists:
implicit def doubleLayerDeepLearning: DeepLearning.Aux[DoubleLayer, Double, Double] = ???
so Data and Delta are both Double:
val summonDoubleLayerDeepLearning = DeepLearning[DoubleLayer]
type DoubleLayerData = summonDoubleLayerDeepLearning.Data
type DoubleLayerDelta = summonDoubleLayerDeepLearning.Delta
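The summoning behaviour described above can be tried without any library. The following is only a minimal sketch in plain Scala under several assumptions: forward returns the plain Data value rather than Do[Tape[Data, Delta]], the summoner is written by hand instead of being generated by simulacrum, and DoubleLayerStub and ArrayLayerStub are hypothetical stand-ins for DoubleLayer and INDArrayLayer.

```scala
object AuxPatternSketch {
  // Simplified stand-in for the article's type class: forward returns the
  // plain Data value here, not Do[Tape[Data, Delta]].
  trait DeepLearning[Differentiable] {
    type Data
    type Delta
    def forward(differentiable: Differentiable): Data
  }

  // Hypothetical wrappers standing in for DoubleLayer and INDArrayLayer.
  final case class DoubleLayerStub(value: Double)
  final case class ArrayLayerStub(values: Vector[Double])

  object DeepLearning {
    type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
      type Data = Data0
      type Delta = Delta0
    }

    // Hand-written summoner; returning instance.type keeps Data and Delta visible.
    def apply[Differentiable](
        implicit instance: DeepLearning[Differentiable]
    ): instance.type = instance

    implicit val doubleLayerStub: Aux[DoubleLayerStub, Double, Double] =
      new DeepLearning[DoubleLayerStub] {
        type Data = Double
        type Delta = Double
        def forward(differentiable: DoubleLayerStub): Double = differentiable.value
      }

    implicit val arrayLayerStub: Aux[ArrayLayerStub, Vector[Double], Vector[Double]] =
      new DeepLearning[ArrayLayerStub] {
        type Data = Vector[Double]
        type Delta = Vector[Double]
        def forward(differentiable: ArrayLayerStub): Vector[Double] = differentiable.values
      }
  }

  val summonedDouble = DeepLearning[DoubleLayerStub]
  val summonedArray = DeepLearning[ArrayLayerStub]

  // These type ascriptions compile only because Data is resolved statically.
  val doubleData: Double = summonedDouble.forward(DoubleLayerStub(1.5))
  val arrayData: Vector[Double] = summonedArray.forward(ArrayLayerStub(Vector(1.0, 2.0)))
}
```

Swapping the two ascriptions (for instance, typing doubleData as Vector[Double]) should be rejected by the compiler, which is the point of carrying Data and Delta as type members.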
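Implementing dot with the DeepLearning type class
With the DeepLearning type class in place, we make the two parameters of dot generic, with types Operand0 and Operand1, and use implicit DeepLearning.Aux parameters to prove that they are differentiable multidimensional arrays.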
def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = {
  // Both operands are converted to the same representation, whatever their static types are.
  val do0: Do[Tape[INDArray, INDArray]] = deeplearning0.forward(operand0)
  val do1: Do[Tape[INDArray, INDArray]] = deeplearning1.forward(operand1)
  ???
}
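As a result, for deeplearning0 and deeplearning1 to have the types DeepLearning.Aux[Operand0, INDArray, INDArray] and DeepLearning.Aux[Operand1, INDArray, INDArray], each of them must be one of DeepLearning.Aux[INDArray, INDArray, INDArray], DeepLearning.Aux[INDArrayLayer, INDArray, INDArray], or DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]. This in turn restricts Operand0 and Operand1 to INDArray, INDArrayLayer, or INDArrayWeight.
Because every DeepLearning instance implements the forward method, dot can uniformly convert Operand0 and Operand1 into Do[Tape[INDArray, INDArray]] internally.
In this way, dot accepts every kind of multidimensional array as a parameter, including computational graphs and weights of multidimensional arrays, and handles them all uniformly.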
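To make the idea concrete, here is a small self-contained sketch of a function whose two operands are constrained only by implicit Aux evidence. It is an illustration under assumptions, not the library's implementation: Vector[Double] stands in for INDArray, ArrLayer and ArrWeight are hypothetical stand-ins for INDArrayLayer and INDArrayWeight, forward returns the plain value, and dot computes an inner product of two vectors rather than a matrix multiplication.

```scala
object PolymorphicDotSketch {
  type Arr = Vector[Double]              // stand-in for INDArray
  final case class ArrLayer(value: Arr)  // stand-in for INDArrayLayer
  final case class ArrWeight(value: Arr) // stand-in for INDArrayWeight

  trait DeepLearning[Differentiable] {
    type Data
    type Delta
    def forward(differentiable: Differentiable): Data
  }

  object DeepLearning {
    type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
      type Data = Data0
      type Delta = Delta0
    }

    // Helper that builds an instance whose Data and Delta are both Arr.
    private def arrInstance[A](f: A => Arr): Aux[A, Arr, Arr] = new DeepLearning[A] {
      type Data = Arr
      type Delta = Arr
      def forward(differentiable: A): Arr = f(differentiable)
    }

    implicit val literal: Aux[Arr, Arr, Arr] = arrInstance[Arr](a => a)
    implicit val layer: Aux[ArrLayer, Arr, Arr] = arrInstance[ArrLayer](_.value)
    implicit val weight: Aux[ArrWeight, Arr, Arr] = arrInstance[ArrWeight](_.value)
  }

  // One definition covers all nine operand combinations.
  def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Arr, Arr],
      deepLearning1: DeepLearning.Aux[Operand1, Arr, Arr]
  ): ArrLayer = {
    val data0: Arr = deepLearning0.forward(operand0)
    val data1: Arr = deepLearning1.forward(operand1)
    // Inner product, used here as a simplified stand-in for matrix multiplication.
    ArrLayer(Vector(data0.zip(data1).map { case (a, b) => a * b }.sum))
  }

  // A raw value with a weight, then the resulting layer with another weight.
  val hidden: ArrLayer = dot(Vector(1.0, 2.0), ArrWeight(Vector(3.0, 4.0)))
  val output: ArrLayer = dot(hidden, ArrWeight(Vector(2.0)))
}
```

Passing an operand without a matching instance, such as a String, should fail at compile time because no implicit evidence can be found for it.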
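Although our dot supports all nine signatures above, sometimes that is still not enough. For example, the max function can compare multidimensional arrays element-wise, and it can also compare a multidimensional array with a scalar floating-point number, so that max(ndArray, 0.0) implements the ReLU activation function.
Ideally, max should support four times as many signatures: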
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
    deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): DoubleLayer = ???
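Unfortunately, the Scala compiler does not support overloaded definitions like these. In particular, overload resolution takes place before the implicit parameters are searched for, so the compiler cannot use the implicit values to determine Operand0 and Operand1, and therefore has no way to decide which overload to select. We use Poly from Shapeless to solve this overloading problem.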
We define max as a Poly2:
import shapeless.Poly2
object max extends Poly2
Then we provide the four max.Case instances described above:
implicit def maxDoubleDouble[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxDoubleINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxINDArrayDouble[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxINDArrayINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
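Depending on whether Operand0 and Operand1 are plain values, computational graphs, or weights, each of the Case functions above expands into 9 concrete Cases.
As a result, a call to max supports 4 × 9 = 36 Cases, which is equivalent to 36 signatures.
For example: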
val operand0: DoubleLayer = ???
val operand1: INDArrayLayer = ???
max(operand0, operand1)
Once the implicit parameters have been resolved, the call is equivalent to:
max(operand0, operand1)(maxDoubleINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))
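The dispatch mechanism can be imitated without Shapeless, which may help to see what the implicit Case instances are doing. The sketch below is an assumption-laden illustration: MaxCase is a hand-rolled stand-in for max.Case, Vector[Double] stands in for INDArray, ArrLayer and DoubleLayer are simplified stand-ins for the article's layer types, the Delta type member is omitted for brevity, and forward returns plain values.

```scala
object PolyMaxSketch {
  type Arr = Vector[Double]                   // stand-in for INDArray
  final case class ArrLayer(value: Arr)       // stand-in for INDArrayLayer
  final case class DoubleLayer(value: Double) // stand-in for the article's DoubleLayer

  // Simplified type class: the Delta member is left out for brevity.
  trait DeepLearning[Differentiable] {
    type Data
    def forward(differentiable: Differentiable): Data
  }
  object DeepLearning {
    type Aux[Differentiable, Data0] = DeepLearning[Differentiable] { type Data = Data0 }
    private def instance[A, D](f: A => D): Aux[A, D] = new DeepLearning[A] {
      type Data = D
      def forward(differentiable: A): D = f(differentiable)
    }
    implicit val doubleLiteral: Aux[Double, Double] = instance[Double, Double](d => d)
    implicit val doubleLayer: Aux[DoubleLayer, Double] = instance[DoubleLayer, Double](_.value)
    implicit val arrLiteral: Aux[Arr, Arr] = instance[Arr, Arr](a => a)
    implicit val arrLayer: Aux[ArrLayer, Arr] = instance[ArrLayer, Arr](_.value)
  }

  // Hand-rolled stand-in for a Poly2 case: one implicit per supported shape.
  trait MaxCase[Operand0, Operand1] {
    type Result
    def apply(operand0: Operand0, operand1: Operand1): Result
  }
  object MaxCase {
    type Aux[A, B, R] = MaxCase[A, B] { type Result = R }
    private def at[A, B, R](f: (A, B) => R): Aux[A, B, R] = new MaxCase[A, B] {
      type Result = R
      def apply(operand0: A, operand1: B): R = f(operand0, operand1)
    }

    implicit def maxDoubleDouble[A, B](
        implicit d0: DeepLearning.Aux[A, Double], d1: DeepLearning.Aux[B, Double]
    ): Aux[A, B, DoubleLayer] =
      at[A, B, DoubleLayer]((a, b) => DoubleLayer(math.max(d0.forward(a), d1.forward(b))))

    implicit def maxDoubleArr[A, B](
        implicit d0: DeepLearning.Aux[A, Double], d1: DeepLearning.Aux[B, Arr]
    ): Aux[A, B, ArrLayer] =
      at[A, B, ArrLayer] { (a, b) =>
        val scalar = d0.forward(a)
        ArrLayer(d1.forward(b).map(x => math.max(scalar, x)))
      }

    implicit def maxArrDouble[A, B](
        implicit d0: DeepLearning.Aux[A, Arr], d1: DeepLearning.Aux[B, Double]
    ): Aux[A, B, ArrLayer] =
      at[A, B, ArrLayer] { (a, b) =>
        val scalar = d1.forward(b)
        ArrLayer(d0.forward(a).map(x => math.max(x, scalar)))
      }

    implicit def maxArrArr[A, B](
        implicit d0: DeepLearning.Aux[A, Arr], d1: DeepLearning.Aux[B, Arr]
    ): Aux[A, B, ArrLayer] =
      at[A, B, ArrLayer] { (a, b) =>
        ArrLayer(d0.forward(a).zip(d1.forward(b)).map { case (x, y) => math.max(x, y) })
      }
  }

  // The single entry point; the selected case decides the result type.
  def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit maxCase: MaxCase[Operand0, Operand1]
  ): maxCase.Result = maxCase(operand0, operand1)

  val relu: ArrLayer = max(ArrLayer(Vector(-1.0, 2.0)), 0.0)
  val scalarMax: DoubleLayer = max(1.0, DoubleLayer(3.0))
}
```

Only one of the four implicit definitions can find both of its DeepLearning parameters for a given pair of operand types, so the compiler selects it without ambiguity, and the result type follows from the selected case.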
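Besides polymorphic functions, the built-in plugins of DeepLearning.scala also provide polymorphic methods for infix operations, such as the four arithmetic operations. These polymorphic methods are implemented by forwarding to shapeless.Poly2: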
object + extends Poly2
object - extends Poly2
object * extends Poly2
object / extends Poly2
implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
  def +[Operand1](operand1: Operand1)(
      implicit methodCase: +.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def -[Operand1](operand1: Operand1)(
      implicit methodCase: -.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def *[Operand1](operand1: Operand1)(
      implicit methodCase: *.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def /[Operand1](operand1: Operand1)(
      implicit methodCase: /.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
}
For example:
implicit def doubleDivINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = /.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
val operand0: DoubleLayer = ???
val operand1: INDArrayLayer = ???
operand0 / operand1
Once the implicit parameters have been resolved, the call is equivalent to:
PolymorphicOps(operand0)./(operand1)(doubleDivINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))
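The forwarding from an infix method to an implicit case can also be sketched in plain Scala. This is a simplified illustration, not the library's code: DivCase is a hand-rolled stand-in for /.Case, the cases are keyed directly on concrete types instead of going through the DeepLearning type class, and ArrLayer and DoubleLayer are hypothetical stand-ins wrapping Vector[Double] and Double.

```scala
object InfixDivisionSketch {
  type Arr = Vector[Double]                   // stand-in for INDArray
  final case class ArrLayer(value: Arr)       // stand-in for INDArrayLayer
  final case class DoubleLayer(value: Double) // stand-in for DoubleLayer

  // Hand-rolled stand-in for the article's /.Case.
  trait DivCase[Operand0, Operand1] {
    type Result
    def apply(operand0: Operand0, operand1: Operand1): Result
  }
  object DivCase {
    type Aux[A, B, R] = DivCase[A, B] { type Result = R }
    private def at[A, B, R](f: (A, B) => R): Aux[A, B, R] = new DivCase[A, B] {
      type Result = R
      def apply(operand0: A, operand1: B): R = f(operand0, operand1)
    }

    // For brevity, cases are declared on concrete types rather than via DeepLearning.Aux.
    implicit val doubleDivArr: Aux[DoubleLayer, ArrLayer, ArrLayer] =
      at[DoubleLayer, ArrLayer, ArrLayer]((a, b) => ArrLayer(b.value.map(x => a.value / x)))
    implicit val arrDivDouble: Aux[ArrLayer, Double, ArrLayer] =
      at[ArrLayer, Double, ArrLayer]((a, b) => ArrLayer(a.value.map(x => x / b)))
  }

  // The infix method only forwards to whichever case the compiler finds.
  implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
    def /[Operand1](operand1: Operand1)(
        implicit methodCase: DivCase[Operand0, Operand1]
    ): methodCase.Result = methodCase(operand0, operand1)
  }

  val quotient: ArrLayer = DoubleLayer(6.0) / ArrLayer(Vector(2.0, 3.0))
  val halved: ArrLayer = ArrLayer(Vector(2.0, 4.0)) / 2.0
}
```

Because the method body contains no logic of its own, supporting a new operand pair only requires a new implicit DivCase, and PolymorphicOps itself stays unchanged.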
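With the DeepLearning type class and shapeless.Poly2, we support both polymorphic functions and polymorphic methods. Polymorphic functions and methods implemented this way are extensible: adding a new implicit value is enough to support a new signature for a function of the same name.
Like other features, the implicit values described in this article can also be provided by plugins. In the next article of this series, I will reveal the internal implementation details of the DeepLearning.scala plugin system. You will see that the core of such a powerful plugin system is remarkably simple.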