@atry 2017-11-16T10:04:53.000000Z

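Neural Networks and Functional Programming (6): Dependently Typed Type Classes and Polymorphic Functions

Neural Networks and Functional Programming

In the previous articles of this series, we looked at the correspondence between deep learning and functional programming, and at how to build neural networks in a functional style with DeepLearning.scala. You may be wondering how DeepLearning.scala provides these capabilities. In the next few articles, I will reveal the internals behind the following DeepLearning.scala features:

  1. Polymorphic functions
  2. Backpropagation
  3. Plugins

This article starts with polymorphic functions.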


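Motivation

DeepLearning.scala has a built-in matrix multiplication function, dot. dot takes two multidimensional arrays (INDArray) as arguments and returns the computational graph of a multidimensional array. For example:

    val ndArray1: INDArray = ???
    val ndArray2: INDArray = ???
    val ndArrayLayer: INDArrayLayer = dot(ndArray1, ndArray2)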

If dot is used to implement a fully connected layer, one of the two arguments will be a weight, INDArrayWeight. For example:

    val x: INDArray = ???
    val w: INDArrayWeight = ???
    val y: INDArrayLayer = dot(x, w)

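In addition, a neural network usually has several layers, and every layer except the first takes the output of the previous layer as its input. In that case, one of the arguments of dot is the computational graph produced by another layer, an INDArrayLayer. For example:

    val x1: INDArray = ???
    val w1: INDArrayWeight = ???
    val x2: INDArrayLayer = dot(x1, w1)
    val w2: INDArrayWeight = ???
    val y: INDArrayLayer = dot(x2, w2)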

As a result, a single dot function that supports all of the usages above must accept many different argument types.
Ideally, each of the two arguments should accept INDArray, INDArrayLayer, and INDArrayWeight, which gives nine signatures:

    def dot(operand0: INDArray, operand1: INDArray): INDArrayLayer
    def dot(operand0: INDArrayLayer, operand1: INDArray): INDArrayLayer
    def dot(operand0: INDArrayWeight, operand1: INDArray): INDArrayLayer
    def dot(operand0: INDArray, operand1: INDArrayLayer): INDArrayLayer
    def dot(operand0: INDArrayLayer, operand1: INDArrayLayer): INDArrayLayer
    def dot(operand0: INDArrayWeight, operand1: INDArrayLayer): INDArrayLayer
    def dot(operand0: INDArray, operand1: INDArrayWeight): INDArrayLayer
    def dot(operand0: INDArrayLayer, operand1: INDArrayWeight): INDArrayLayer
    def dot(operand0: INDArrayWeight, operand1: INDArrayWeight): INDArrayLayer

Writing that many overloads would be far too repetitive.

The DeepLearning type class

Our approach is to define DeepLearning, a dependently typed type class that follows the Aux pattern, and to let simulacrum generate the tedious boilerplate:

    @simulacrum.typeclass
    trait DeepLearning[Differentiable] {
      type Data
      type Delta
      def forward(differentiable: Differentiable): Do[Tape[Data, Delta]]
    }
    object DeepLearning {
      // Exposes the type members as type parameters so that they can be constrained.
      type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
        type Data = Data0
        type Delta = Delta0
      }
    }

DeepLearning is a dependently typed type class: Data is the value type of the computational graph, and Delta is the type of the derivative used in backpropagation. So when a DeepLearning instance is summoned for some Differentiable, Data and Delta can be determined at compile time. For example, the built-in plugins of DeepLearning.scala provide DeepLearning.Aux[INDArray, INDArray, INDArray], DeepLearning.Aux[INDArrayLayer, INDArray, INDArray], and DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]:

    implicit def indArrayLiteralDeepLearning: DeepLearning.Aux[INDArray, INDArray, INDArray] = ???
    implicit def indArrayLayerDeepLearning: DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] = ???
    implicit def indArrayWeightDeepLearning: DeepLearning.Aux[INDArrayWeight, INDArray, INDArray] = ???

Summoning DeepLearning[INDArray], DeepLearning[INDArrayLayer], or DeepLearning[INDArrayWeight] therefore lets the compiler infer both Data and Delta as INDArray:

    val summonINDArrayDeepLearning = DeepLearning[INDArray]
    type INDArrayData = summonINDArrayDeepLearning.Data
    type INDArrayDelta = summonINDArrayDeepLearning.Delta
    val summonINDArrayLayerDeepLearning = DeepLearning[INDArrayLayer]
    type INDArrayLayerData = summonINDArrayLayerDeepLearning.Data
    type INDArrayLayerDelta = summonINDArrayLayerDeepLearning.Delta
    val summonINDArrayWeightDeepLearning = DeepLearning[INDArrayWeight]
    type INDArrayWeightData = summonINDArrayWeightDeepLearning.Data
    type INDArrayWeightDelta = summonINDArrayWeightDeepLearning.Delta

In the code above, INDArrayData, INDArrayDelta, INDArrayLayerData, INDArrayLayerDelta, INDArrayWeightData, and INDArrayWeightDelta are all INDArray.

If we summon DeepLearning[DoubleLayer] instead, then because the following implicit value exists:

    implicit def doubleLayerDeepLearning: DeepLearning.Aux[DoubleLayer, Double, Double] = ???

Data and Delta will both be Double:

    val summonDoubleLayerDeepLearning = DeepLearning[DoubleLayer]
    type DoubleLayerData = summonDoubleLayerDeepLearning.Data
    type DoubleLayerDelta = summonDoubleLayerDeepLearning.Delta
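
To make the Aux pattern concrete without any library dependency, here is a minimal sketch. It is not the DeepLearning.scala implementation: DoubleLayer and VectorLayer are hypothetical stand-ins, Vector[Double] plays the role of INDArray, forward returns the data directly instead of Do[Tape[Data, Delta]], and the summoner is written by hand rather than generated by simulacrum.

```scala
object AuxPatternSketch {
  // Hypothetical stand-ins for the library's layer types.
  final case class DoubleLayer(value: Double)
  final case class VectorLayer(value: Vector[Double])

  // Simplified type class: forward returns Data directly
  // instead of Do[Tape[Data, Delta]].
  trait DeepLearning[Differentiable] {
    type Data
    type Delta
    def forward(differentiable: Differentiable): Data
  }

  object DeepLearning {
    type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
      type Data = Data0
      type Delta = Delta0
    }

    // Hand-written summoner; returning instance.type keeps the type members visible.
    def apply[Differentiable](
        implicit instance: DeepLearning[Differentiable]
    ): instance.type = instance

    implicit val doubleLayerDeepLearning: Aux[DoubleLayer, Double, Double] =
      new DeepLearning[DoubleLayer] {
        type Data = Double
        type Delta = Double
        def forward(differentiable: DoubleLayer): Double = differentiable.value
      }

    implicit val vectorLayerDeepLearning: Aux[VectorLayer, Vector[Double], Vector[Double]] =
      new DeepLearning[VectorLayer] {
        type Data = Vector[Double]
        type Delta = Vector[Double]
        def forward(differentiable: VectorLayer): Vector[Double] = differentiable.value
      }
  }

  val summonDoubleLayerDeepLearning = DeepLearning[DoubleLayer]
  type DoubleLayerData = summonDoubleLayerDeepLearning.Data

  // This compiles only because the compiler knows that DoubleLayerData is Double.
  val data: DoubleLayerData = summonDoubleLayerDeepLearning.forward(DoubleLayer(1.5))
  val doubled: Double = data * 2

  // For VectorLayer, the same summoner yields Vector[Double] as Data.
  val vectorData: Vector[Double] = DeepLearning[VectorLayer].forward(VectorLayer(Vector(1.0)))
}
```

The arithmetic on data type-checks because the type member is resolved at compile time, which is the property the rest of the article relies on.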

Implementing dot with the DeepLearning type class

With the DeepLearning type class in place, we give the two parameters of dot the generic types Operand0 and Operand1, and use implicit DeepLearning.Aux parameters as evidence that they are differentiable multidimensional arrays.

    def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
      deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ): INDArrayLayer = {
      // Both operands are converted to the same representation.
      val do0: Do[Tape[INDArray, INDArray]] = deeplearning0.forward(operand0)
      val do1: Do[Tape[INDArray, INDArray]] = deeplearning1.forward(operand1)
      ???
    }

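For deeplearning0 to have the type DeepLearning.Aux[Operand0, INDArray, INDArray], and deeplearning1 the type DeepLearning.Aux[Operand1, INDArray, INDArray], each of them can only be a DeepLearning.Aux[INDArray, INDArray, INDArray], a DeepLearning.Aux[INDArrayLayer, INDArray, INDArray], or a DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]. This in turn restricts Operand0 and Operand1 to INDArray, INDArrayLayer, or INDArrayWeight.

Since every DeepLearning instance implements forward, the body of dot can uniformly convert Operand0 and Operand1 to Do[Tape[INDArray, INDArray]].

This way, dot accepts every kind of multidimensional array as an argument, including computational graphs and weights of multidimensional arrays, and handles them all in the same way.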
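
The constraint technique can be tried out in isolation. The following is only a sketch under simplifying assumptions: Vector[Double] stands in for INDArray, VectorLayer and VectorWeight are hypothetical stand-ins for INDArrayLayer and INDArrayWeight, the type class has a single type member, and dot computes a plain inner product instead of building a computational graph.

```scala
object ConstrainedOperandsSketch {
  // Hypothetical stand-ins: Vector[Double] plays the role of INDArray.
  final case class VectorLayer(value: Vector[Double])
  final case class VectorWeight(value: Vector[Double])

  // Simplified type class with a single type member.
  trait DeepLearning[Differentiable] {
    type Data
    def forward(differentiable: Differentiable): Data
  }

  object DeepLearning {
    type Aux[Differentiable, Data0] = DeepLearning[Differentiable] { type Data = Data0 }

    // Small helper that builds an instance from a function.
    private def instance[A, D](f: A => D): Aux[A, D] = new DeepLearning[A] {
      type Data = D
      def forward(differentiable: A): D = f(differentiable)
    }

    implicit val vectorLiteralDeepLearning: Aux[Vector[Double], Vector[Double]] =
      instance[Vector[Double], Vector[Double]](v => v)
    implicit val vectorLayerDeepLearning: Aux[VectorLayer, Vector[Double]] =
      instance[VectorLayer, Vector[Double]](_.value)
    implicit val vectorWeightDeepLearning: Aux[VectorWeight, Vector[Double]] =
      instance[VectorWeight, Vector[Double]](_.value)
  }

  // One definition covers all nine combinations of literal, layer, and weight.
  def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Vector[Double]],
      deepLearning1: DeepLearning.Aux[Operand1, Vector[Double]]
  ): Double = {
    val data0 = deepLearning0.forward(operand0)
    val data1 = deepLearning1.forward(operand1)
    data0.zip(data1).map { case (a, b) => a * b }.sum
  }

  val literalTimesWeight: Double = dot(Vector(1.0, 2.0), VectorWeight(Vector(3.0, 4.0)))
  val layerTimesLayer: Double = dot(VectorLayer(Vector(1.0, 1.0)), VectorLayer(Vector(3.0, 4.0)))
  // dot("text", Vector(1.0)) is rejected at compile time:
  // there is no DeepLearning instance for String.
}
```

Adding support for another operand type only requires another implicit instance; dot itself stays unchanged.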

Polymorphic functions

Although our dot supports the nine signatures above, sometimes that is still not enough. For example, max should support element-wise comparison between two multidimensional arrays, and also comparison between a multidimensional array and a scalar floating-point number, so that the ReLU activation function can be written as max(ndArray, 0.0).

Ideally, max should support four times as many signatures:

    def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
      deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ): INDArrayLayer = ???

    def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
      deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ): INDArrayLayer = ???

    def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
      deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
    ): INDArrayLayer = ???

    def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
      deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
    ): DoubleLayer = ???

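Unfortunately, the Scala compiler does not accept these overloaded definitions, for two reasons:

  1. After type erasure, all four functions have the same signature, so the generated Java bytecode would clash.
  2. The Scala compiler must choose an overload before it searches for implicit arguments, but none of the four overloads can pin down Operand0 and Operand1 until the implicit arguments have been found, so the compiler has no way to choose among them.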

We use Poly from shapeless to work around the overloading problem.

We define max as a Poly2:

    import shapeless.Poly2

    object max extends Poly2

and then provide the four max.Case instances described above:

    implicit def maxDoubleDouble[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
      deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
    ) = max.at[Operand0, Operand1] { (operand0, operand1) =>
      ???
    }

    implicit def maxDoubleINDArray[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
      deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ) = max.at[Operand0, Operand1] { (operand0, operand1) =>
      ???
    }

    implicit def maxINDArrayDouble[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
      deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
    ) = max.at[Operand0, Operand1] { (operand0, operand1) =>
      ???
    }

    implicit def maxINDArrayINDArray[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
      deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ) = max.at[Operand0, Operand1] { (operand0, operand1) =>
      ???
    }

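Depending on whether Operand0 and Operand1 are plain values, computational graphs, or weights, each of the Case functions above expands into nine Cases.

In the end, a call to max supports 4 × 9 = 36 Cases, which is equivalent to 36 signatures.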
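
The dispatch mechanism can be illustrated without shapeless. The sketch below is an assumption-laden approximation, not how shapeless or DeepLearning.scala is implemented: MaxCase is a hand-written stand-in for max.Case, Vector[Double] stands in for INDArray, DoubleLayer and VectorLayer are hypothetical result types, and only two of the four cases are written out.

```scala
object PolyFunctionSketch {
  // Hypothetical stand-ins for the library's layer types.
  final case class DoubleLayer(value: Double)
  final case class VectorLayer(value: Vector[Double])

  // Simplified DeepLearning type class with a single type member.
  trait DeepLearning[Differentiable] {
    type Data
    def forward(differentiable: Differentiable): Data
  }
  object DeepLearning {
    type Aux[Differentiable, Data0] = DeepLearning[Differentiable] { type Data = Data0 }
    private def instance[A, D](f: A => D): Aux[A, D] = new DeepLearning[A] {
      type Data = D
      def forward(differentiable: A): D = f(differentiable)
    }
    implicit val doubleLiteralDeepLearning: Aux[Double, Double] =
      instance[Double, Double](x => x)
    implicit val doubleLayerDeepLearning: Aux[DoubleLayer, Double] =
      instance[DoubleLayer, Double](_.value)
    implicit val vectorLiteralDeepLearning: Aux[Vector[Double], Vector[Double]] =
      instance[Vector[Double], Vector[Double]](v => v)
    implicit val vectorLayerDeepLearning: Aux[VectorLayer, Vector[Double]] =
      instance[VectorLayer, Vector[Double]](_.value)
  }

  // Hand-written stand-in for shapeless's max.Case: the result type is a type member.
  trait MaxCase[Operand0, Operand1] {
    type Result
    def apply(operand0: Operand0, operand1: Operand1): Result
  }
  object MaxCase {
    type Aux[Operand0, Operand1, Result0] = MaxCase[Operand0, Operand1] { type Result = Result0 }

    def at[A, B, R](f: (A, B) => R): Aux[A, B, R] = new MaxCase[A, B] {
      type Result = R
      def apply(operand0: A, operand1: B): R = f(operand0, operand1)
    }

    // Scalar versus scalar.
    implicit def maxDoubleDouble[Operand0, Operand1](
        implicit
        deepLearning0: DeepLearning.Aux[Operand0, Double],
        deepLearning1: DeepLearning.Aux[Operand1, Double]
    ): Aux[Operand0, Operand1, DoubleLayer] =
      at[Operand0, Operand1, DoubleLayer] { (operand0, operand1) =>
        DoubleLayer(math.max(deepLearning0.forward(operand0), deepLearning1.forward(operand1)))
      }

    // Vector versus scalar, as used by ReLU.
    implicit def maxVectorDouble[Operand0, Operand1](
        implicit
        deepLearning0: DeepLearning.Aux[Operand0, Vector[Double]],
        deepLearning1: DeepLearning.Aux[Operand1, Double]
    ): Aux[Operand0, Operand1, VectorLayer] =
      at[Operand0, Operand1, VectorLayer] { (operand0, operand1) =>
        val threshold = deepLearning1.forward(operand1)
        VectorLayer(deepLearning0.forward(operand0).map(x => math.max(x, threshold)))
      }
  }

  // A single max whose result type depends on the case that was found.
  def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit maxCase: MaxCase[Operand0, Operand1]
  ): maxCase.Result = maxCase(operand0, operand1)

  val relu: VectorLayer = max(VectorLayer(Vector(-1.0, 2.0)), 0.0)
  val scalar: DoubleLayer = max(DoubleLayer(1.0), 3.0)
}
```

With two instances per data type, these two cases already cover eight operand combinations, and the compiler picks the result type (VectorLayer or DoubleLayer) from whichever case resolves.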

For example:

    val operand0: DoubleLayer = ???
    val operand1: INDArrayLayer = ???
    max(operand0, operand1)

Once the implicit arguments have been resolved, the call is equivalent to:

    max(operand0, operand1)(maxDoubleINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))

Polymorphic methods

Besides polymorphic functions, the built-in plugins of DeepLearning.scala also provide polymorphic methods for infix operations, such as the four arithmetic operations. These polymorphic methods are implemented by forwarding to shapeless.Poly2:

    object + extends Poly2
    object - extends Poly2
    object * extends Poly2
    object / extends Poly2

    // Adds the infix operators to every type; each call forwards to the matching Case.
    implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
      def +[Operand1](operand1: Operand1)(
        implicit methodCase: +.Case[Operand0, Operand1]
      ): methodCase.Result = methodCase(operand0, operand1)
      def -[Operand1](operand1: Operand1)(
        implicit methodCase: -.Case[Operand0, Operand1]
      ): methodCase.Result = methodCase(operand0, operand1)
      def *[Operand1](operand1: Operand1)(
        implicit methodCase: *.Case[Operand0, Operand1]
      ): methodCase.Result = methodCase(operand0, operand1)
      def /[Operand1](operand1: Operand1)(
        implicit methodCase: /.Case[Operand0, Operand1]
      ): methodCase.Result = methodCase(operand0, operand1)
    }

For example:

    implicit def doubleDivINDArray[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
      deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
    ) = /.at[Operand0, Operand1] { (operand0, operand1) =>
      ???
    }

    val operand0: DoubleLayer = ???
    val operand1: INDArrayLayer = ???
    operand0 / operand1

Once the implicit arguments have been resolved, the call is equivalent to:

    PolymorphicOps(operand0)./(operand1)(doubleDivINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))
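
The forwarding step can also be sketched without shapeless. In the following sketch, DivCase is a hand-written, hypothetical stand-in for the Case of the / object, DoubleLayer and VectorLayer are hypothetical stand-ins for the library's layer types, and the cases are defined on concrete types to keep the example short.

```scala
object PolyMethodSketch {
  // Hypothetical stand-ins for the library's layer types.
  final case class DoubleLayer(value: Double)
  final case class VectorLayer(value: Vector[Double])

  // Hand-written stand-in for the Case of `object / extends Poly2`.
  trait DivCase[Operand0, Operand1] {
    type Result
    def apply(operand0: Operand0, operand1: Operand1): Result
  }
  object DivCase {
    type Aux[Operand0, Operand1, Result0] = DivCase[Operand0, Operand1] { type Result = Result0 }

    implicit val doubleDivDouble: Aux[DoubleLayer, DoubleLayer, DoubleLayer] =
      new DivCase[DoubleLayer, DoubleLayer] {
        type Result = DoubleLayer
        def apply(operand0: DoubleLayer, operand1: DoubleLayer): DoubleLayer =
          DoubleLayer(operand0.value / operand1.value)
      }

    implicit val doubleDivVector: Aux[DoubleLayer, VectorLayer, VectorLayer] =
      new DivCase[DoubleLayer, VectorLayer] {
        type Result = VectorLayer
        def apply(operand0: DoubleLayer, operand1: VectorLayer): VectorLayer =
          VectorLayer(operand1.value.map(x => operand0.value / x))
      }
  }

  // The infix method only forwards to whichever case the compiler finds.
  implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
    def /[Operand1](operand1: Operand1)(
        implicit methodCase: DivCase[Operand0, Operand1]
    ): methodCase.Result = methodCase(operand0, operand1)
  }

  // Infix call; the result type is selected by the resolved case.
  val quotient: VectorLayer = DoubleLayer(6.0) / VectorLayer(Vector(2.0, 3.0))

  // The same call with the implicit conversion and the case written out by hand.
  val explicitQuotient: VectorLayer =
    PolymorphicOps(DoubleLayer(6.0))./(VectorLayer(Vector(2.0, 3.0)))(DivCase.doubleDivVector)
}
```

The implicit class carries no arithmetic of its own, so new operand combinations are supported by adding cases rather than by editing the operator.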

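Conclusion

With the DeepLearning type class and shapeless.Poly2, we support both polymorphic functions and polymorphic methods. Functions and methods implemented this way are extensible: adding a new implicit value is all it takes to support a new signature for a function of the same name.

Like other features, the implicit values described in this article can also be provided by plugins. In the next article of this series, I will reveal the internals of the DeepLearning.scala plugin system. You will see that the core of such a powerful plugin system is surprisingly simple.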
