Flux vs TensorFlow Misconceptions

TensorFlow is the 800-pound gorilla of machine learning that almost everybody in the field has heard about and has some familiarity with. But there is a tiny, uppity little upstart called Flux which is kicking ass and taking names, and it is starting to grab some attention. Yet it is still a very misunderstood machine learning library.

Emmett Boudreau is a fellow Julia fan here on Medium who has written a number of articles on the topic. One of his latest, Is Flux Better Than TensorFlow?, made me realize, reading the comments, that people really don’t grasp what the big deal about Flux is.

If you are living and breathing Julia, there are a lot of things you take for granted. So here are some great questions which deserve answers.

How is Julia Performance Relevant for Flux?

William Heymann asks:

Since tensorflow compiles to a graph and runs optimized code on a cpu, tpu or gpu. Does the language speed matter? From what I have seen tensorflow is the same performance in c++ and python

This is true. Once you have a graph set up, TensorFlow will run fast. There isn’t much Julia and Flux can do to one-up TensorFlow there. But there are a number of assumptions baked into the framing of this question.

TensorFlow is based on having graph nodes (ops) pre-made in high performance C++. The Python interface is really just used to shuffle around and organize these pre-made nodes into a graph, which is then executed.

If you need a particular kind of functionality which does not exist as a node already, then you have a couple of choices:

  • Try to mimic the needed functionality by composing existing nodes in Python.

  • Write your own custom node (op) in C++, and do all the busy work of registering it with the TensorFlow system to make it available from Python.

Contrast this with Flux, where you don’t assemble a graph made up of pre-made nodes. You basically just write regular Julia code. You have the whole Julia language at your disposal. You can use if-statements and for-loops, call any number of other functions, and use custom data types. Almost anything is possible.

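To make this concrete, here is a tiny sketch of taking a gradient through ordinary Julia control flow. It is only an illustration: the function wiggle is made up, and it assumes Flux is installed (Flux re-exports gradient from Zygote).

using Flux   # brings gradient into scope via Zygote

# An ordinary Julia function with a loop and a branch, no special graph nodes.
function wiggle(x, n)
    acc = zero(x)
    for i in 1:n
        acc += x^i
    end
    return acc > 1 ? acc : 2acc
end

# Differentiate it with respect to x like any other Julia function.
g = gradient(x -> wiggle(x, 3), 0.5)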

Then you specify which parts of this regular Julia function represent parameters that you want to tweak. You hand this function to Flux and tell Flux to train on it, feeding it inputs and checking the outputs. Flux then uses your chosen training algorithm to update the parameters until your function produces the desired output (super simplified).

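Here is a minimal sketch of that workflow, with made-up data and a made-up linear model. It assumes Flux’s long-standing implicit-parameter API (Flux.params and Flux.train!); newer Flux versions favour an explicit-gradient style, so treat this as illustrative rather than canonical.

using Flux

# An ordinary Julia "model": a plain function closing over two parameter arrays.
W = rand(2, 3)
b = rand(2)
predict(x) = W * x .+ b

# Tell Flux which values it is allowed to tweak.
ps = Flux.params(W, b)

# Made-up data: 3-element inputs mapped to 2-element targets.
data = [(rand(3), rand(2)) for _ in 1:100]
loss(x, y) = sum((predict(x) .- y) .^ 2)   # plain Julia, nothing Flux-specific

# Gradient descent nudges W and b to drive the loss down.
Flux.train!(loss, ps, data, Descent(0.1))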

Of course, this Julia function could be made up of a bunch of classic sigmoid functions arranged into a classic neural network. You can do some convolutions, etc. But that is just one option. The function you hand to Flux can be anything. It could be a whole ray-tracer if you wanted, where training tweaks some parameters of the ray-tracer algorithm to give the desired output.

The function could be some advanced scientific model you already built. No problem, as long as you can specify the parameters to Flux.

Basically, the graph Flux deals with is the same abstract syntax tree as regular Julia code. You don’t have to build up a graph manually, with all sorts of restrictions, like in TensorFlow.

If you have some scientific model in Python which you want to tweak through machine learning, for example, you cannot just feed it as-is to TensorFlow. No, you would have to go over this code and figure out a way to replace everything it does by building a complex graph in TensorFlow, using only the nodes TensorFlow supplies. You don’t have the full power of Python available.

Let us expand on this by answering a question from Gershom Agim:

You mention Julia’s a faster language but show no examples of where/how that comes into play considering TF, numpy and the whole ML ecosystem are just monolithic wrappers around c++, fortran and cuda libraries

Emmett Boudreau does indeed state that Julia is faster than Python, and that may not seem to matter, since you are just running C++ code in TensorFlow anyway.

But as I elaborated above, this two-language split incurs a lot of inflexibility. Every problem must be expressed in terms of nodes assembled into a graph. If some function does not exist, you have to find a way to create it from existing nodes or go through all the hassle of doing this in C++.

Manually describing code for solving a machine learning problem by assembling a graph is not natural. Imagine writing regular code like that. Instead of writing, say:

a = 3
b = 4
c = a + b - 10

You would have to write something like this:

a = VariableNode("a")
b = VariableNode("b")
c = AddNode(a, SubNode(b, Constant(10)))
execute_graph(c, ["a" => 3, "b" => 4])

The latter is a TensorFlow-inspired way of approaching the problem. The former is a more Julia-inspired way of writing code. You just write regular code. Julia is very much like LISP under the hood: Julia code can easily be manipulated as data. There is no need to invent a whole separate abstract syntax tree for machine learning purposes. You just use the Julia language itself as it is.

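To make the LISP comparison concrete, here is what that looks like at the Julia REPL. Quoting an expression turns it into a data structure you can inspect and rewrite; nothing here is Flux-specific.

# Quote a snippet of ordinary Julia code: it becomes an Expr data structure.
ex = :(a + b - 10)

dump(ex)   # prints the expression tree: a call to - wrapping a call to +
ex.head    # :call
ex.args    # the pieces of the expression, which you can inspect or rewrite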

Because Julia is a high-performance language, you can run your machine learning algorithms straight on pure Julia code. You don’t need a bunch of C++ code snippets glued together into a graph arranged by Python code.

Flux Looks Small and Incomplete

On Twitter, Timothy Lau asked me this question.

I like flux but last I checked it seemed like it was stuck trying to catch up to the features and newer methods that keep getting added to pytorch and tensorflow.

I had to ask some follow-up questions to clarify what he meant. One of his examples was a list of activation functions. You can see a list of activation functions for TensorFlow here. These are functions such as sigmoid, relu and softmax. In Flux, at the time, the list was very short, as were the lists of many other kinds of functions in Julia.

This made Timothy Lau conclude that Flux was incomplete and not ready for prime time. It lacked so many of the functions TensorFlow has.

The problem is that this is actually not an apples-to-apples comparison. If you look at the current list of activation functions in Flux, you will notice that they are not actually from Flux at all, but from another library called NNlib.

And this is where things get interesting: NNlib is a generic library containing activation functions. It was not made specifically for Flux. Here is how the relu function is defined in NNlib:

relu(x) = max(zero(x), x)

There is nothing Flux-specific about this code. It is just plain Julia code. In fact, it is so trivial you could have written it yourself. This is in significant contrast to TensorFlow activation functions, which must be part of the TensorFlow library. That is because those are nodes (or ops, as TF calls them) written in C++, which must adhere to a specific interface that TensorFlow expects. Otherwise these activation functions cannot be used in a TensorFlow graph.

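As an illustration, here is a hand-rolled activation function dropped straight into a Flux layer. The name swishish is made up, and Dense(3 => 2, ...) is the newer Flux spelling (older versions write Dense(3, 2, ...)):

using Flux

# A home-made activation: ordinary Julia, no registration ceremony needed.
swishish(x) = x * sigmoid(x)   # sigmoid comes from NNlib, re-exported by Flux

# It slots into a layer the same way relu or sigmoid would.
layer = Dense(3 => 2, swishish)
layer(rand(Float32, 3))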

That means that PyTorch activation functions, for example, have to be reimplemented to fit the interfaces PyTorch expects. The net effect is that in the Python world one often ends up with massive libraries, because you need to reimplement the same things over and over again.

In the Julia world, in contrast, the same activation functions can be implemented once in one tiny library such as NNlib and reused in any number of Julia machine learning libraries, including Flux. The net effect of this is that Julia libraries tend to be very small. They don’t have to be big, because a lot of the functionality you need for any given workflow comes from other Julia libraries. Julia libraries are extremely composable. That means you can replicate what Python does with one massive library by simply combining a bunch of small, well-defined libraries.

For instance, running on a GPU, using preallocated arrays, or running on a tensor processing unit are all handled by entirely separate libraries. None of this is built into Flux. These other libraries aren’t even made specifically to give that functionality to Flux. They are generic libraries.

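For example, moving a model to the GPU is a matter of pulling in CUDA.jl, a general-purpose GPU library that knows nothing about Flux. A rough sketch, assuming a CUDA-capable machine (Flux’s gpu helper simply returns things unchanged when no GPU is available; older Flux versions used CuArrays.jl instead):

using Flux, CUDA   # CUDA.jl is a separate, generic GPU library

model = Dense(10 => 2, relu)
x = rand(Float32, 10)

# gpu moves the parameters and data onto the GPU; the model code is unchanged.
model_gpu = gpu(model)
y = model_gpu(gpu(x))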

Thus you cannot compare Flux alone to TensorFlow. You have to compare TensorFlow to basically the whole Julia ecosystem of packages. This may give you some sense of why Julia is rapidly gaining such enthusiastic adherents. Highly composable micro-libraries are extremely powerful.

If you want a more concrete, hands-on example of how to use Flux, you can read one of my Flux introductions, where we do handwriting recognition with Flux.

And if how Flux works is still a mystery to you, then I advise you to read this article, where I go into more detail about how Flux is implemented.

Translated from: https://medium.com/@Jernfrost/flux-vs-tensorflow-misconceptions-2737a8b464fb