The IR "Battle" in Deep Learning

News, 09-13


Recommended by Xinzhiyuan (新智元)


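Last week we saw the news "Facebook and Microsoft introduce new open ecosystem for interchangeable AI frameworks", which has made the framework battle even livelier. In short, ONNX (Open Neural Network Exchange) is meant to solve the problem of interoperability among today's many frameworks. Interestingly, though, this "open" ecosystem looks more like Microsoft and Facebook teaming up against Google. TensorFlow currently leads in adoption by a sizable margin, and the other frameworks certainly do not want to see TensorFlow dominate alone; after all, the framework is the "entry point" into deep learning. PyTorch has had good momentum lately, so "allying" Caffe2, PyTorch and Cognitive Toolkit in this way also looks like a sensible move.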

"An Intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive for further processing, such as optimization and translation. A 'good' IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language." - Wikipedia
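The quote mentions tuple-based code as one form an IR can take. As a small illustration (our own toy, not any real compiler's IR), the sketch below lowers a nested expression into three-address tuples of the form (dest, op, arg1, arg2) and then interprets them to check that the meaning survived. The op names `add`/`mul` and the helpers `lower` and `run` are hypothetical.

```python
from itertools import count


def lower(expr):
    """Lower a nested (op, lhs, rhs) expression into flat three-address tuples.

    Leaves (numbers or variable names) are used directly as operands; every
    operator node gets a fresh temporary named t0, t1, ...
    """
    code = []
    temps = count()

    def visit(node):
        if not isinstance(node, tuple):
            return node                      # leaf: variable name or constant
        op, lhs, rhs = node
        a, b = visit(lhs), visit(rhs)        # lower operands first
        dest = f"t{next(temps)}"
        code.append((dest, op, a, b))
        return dest

    result = visit(expr)
    return code, result


def run(code, env):
    """Interpret the tuple IR, to check that lowering preserved the meaning."""
    ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
    values = dict(env)

    def operand(v):
        # Strings are names (variables or temporaries); anything else is a constant.
        return values[v] if isinstance(v, str) else v

    for dest, op, a, b in code:
        values[dest] = ops[op](operand(a), operand(b))
    return values


# x * w + b, written as a nested expression (the "source")
code, result = lower(("add", ("mul", "x", "w"), "b"))
for instr in code:
    print(instr)          # ('t0', 'mul', 'x', 'w') then ('t1', 'add', 't0', 'b')
print(run(code, {"x": 3, "w": 2, "b": 1})[result])   # 7
```

The flat list is easy to scan, rewrite and hand to a later stage, which is exactly the "conducive for further processing" property the definition asks for.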

Let's start with a practical problem that deep learning faces today.

[Figure]

The figure above comes from an article introducing NNVM [1]. When discussing NNVM's goals, the article says:

"This is a new interesting era of deep learning, with emergence trend of new system, hardware and computational model. The usecase for deep learning is more heterogeneous, and we need tailored learning system for our cars, mobiles and cloud services. The future of deep learning system is going to be more heterogeneous, and we will find emergence need of different front-ends, backends and optimization techniques. Instead of building a monolithic solution to solve all these problems, how about adopt unix philosophy, build effective modules for learning system, and assemble them together to build minimum and effective systems?"

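Put simply: deep learning now has many different front ends (frameworks) and many different back ends (hardware). Can we find a bridge between them that makes optimizing and mapping from one to the other more efficient?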

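This problem is not actually new. As different application scenarios and requirements emerged over the years, a large number of programming languages and processor architectures appeared, and the software industry ran into a similar problem.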

[Figure]

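In other words, this replays the situation in which LLVM emerged: a bridge was needed between a large number of programming languages and an ever-growing number of hardware architectures. LLVM lets different front ends and back ends share a unified LLVM IR, so supporting a new programming language or a new device platform only requires developing the corresponding front end or back end. Building on LLVM IR also makes it quick to develop a new programming language. For example, LLVM's creator Chris Lattner later joined Apple and created the Swift language, which can be seen as an LLVM front end.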
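The "M front ends × N back ends" argument can be sketched in a few lines. Everything here is a hypothetical toy rather than LLVM's API: two front ends (prefix and postfix notation) lower to one shared tuple IR, and two back ends (an interpreter and a C-style text emitter) consume it. Operands are assumed to be variable names. Writing M + N components yields all M × N source/target pairings; without the shared IR, each pairing would need its own translator.

```python
# Hypothetical shared IR: a list of (dest, op, arg1, arg2) tuples.
OPS = {"+": "add", "*": "mul"}


def frontend_prefix(src):
    """Front end 1: parses prefix notation such as '+ * x w b'."""
    tokens, code = src.split(), []

    def parse():
        tok = tokens.pop(0)
        if tok not in OPS:
            return tok                       # operand: a variable name
        a = parse()
        b = parse()
        dest = f"t{len(code)}"
        code.append((dest, OPS[tok], a, b))
        return dest

    parse()
    return code


def frontend_postfix(src):
    """Front end 2: parses postfix (RPN) notation such as 'x w * b +'."""
    stack, code = [], []
    for tok in src.split():
        if tok in OPS:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            dest = f"t{len(code)}"
            code.append((dest, OPS[tok], a, b))
            stack.append(dest)
        else:
            stack.append(tok)
    return code


def backend_eval(code, env):
    """Back end 1: interprets the IR and returns the last value computed."""
    vals = dict(env)
    for dest, op, a, b in code:
        x, y = vals[a], vals[b]
        vals[dest] = x + y if op == "add" else x * y
    return vals[code[-1][0]]


def backend_c(code):
    """Back end 2: emits C-style statements for the same IR."""
    sym = {v: k for k, v in OPS.items()}
    return "\n".join(f"double {d} = {a} {sym[op]} {b};" for d, op, a, b in code)


# 2 front ends + 2 back ends = 4 components, yet all 2 x 2 pairings work.
for fe, src in [(frontend_prefix, "+ * x w b"), (frontend_postfix, "x w * b +")]:
    ir = fe(src)
    print(backend_eval(ir, {"x": 3, "w": 2, "b": 1}))
    print(backend_c(ir))
```

Adding a third front end (say, an infix parser) would immediately work with both back ends, which is the property that made LLVM IR so valuable.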

[Figure]

This also shows that LLVM's unified IR is one of the keys to its success, and it illustrates how important a good IR is.

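Of course, an IR is essentially just an intermediate representation, one part of a complete compiler toolchain. TVM and XLA, discussed below, are optimization and compilation tools built around specific IRs.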

In another article, Tianqi Chen (陳天奇) wrote: "...For deep learning, we need a similar project. Borrowing the ideas of LLVM, we named it NNVM." (October 2016)

On August 17, Tianqi Chen's team released TVM: An End to End IR Stack for Deploying the Deep Learning Workloads to Hardwares [2]. Its architecture is shown in the figure below:

[Figure: TVM architecture]

"We adopt a common philosophy from the compiler community and provide two intermediate representation layers to efficiently lower high-level deep learning algorithms down to a multitude of hardware back-ends."

As the figure shows, they have added a new IR stack, TVM, on top of the earlier NNVM, aiming to close the gap shown in the figure below: "A lot of powerful optimizations can be supported by the graph optimization framework. ...However we find that the computational graph based IR alone is not enough to solve the challenge of supporting different hardware backends." The graph-based IR here refers to NNVM.
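To make "graph optimization" concrete, here is a hypothetical sketch of one classic graph-level pass, operator fusion: an elementwise op is folded into its elementwise producer when nothing else reads the intermediate tensor, so the fused node could later be emitted as a single kernel. The `Node` class, the op names and `fuse_elementwise` are our own illustration, not NNVM's API.

```python
from dataclasses import dataclass

ELEMENTWISE = {"add", "mul", "relu"}   # hypothetical set of fusible ops


@dataclass
class Node:
    name: str
    ops: list      # primitive ops applied in order (more than one after fusion)
    inputs: list   # names of producer nodes or graph inputs

    def is_elementwise(self):
        return all(op in ELEMENTWISE for op in self.ops)


def fuse_elementwise(graph):
    """Graph-level pass: fold an elementwise node into its elementwise producer
    (its first input) when no other node reads that producer's output.
    Assumes `graph` is topologically sorted and node names are unique."""
    uses = {}
    for node in graph:
        for name in node.inputs:
            uses[name] = uses.get(name, 0) + 1

    fused, pos = [], {}          # pos: node name -> index in `fused`
    for node in graph:
        i = pos.get(node.inputs[0]) if node.inputs else None
        producer = fused[i] if i is not None else None
        if (producer is not None and producer.is_elementwise()
                and node.is_elementwise() and uses[producer.name] == 1):
            # The merged node keeps the consumer's name, so later references still resolve.
            fused[i] = Node(node.name, producer.ops + node.ops,
                            producer.inputs + node.inputs[1:])
            del pos[producer.name]
            pos[node.name] = i
        else:
            pos[node.name] = len(fused)
            fused.append(Node(node.name, list(node.ops), list(node.inputs)))
    return fused


graph = [
    Node("conv", ["conv2d"], ["x", "w"]),
    Node("bias", ["add"], ["conv", "b"]),
    Node("act", ["relu"], ["bias"]),
]
for n in fuse_elementwise(graph):
    print(n)   # conv stays; bias-add and relu become one fused node
```

Note that the pass only decides *which* ops belong together; it says nothing about how the fused kernel's loops should be laid out on a particular chip.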

[Figure: the gap TVM aims to close]

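In the LLVM world, there is only one unified IR. So why is a graph-based IR not enough for deep learning? In a subsequent Zhihu post [3], Tianqi Chen mentioned last October's Zhihu discussion "How would you evaluate Tianqi Chen's modular deep learning system NNVM?" [4], and Jianfei Wang's (王健飛) answer in that discussion appears to have been one of the inspirations for TVM.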
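One way to see why a graph-level IR alone falls short: a node like `matmul(A, B)` says *what* to compute, not *how* its loops run. The pure-Python sketch below (our own illustration, not TVM code) computes the same product with a naive loop nest and with a tiled one; the tile size is a free parameter whose best value depends on the target's caches or on-chip memory. Choosing and expressing such loop-level transformations per backend is roughly the role of the lower, tensor-level IR that TVM adds.

```python
def matmul_naive(A, B):
    """Reference loop nest: C[i][j] = sum_p A[i][p] * B[p][j]."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_tiled(A, B, tile):
    """Same computation, iterating over tile x tile blocks.

    The graph node 'matmul' is identical either way; only the loop structure
    (and hence memory locality on a given device) changes with `tile`.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one block; min() handles ragged edges.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_naive(A, B))                                                      # [[19, 22], [43, 50]]
print(all(matmul_tiled(A, B, t) == matmul_naive(A, B) for t in (1, 2, 3)))   # True
```

Both functions produce the same result; a compiler for many backends needs somewhere to record "tile by 2 here, by 32 there", and a graph of whole operators has no place for that decision.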

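In the same article, Tianqi Chen also wrote: "TVM is different from existing solutions. Taking XLA as an example, TVM follows a more aggressive technical route than the current XLA, and TVM can be used to make the functionality XLA needs easier to implement."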

Since TVM's author has named a rival, let's take a look at Google's XLA.

"XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. The results are improvements in speed, memory usage, and portability on server and mobile platforms. Initially, most users will not see large benefits from XLA, but are welcome to experiment by using XLA via just-in-time (JIT) compilation or ahead-of-time (AOT) compilation. Developers targeting new hardware accelerators are especially encouraged to try out XLA."
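XLA's JIT path takes a TensorFlow computation, converts it into XLA's own IR (HLO) and compiles it. The toy below only mimics that overall shape in pure Python and is not how XLA works or is invoked: a hypothetical `@jit` decorator runs the function once on placeholder `Tracer` objects to record the operations, emits Python source for them, compiles that source with the built-in `compile`/`exec`, and reuses the compiled function on later calls. A real JIT would also specialize on shapes and dtypes and generate machine code.

```python
class Tracer:
    """Placeholder value: records each operation as a line of IR instead of computing."""

    def __init__(self, name, trace):
        self.name, self.trace = name, trace

    def _emit(self, op, other):
        other_name = other.name if isinstance(other, Tracer) else repr(other)
        out = Tracer(f"t{len(self.trace)}", self.trace)
        self.trace.append(f"{out.name} = {self.name} {op} {other_name}")
        return out

    def __add__(self, other):
        return self._emit("+", other)

    def __mul__(self, other):
        return self._emit("*", other)


def jit(fn):
    """Hypothetical decorator: trace `fn` on first call, generate and compile
    source for the recorded ops, then reuse it (assumes a fixed argument count)."""
    compiled = {}

    def wrapper(*args):
        if "fn" not in compiled:
            trace = []
            params = [f"a{i}" for i in range(len(args))]
            result = fn(*(Tracer(p, trace) for p in params))
            body = "\n".join(f"    {line}" for line in trace) or "    pass"
            src = f"def _compiled({', '.join(params)}):\n{body}\n    return {result.name}\n"
            namespace = {}
            exec(compile(src, "<jit>", "exec"), namespace)
            compiled["fn"], compiled["src"] = namespace["_compiled"], src
        return compiled["fn"](*args)

    wrapper.compiled = compiled     # exposed so the generated source can be inspected
    return wrapper


@jit
def affine(x, w, b):
    return x * w + b


print(affine(3, 2, 1))              # 7 (first call: trace + compile)
print(affine(4, 5, 6))              # 26 (reuses the compiled function)
print(affine.compiled["src"])
```

The split between "capture the computation into an IR" and "compile that IR" is what lets the same captured program be compiled ahead of time instead, which is the distinction between the JIT and AOT modes in the quote.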

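The left half of the figure below comes from a talk at the 2017 EuroLLVM Developers' Meeting [6], which explains XLA's goals quite clearly; its basic functions are, likewise, optimization and code generation.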

[Figure: left, XLA's goals from the EuroLLVM talk [6]; right, XLA's architecture]

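XLA's concrete architecture is shown in the right half of the figure. It, too, has a two-level optimization structure [5], using LLVM for the low-level IR, optimization, and code generation. Because it uses LLVM IR, it can support different back ends relatively easily. The figure below shows an example using the GPU backend.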

[Figure: XLA with the GPU backend]

For back ends that are not yet directly supported, XLA describes how to proceed in three scenarios:

1. Existing CPU architecture not yet officially supported by XLA, with or without an existing LLVM backend.
2. Non-CPU-like hardware with an existing LLVM backend.
3. Non-CPU-like hardware without an existing LLVM backend.

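Overall, XLA and TVM try to solve similar problems. But XLA targets only Google's TensorFlow, whereas TVM/NNVM, although it comes from the MXNet camp, aims to be an open, common interface.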

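A quick news flash: Chris Lattner recently joined Google Brain. It is not yet known whether his main work will focus on XLA, but he and Jeff Dean working together is a truly formidable combination.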

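Since I have not used either tool myself, I cannot give a more precise evaluation or comparison. Readers interested in the details should study the references carefully and try the tools out for themselves.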

In fact, similar ideas also include Intel's nGraph (see the figure below), HP's Cognitive Computing Toolkit (CCT), and IBM's SystemML.

[Figure: Intel nGraph]

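At the just-concluded Hot Chips conference, Microsoft announced Project Brainwave, an FPGA acceleration platform for AI in the cloud. Its toolchain is shown below; do you see two IR layers again?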

[Figure: Project Brainwave toolchain]

Finally, I recently came across another interesting effort: the Khronos Neural Network Exchange Format (NNEF), which tries to define a standard data-exchange format. "The NNEF standard encapsulates neural network structure, data formats, commonly used operations (such as convolution, pooling, normalization, etc.) and formal network semantics."
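At its core, an exchange format is a serialized graph plus an agreed vocabulary of operations. The sketch below uses JSON and a made-up schema (`format_version`, `inputs`, `nodes` and `SUPPORTED_OPS` are all hypothetical; this is not NNEF's actual syntax nor ONNX's schema) to show the round trip: one tool exports a network description, and another imports it and rejects anything outside the shared op set or any tensor used before it is defined.

```python
import json

# Hypothetical shared vocabulary; a real standard fixes these ops and their semantics.
SUPPORTED_OPS = {"conv", "relu", "pool", "normalize"}


def export_graph(inputs, nodes):
    """Serialize tensors (shape, dtype) and an ordered op list to neutral JSON text."""
    return json.dumps({"format_version": "0.1", "inputs": inputs, "nodes": nodes},
                      sort_keys=True)


def import_graph(text):
    """Parse the JSON and check it only uses agreed ops and already-defined tensors."""
    doc = json.loads(text)
    defined = set(doc["inputs"])
    for node in doc["nodes"]:
        if node["op"] not in SUPPORTED_OPS:
            raise ValueError(f"unsupported op: {node['op']}")
        missing = [t for t in node["inputs"] if t not in defined]
        if missing:
            raise ValueError(f"undefined tensors: {missing}")
        defined.add(node["output"])
    return doc


inputs = {
    "image": {"shape": [1, 3, 32, 32], "dtype": "float32"},
    "kernel": {"shape": [8, 3, 3, 3], "dtype": "float32"},
}
nodes = [
    {"op": "conv", "inputs": ["image", "kernel"], "output": "c1", "attrs": {"stride": 1}},
    {"op": "relu", "inputs": ["c1"], "output": "r1", "attrs": {}},
    {"op": "pool", "inputs": ["r1"], "output": "p1", "attrs": {"size": 2}},
]
text = export_graph(inputs, nodes)      # framework A writes ...
graph = import_graph(text)              # ... framework B reads and validates
print(graph["nodes"] == nodes)          # True
```

The hard part of a real standard is not the serialization but pinning down the semantics of each op precisely enough that every importer computes the same thing.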

[Figure]

T.S.:

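As deep learning is applied ever more widely, people care more and more about how efficiently DNN training and inference run on different hardware architectures. Drawing on the experience of traditional compiler design, XLA and TVM/NNVM have both made promising first attempts. The competition over IRs will be an important part of the framework battles to come.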

References:

[1] Tianqi Chen, "Build your own TensorFlow with NNVM and Torch", http://tqchen.github.io/2016/10/01/build-your-own-tensorflow-with-nnvm-and-torch.html

[2] Tianqi Chen, "TVM: An End to End IR Stack for Deploying the Deep Learning Workloads to Hardwares", http://tvmlang.org/2017/08/17/tvm-release-announcement.html

[3] Tianqi Chen, "How would you evaluate TVM, newly open-sourced by Tianqi Chen's team?" (Zhihu answer), https://www.zhihu.com/question/64091792/answer/217722459

[4] Jianfei Wang (王健飛), "How would you evaluate Tianqi Chen's modular deep learning system NNVM?" (Zhihu answer), https://www.zhihu.com/question/51216952/answer/124708405

[5] "XLA Overview", https://www.tensorflow.org/performance/xla/

[6] "2017 EuroLLVM Developers' Meeting: D. Majnemer, 'XLA: Accelerated Linear Algebra'", https://www.youtube.com/watch?v=2IOPpyyuLkc

[7] "A Brief Introduction to LLVM", https://www.youtube.com/watch?v=a5-WaD8VV38

[8] "XLA: TensorFlow Compiled!", https://www.youtube.com/watch?v=kAOanJczHA0
