[In-Depth Analysis by Bu Ju (卜居), Alibaba Group] Hardware Acceleration of Convolutional Neural Networks

aihot 2017-06-18 22:40:52 Deep Learning

4. Performance Comparison

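　　Using the results reported in several papers 【2】【3】【4】【9】, we can quantify the four hardware acceleration schemes discussed in this article.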

　　This result is taken from reference 【3】; the entry labeled FPL 2009 is the result from reference 【2】 of this article.

　　Schemes 1 and 2 cannot be compared directly on AlexNet performance; they can only be compared by operations per second: 5.25 GOPS vs 227 GOPS.

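　　This result is taken from reference 【9】. It compares performance per watt and shows that the FPGA is more energy-efficient than the GPU and CPU platforms.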

　　This result is taken from reference 【4】; the entry labeled "Best prior CNN on Virtex 7 485T" is the result from reference 【3】 of this article.

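　　Schemes 3 and 4 in this article both implement the AlexNet forward pass, at 46 images/s and 134 images/s respectively. The results also show that the FPGA has a large advantage over the GPU in performance per watt.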
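　　As a rough illustration of the comparisons above, the following minimal sketch computes the ratios between the quoted figures. The helper names `speedup` and `per_watt` are hypothetical and chosen only for this example. The cited results do not list power figures in the text here, so `per_watt` is defined but not applied to any measured data.

```python
def speedup(baseline: float, candidate: float) -> float:
    """Return how many times higher `candidate` throughput is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate / baseline


def per_watt(throughput: float, power_watts: float) -> float:
    """Throughput divided by power draw, e.g. images/s per watt.

    Hypothetical helper: power values must come from the cited papers;
    none are assumed here.
    """
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput / power_watts


# Figures quoted in the text: schemes 1 and 2 in GOPS,
# schemes 3 and 4 in AlexNet images/s.
gops_ratio = speedup(5.25, 227.0)
alexnet_ratio = speedup(46.0, 134.0)

print(f"GOPS ratio (scheme 2 / scheme 1): {gops_ratio:.1f}x")
print(f"AlexNet throughput ratio (scheme 4 / scheme 3): {alexnet_ratio:.1f}x")
```

　　Note that the two ratios are not comparable with each other: the first is raw operations per second on different workloads, while the second is end-to-end AlexNet throughput.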

　　Some commercial acceleration solutions, such as AuvizDNN 【11】, also publish performance figures for AlexNet.

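　　It is foreseeable that, as FPGA integration density and clock frequency continue to increase, FPGAs will gradually catch up with and surpass GPUs in CNN acceleration, and help drive the next boom in deep learning.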

5. Alibaba Cloud's High-Performance Computing Platform

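　　Alibaba Cloud HPC is a platform for high-performance computing and deep learning, launched in October 2015. It already hosts a large number of compute-intensive applications, covering speech recognition, image classification and retrieval, rendering, medical imaging, weather forecasting, physical simulation, and other fields. The hardware platform uses high-performance Broadwell CPUs and Tesla K40/M40 GPUs. The Alibaba FPGA project now under way is based on the Intel Xeon + FPGA platform 【12】, in which the CPU and FPGA are integrated in the same package. This gives lower communication latency and suits acceleration scenarios where the application hotspots change frequently. Extensive analysis and processing of speech and video data is already in progress on this platform.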

References

　　【1】Amos R. Omondi, Jagath C. Rajapakse (eds.). FPGA Implementations of Neural Networks. Springer, 2006.

　　【2】Yann LeCun, et al. CNP: An FPGA-based Processor for Convolutional Networks. FPL 2009.

　　【3】Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, ACM 978-1-4503-3315-3/15/02.

　　【4】Accelerating Deep Convolutional Neural Networks Using Specialized Hardware, 2015.

　　【5】Stratix 10 Device Overview, Altera, 2015.12.

　　【6】A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

　　【7】J. Cong and B. Xiao. Minimizing Computation in Convolutional Neural Networks. ICANN 2014.

　　【8】http://www.xilinx.com/products/boards-and-kits/ek-v7-vc707-g.html

　　【9】A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks, 2013.

　　【10】A. Putnam, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. International Symposium on Computer Architecture, 2014.

　　【11】http://auvizsystems.com/products/auvizdnn/

　　【12】http://www.eweek.com/servers/intel-begins-shipping-xeon-chips-with-fpga-accelerators.html
