Commit f07815e

format the reference (microsoft#225)
1 parent 9949293 commit f07815e

16 files changed, +175 -91 lines changed

Textbook/第11章-模型压缩与加速/11.1-模型压缩简介.md

Lines changed: 4 additions & 0 deletions
@@ -224,7 +224,11 @@ NAS 搜索出的网络结构在某些任务上甚至可以达到媲美人类专


 1. [A100](https://www.nvidia.com/en-us/data-center/a100/)
+
 2. [TensorCore](https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)
+
 3. Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. [Distilling the knowledge in a neural network.](https://arxiv.org/pdf/1503.02531.pdf) arXiv preprint arXiv:1503.02531 2.7 (2015).
+
 4. Howard, Andrew G., et al. [Mobilenets: Efficient convolutional neural networks for mobile vision applications.](https://arxiv.org/pdf/1704.04861.pdf) arXiv preprint arXiv:1704.04861 (2017).
+
 5. Deng, Lei, et al. [Model compression and hardware acceleration for neural networks: A comprehensive survey.](https://ieeexplore.ieee.org/abstract/document/9043731) Proceedings of the IEEE 108.4 (2020): 485-532.

Textbook/第11章-模型压缩与加速/11.2-基于稀疏化的模型压缩.md

Lines changed: 19 additions & 10 deletions
@@ -167,16 +167,25 @@ $$APoZ^{(i)}_c = APoZ(O_c^{(i)}) = \frac{\sum_k^N \sum_j^M f(O^{(i)}_{c,j}(k=0))

 ## References

-- Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, 2008, 31(2): 210-227.
-- 纪荣嵘,林绍辉,晁飞,吴永坚,黄飞跃.深度神经网络压缩与加速综述.计算机研究与发展,2018,55(09):1871-1888.
-- Hoefler T, Alistarh D, Ben-Nun T, et al. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research, 2021, 22(241): 1-124.
-- Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
-- Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE international conference on computer vision. 2017: 2736-2744.
-- Hu H, Peng R, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
-- Ren M, Pokrovsky A, Yang B, et al. Sbnet: Sparse blocks network for fast inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8711-8720.
-- Aji A F, Heafield K. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021, 2017.
-- Lin Y, Han S, Mao H, et al. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.
-- Deng L, Li G, Han S, et al. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE, 2020, 108(4): 485-532.
+1. Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, 2008, 31(2): 210-227.
+
+2. 纪荣嵘,林绍辉,晁飞,吴永坚,黄飞跃.深度神经网络压缩与加速综述.计算机研究与发展,2018,55(09):1871-1888.
+
+3. Hoefler T, Alistarh D, Ben-Nun T, et al. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research, 2021, 22(241): 1-124.
+
+4. Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
+
+5. Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE international conference on computer vision. 2017: 2736-2744.
+
+6. Hu H, Peng R, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
+
+7. Ren M, Pokrovsky A, Yang B, et al. Sbnet: Sparse blocks network for fast inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8711-8720.
+
+8. Aji A F, Heafield K. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021, 2017.
+
+9. Lin Y, Han S, Mao H, et al. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.
+
+10. Deng L, Li G, Han S, et al. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE, 2020, 108(4): 485-532.

Textbook/第11章-模型压缩与加速/11.3-模型压缩与硬件加速.md

Lines changed: 3 additions & 2 deletions
@@ -115,5 +115,6 @@ TPU的推理芯片中很早就使用了INT8,在后续的训练芯片中也采

 ## References

-- https://www.nvidia.com/en-us/data-center/a100/
-- https://www.jiqizhixin.com/articles/2019-06-25-18
+1. https://www.nvidia.com/en-us/data-center/a100/
+
+2. https://www.jiqizhixin.com/articles/2019-06-25-18

Textbook/第13章-人工智能优化计算机系统/13.1-简介与趋势.md

Lines changed: 3 additions & 2 deletions
@@ -26,5 +26,6 @@

 ## References

-- Docker (software). [https://en.wikipedia.org/wiki/Docker_(software)](https://en.wikipedia.org/wiki/Docker_(software))
-- Kubernetes. [https://en.wikipedia.org/wiki/Kubernetes](https://en.wikipedia.org/wiki/Kubernetes)
+1. Docker (software). [https://en.wikipedia.org/wiki/Docker_(software)](https://en.wikipedia.org/wiki/Docker_(software))
+
+2. Kubernetes. [https://en.wikipedia.org/wiki/Kubernetes](https://en.wikipedia.org/wiki/Kubernetes)

Textbook/第13章-人工智能优化计算机系统/13.2-学习增强系统的应用.md

Lines changed: 21 additions & 11 deletions
@@ -125,14 +125,24 @@ Resource Central 的架构如上图所示,它包含线下(Offline)和 客

 ## References

-- Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. 2015. [*Scalable Bayesian optimization using deep neural networks*](https://dl.acm.org/doi/10.5555/3045118.3045349). In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15).
-- Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. [*Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-Designed Configuration in System Software*](https://dl.acm.org/doi/10.1145/2786805.2786852). In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE '15). Association for Computing Machinery.
-- Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. [*Neural Adaptive Video Streaming with Pensieve*](https://dl.acm.org/doi/10.1145/3098822.3098843). In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '17). Association for Computing Machinery.
-- Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. [*Automatic Database Management System Tuning Through Large-scale Machine Learning*](https://dl.acm.org/doi/10.1145/3035918.3064029). In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). Association for Computing Machinery.
-- Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. [*CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics*](https://dl.acm.org/doi/10.5555/3154630.3154669). In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association.
-- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. [*The Case for Learned Index Structures*](https://dl.acm.org/doi/10.1145/3183713.3196909). In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery.
-- Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. 2017. [*Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms*](https://dl.acm.org/doi/abs/10.1145/3132747.3132772). In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). Association for Computing Machinery.
-- Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, and Guangzhong Sun. 2018. [*Metis: Robustly Optimizing Tail Latencies of Cloud Systems*](https://dl.acm.org/doi/10.5555/3277355.3277449). In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (ATC '18). USENIX Association.
-- Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. [*ALEX: An Updatable Adaptive Learned Index*](https://doi.org/10.1145/3318464.3389711). In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Association for Computing Machinery.
-- Mirhoseini, A., Goldie, A., Yazgan, M. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021). https://doi.org/10.1038/s41586-021-03544-w
-- Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. [*APEX: a high-performance learned index on persistent memory*](https://doi.org/10.14778/3494124.3494141). Proc. VLDB Endow.
+1. Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. 2015. [*Scalable Bayesian optimization using deep neural networks*](https://dl.acm.org/doi/10.5555/3045118.3045349). In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15).
+
+2. Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. [*Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-Designed Configuration in System Software*](https://dl.acm.org/doi/10.1145/2786805.2786852). In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE '15). Association for Computing Machinery.
+
+3. Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. [*Neural Adaptive Video Streaming with Pensieve*](https://dl.acm.org/doi/10.1145/3098822.3098843). In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '17). Association for Computing Machinery.
+
+4. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. [*Automatic Database Management System Tuning Through Large-scale Machine Learning*](https://dl.acm.org/doi/10.1145/3035918.3064029). In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). Association for Computing Machinery.
+
+5. Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. [*CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics*](https://dl.acm.org/doi/10.5555/3154630.3154669). In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association.
+
+6. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. [*The Case for Learned Index Structures*](https://dl.acm.org/doi/10.1145/3183713.3196909). In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery.
+
+7. Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. 2017. [*Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms*](https://dl.acm.org/doi/abs/10.1145/3132747.3132772). In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). Association for Computing Machinery.
+
+8. Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, and Guangzhong Sun. 2018. [*Metis: Robustly Optimizing Tail Latencies of Cloud Systems*](https://dl.acm.org/doi/10.5555/3277355.3277449). In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (ATC '18). USENIX Association.
+
+9. Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. [*ALEX: An Updatable Adaptive Learned Index*](https://doi.org/10.1145/3318464.3389711). In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Association for Computing Machinery.
+
+10. Mirhoseini, A., Goldie, A., Yazgan, M. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021). https://doi.org/10.1038/s41586-021-03544-w
+
+11. Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. [*APEX: a high-performance learned index on persistent memory*](https://doi.org/10.14778/3494124.3494141). Proc. VLDB Endow.

Textbook/第13章-人工智能优化计算机系统/13.3-学习增强系统的落地挑战.md

Lines changed: 5 additions & 3 deletions
@@ -53,6 +53,8 @@

 ## References

-- D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. [*Hidden Technical Debt in Machine Learning Systems*](https://dl.acm.org/doi/10.5555/2969442.2969519). In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). MIT Press.
-- Zhao Lucis Li, Chieh-Jan Mike Liang, Wei Bai, Qiming Zheng, Yongqiang Xiong, and Guangzhong Sun. 2019. [*Accelerating Rule-matching Systems with Learned Rankers*](https://www.usenix.org/conference/atc19/presentation/li-zhao). In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (ATC '19). USENIX Association.
-- Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, and Wenjun Dai. 2020. [*AutoSys: The Design and Operation of Learning-Augmented Systems*](https://dl.acm.org/doi/abs/10.5555/3489146.3489168). In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (ATC '20). USENIX Association.
+1. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. [*Hidden Technical Debt in Machine Learning Systems*](https://dl.acm.org/doi/10.5555/2969442.2969519). In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). MIT Press.
+
+2. Zhao Lucis Li, Chieh-Jan Mike Liang, Wei Bai, Qiming Zheng, Yongqiang Xiong, and Guangzhong Sun. 2019. [*Accelerating Rule-matching Systems with Learned Rankers*](https://www.usenix.org/conference/atc19/presentation/li-zhao). In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (ATC '19). USENIX Association.
+
+3. Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, and Wenjun Dai. 2020. [*AutoSys: The Design and Operation of Learning-Augmented Systems*](https://dl.acm.org/doi/abs/10.5555/3489146.3489168). In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (ATC '20). USENIX Association.

Textbook/第4章-矩阵运算与计算机体系结构/4.1-深度学习的计算模式.md

Lines changed: 14 additions & 7 deletions
@@ -70,10 +70,17 @@ $.
 Readers are invited to consider: besides matrix multiplication, what other computation patterns or operators do you think are commonly used in deep learning models? Do these operators have good software library support on different hardware platforms?

 ## References
-- https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
-- https://developer.nvidia.com/cublas
-- https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution
-- https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html
-- https://en.wikipedia.org/wiki/Recurrent_neural_network
-- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
-- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+
+1. https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
+
+2. https://developer.nvidia.com/cublas
+
+3. https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution
+
+4. https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html
+
+5. https://en.wikipedia.org/wiki/Recurrent_neural_network
+
+6. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
+
+7. [Attention Is All You Need](https://arxiv.org/abs/1706.03762)

Textbook/第4章-矩阵运算与计算机体系结构/4.2-计算机体系结构与矩阵运算.md

Lines changed: 12 additions & 6 deletions
@@ -124,9 +124,15 @@ for (int i = 0; i < M; i++) {
 Exercise: list some other CPU architecture features you can think of, and consider the benefits and effects these features have on matrix multiplication and even on other operations.

 ## References
-- https://en.wikipedia.org/wiki/Computer_architecture
-- https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
-- https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
-- https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
-- https://developer.nvidia.com/cublas
-- https://pytorch.org/docs/master/notes/extending.html
+
+1. https://en.wikipedia.org/wiki/Computer_architecture
+
+2. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
+
+3. https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
+
+4. https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
+
+5. https://developer.nvidia.com/cublas
+
+6. https://pytorch.org/docs/master/notes/extending.html

Textbook/第4章-矩阵运算与计算机体系结构/4.3-GPU体系结构与矩阵运算.md

Lines changed: 7 additions & 4 deletions
@@ -109,7 +109,10 @@ cd .. && python mnist_custom_linear_cuda.py

 ## References

-- https://en.wikipedia.org/wiki/Graphics_processing_unit
-- CUDA Programming model: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
-- An Even Easier Introduction to CUDA: https://devblogs.nvidia.com/even-easier-introduction-cuda/
-- CUSTOM C++ AND CUDA EXTENSIONS: https://pytorch.org/tutorials/advanced/cpp_extension.html
+1. https://en.wikipedia.org/wiki/Graphics_processing_unit
+
+2. CUDA Programming model: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
+
+3. An Even Easier Introduction to CUDA: https://devblogs.nvidia.com/even-easier-introduction-cuda/
+
+4. CUSTOM C++ AND CUDA EXTENSIONS: https://pytorch.org/tutorials/advanced/cpp_extension.html

Textbook/第5章-深度学习框架的编译与优化/5.1-深度神经网络编译器.md

Lines changed: 15 additions & 8 deletions
@@ -105,11 +105,18 @@ C = t.compute((m, n),

 ## References

-- https://gcc.gnu.org/
-- The LLVM Compiler Infrastructure: https://llvm.org/
-- LLVM IR and Go: https://blog.gopheracademy.com/advent-2018/llvm-ir-and-go/
-- TVM. https://tvm.apache.org/
-- [Ansor : Generating High-Performance Tensor Programs for Deep Learning](https://arxiv.org/abs/2006.06762)
-- https://github.com/facebookresearch/TensorComprehensions
-- https://en.wikipedia.org/wiki/Graphics_processing_unit
-- https://cloud.google.com/tpu
+1. https://gcc.gnu.org/
+
+2. The LLVM Compiler Infrastructure: https://llvm.org/
+
+3. LLVM IR and Go: https://blog.gopheracademy.com/advent-2018/llvm-ir-and-go/
+
+4. TVM. https://tvm.apache.org/
+
+5. [Ansor : Generating High-Performance Tensor Programs for Deep Learning](https://arxiv.org/abs/2006.06762)
+
+6. https://github.com/facebookresearch/TensorComprehensions
+
+7. https://en.wikipedia.org/wiki/Graphics_processing_unit
+
+8. https://cloud.google.com/tpu

Textbook/第5章-深度学习框架的编译与优化/5.2-计算图优化.md

Lines changed: 9 additions & 5 deletions
@@ -108,8 +108,12 @@

 ## References

-- https://en.wikipedia.org/wiki/Optimizing_compiler
-- https://en.wikipedia.org/wiki/Common_subexpression_elimination
-- https://en.wikipedia.org/wiki/Constant_folding
-- TensorFlow Graph Optimizations:https://www.tensorflow.org/guide/graph_optimization
-- Graph Optimizations in ONNX Runtimes: https://onnxruntime.ai/docs/performance/graph-optimizations.html
+1. https://en.wikipedia.org/wiki/Optimizing_compiler
+
+2. https://en.wikipedia.org/wiki/Common_subexpression_elimination
+
+3. https://en.wikipedia.org/wiki/Constant_folding
+
+4. TensorFlow Graph Optimizations:https://www.tensorflow.org/guide/graph_optimization
+
+5. Graph Optimizations in ONNX Runtimes: https://onnxruntime.ai/docs/performance/graph-optimizations.html
