A study of pipeline parallelism in deep neural networks

  • Gabriel Núñez Universidad Nacional
  • Hairol Romero-Sandí Universidad Nacional
  • Elvis Rojas Universidad Nacional | National High Technology Center
  • Esteban Meneses Costa Rica Institute of Technology | National High Technology Center
Keywords: Deep learning, parallelism, artificial neural networks, distributed training

Abstract

The application of artificial intelligence to solve complex problems continues to grow in popularity. The emergence of chat systems based on artificial intelligence and natural language processing has driven the creation of increasingly large and sophisticated neural network models, which are the foundation of current developments in artificial intelligence. These networks can comprise billions of parameters, and training them is not feasible without parallelism-based approaches. This paper focuses on pipeline parallelism, one of the most important types of parallelism used to train neural network models in deep learning. We review the key concepts related to the topic and present a detailed analysis of three pipeline parallelism libraries: Torchgpipe, FairScale, and DeepSpeed. We examine important aspects of these libraries, such as their implementation and features. In addition, we evaluate them experimentally, carrying out parallel training runs while varying aspects such as the number of stages in the training pipeline and the type of partition balancing.
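As a point of reference for the kind of experiment described above, the sketch below shows how a pipeline-parallel training step might be configured with torchgpipe (Kim et al., 2020). The model, the balance of [4, 1] layers across two stages, the 8 micro-batches, and the availability of two CUDA devices are illustrative assumptions, not the configuration used in this study.

```python
# Minimal pipeline-parallelism sketch with torchgpipe (illustrative, not the
# paper's setup). Assumes torchgpipe is installed and two CUDA devices exist.
import torch
import torch.nn as nn
from torchgpipe import GPipe

# The model must be an nn.Sequential so it can be split into pipeline stages.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
)

# 'balance' assigns consecutive layers to stages (here 2 stages: 4 + 1 layers);
# 'chunks' splits each mini-batch into micro-batches that flow through the
# pipeline so that both stages stay busy.
model = GPipe(model, balance=[4, 1], chunks=8)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# One illustrative training step with random data.
x = torch.randn(64, 1024).to(model.devices[0])          # input enters the first stage
y = torch.randint(0, 10, (64,)).to(model.devices[-1])   # labels live on the last stage

optimizer.zero_grad()
output = model(x)            # micro-batches are pipelined across the stages
loss = criterion(output, y)
loss.backward()
optimizer.step()
```

Varying the length of the `balance` list changes the number of pipeline stages, and the values in it control how evenly work is distributed across those stages, which are the two experimental knobs mentioned in the abstract.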

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., . . . Zheng, X. (2016). TensorFlow: A System for Large-Scale Machine Learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). November 2–4 (pp. 264-283). Savannah, GA, USA: USENIX Association. https://doi.org/10.48550/arXiv.1605.08695

Akintoye, S., Han, L., Zhang, X., Chen, H., & Zhang, D. (2022). A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning. IEEE Access, 10, 77950-77961. https://doi.org/10.1109/ACCESS.2022.3193690

Alshamrani, R., & Ma, X. (2022). Deep Learning. In C. L. McNeely, & L. A. Schintler (Eds.), Encyclopedia of Big Data (pp. 373-377). Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-32010-6_5

Aminabadi, R. Y., Rajbhandari, S., Awan, A. A., Li, C., Li, D., Zheng, E., . . . He, Y. (2022). DeepSpeed- Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 1-15). Dallas, TX, USA: IEEE. https://doi.org/10.1109/SC41404.2022.00051

Chatelain, A., Djeghri, A., Hesslow, D., & Launay, J. (2022). Is the Number of Trainable Parameters All That Actually Matters? In M. F. Pradier, A. Schein, S. Hyland, F. J. Ruiz, & J. Z. Forde (Ed.), Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops. 163, pp. 27-32. PMLR. https://proceedings.mlr.press/v163/chatelain22a.html

Chen, M. (2023). Analysis of Data Parallelism Methods with Deep Neural Network. EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, October 21 - 23 (pp. 1857 - 1861). Xiamen, China: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3573428.3573755

Chen, Z., Xu, C., Qian, W., & Zhou, A. (2023). Elastic Averaging for Efficient Pipelined DNN Training. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP '23 (pp. 380-391). Montreal, QC, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3572848.3577484

Chilimbi, T., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014). Project Adam: Building an Efficient and Scalable Deep Learning Training System. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI´14). October 6–8 (pp. 570-582). Broomfield, CO: USENIX Association. https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf

Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q. V., . . . Ng, A. Y. (2012). Large Scale Distributed Deep Networks. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Ed.), Advances in Neural Information Processing Systems (NIPS 2012). 25, pp. 1223-1231. Curran Associates. https://proceedings.neurips.cc/paper_files/paper/2012/file/6aca97005c68f1206823815f66102863-Paper.pdf

Deep Learning. (2020). In A. Tatnall (Ed.), Encyclopedia of Education and Information Technologies (First ed., p. 558). Springer Cham. https://doi.org/10.1007/978-3-030-10576-1_300164

Deeplearning4j: Deeplearning4j Suite Overview. (2023, July). https://deeplearning4j.konduit.ai/

DeepSpeed authors: Deepspeed (overview and features). (2023, July). (Microsoft) https://www.deepspeed.ai/

FairScale authors. (2021). Fairscale: A general purpose modular pytorch library for high performance and large scale training. https://github.com/facebookresearch/fairscale

Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., . . . Lin, W. (2021). DAPPLE: a pipelined data parallel approach for training large models. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 431-445). Virtual Event, Republic of Korea: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3437801.3441593

Farkas, A., Kertész, G., & Lovas, R. (2020). Parallel and Distributed Training of Deep Neural Networks: A brief overview. 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) (pp. 165-170). Reykjavík, Iceland: IEEE. https://doi.org/10.1109/INES49302.2020.9147123

Guan, L., Yin, W., Li, D., & Lu, X. (2020, November 9). XPipe: Efficient Pipeline Model Parallelism for Multi-GPU DNN Training. arXiv:1911.04610v3 [cs.LG]. https://doi.org/10.48550/arXiv.1911.04610

Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N., Ganger, G., & Gibbons, P. (2018, June 18). PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv:1806.03377v1 [cs.DC]. https://doi.org/10.48550/arXiv.1806.03377

Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, M. X., Chen, D., . . . Chen, Z. (2019, July 25). GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv:1811.06965v5 [cs.CV], 1-11. https://doi.org/10.48550/arXiv.1811.06965

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., . . . Darrell, T. (2014, June 20). Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv:1408.5093v1 [cs.CV], 1-4.

Keras: Keras api references. (2023, July). https://keras.io/api/

Kim, C., Lee, H., Jeong, M., Baek, W., Yoon, B., Kim, I., . . . Kim, S. (2020, April 21). torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models. arXiv:2004.09910v1 [cs.DC], 1-10. https://doi.org/10.48550/arXiv.2004.09910

Krizhevsky, A. (2014, April 26). One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997v2 [cs.NE], 1-7. https://doi.org/10.48550/arXiv.1404.5997

Li, S., & Hoefler, T. (2021). Chimera: efficiently training large-scale neural networks with bidirectional pipelines. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Article No. 27, pp. 1-14. St. Louis, Missouri, USA: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3458817.3476145

Liang, G., & Alsmadi, I. (2022, February 12). Benchmark Assessment for DeepSpeed Optimization Library. arXiv:2202.12831v1 [cs.LG], 1-8. https://doi.org/10.48550/arXiv.2202.12831

Liu, W., Lai, Z., Li, S., Duan, Y., Ge, K., & Li, D. (2022). AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing. 2022 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 301-312). Heidelberg, Germany: IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00042

Luo, Z., Yi, X., Long, G., Fan, S., Wu, C., Yang, J., & Lin, W. (2022). Efficient Pipeline Planning for Expedited Distributed DNN Training. IEEE INFOCOM 2022 - IEEE Conference on Computer Communications (pp. 340-349). IEEE. https://doi.org/10.1109/INFOCOM48880.2022.9796787

Mofrad, M. H., Melhem, R., Ahmad, Y., & Hammoud, M. (2020). Studying the Effects of Hashing of Sparse Deep Neural Networks on Data and Model Parallelisms. 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-7). Waltham, MA, USA: IEEE. https://doi.org/10.1109/HPEC43674.2020.9286195

MXNet: Mxnet api docs. (2023, July). https://mxnet.apache.org/versions/1.9.1

Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., . . . Zaharia, M. (2019). PipeDream: generalized pipeline parallelism for DNN training. Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19) (pp. 1-15). Huntsville, Ontario, Canada: Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3341301.3359646

Padua, D. (2011). Pipelining. In D. Padua (Ed.), Encyclopedia of Parallel Computing (pp. 1562–1563). Boston, MA, USA: Springer. https://doi.org/10.1007/978-0-387-09766-4_335

Park, J. H., Yun, G., Yi, C. M., Nguyen, N. T., Lee, S., Choi, J., . . . Choi, Y.-r. (2020). HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. 2020 USENIX Annual Technical Conference (USENIX ATC 20) (pp. 307-321). USENIX Association. https://www.usenix.org/conference/atc20/presentation/park

PlaidML: Plaidml api docs. (2023, July). https://github.com/plaidml/plaidml

Pytorch: Pytorch documentation. (2023, July). https://pytorch.org/

Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020, May 13). ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. arXiv:1910.02054v3 [cs.LG], 1-24. https://doi.org/10.48550/arXiv.1910.02054

Rasley, J., Rajbhandari, S., Ruwase, O., & He, Y. (2020). DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event. July 6 - 10. CA, USA: Association for Computing Machinery. https://doi.org/10.1145/3394486.3406703

Rojas, E., Pérez, D., Calhoun, J. C., Bautista Gomez, L., Jones, T., & Meneses, E. (2021). Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. 2021 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 492-503). Portland, OR, USA: IEEE. https://doi.org/10.1109/Cluster48925.2021.00045

Rojas, E., Quirós-Corella, F., Jones, T., & Meneses, E. (2022). Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch. In I. Gitler, C. Barrios Hernández, & E. Meneses (Ed.), High Performance Computing. CARLA 2021. Communications in Computer and Information Science. 8th Latin American Conference, CARLA 2021, October 6–8, 2021, Revised Selected Papers. 1540, pp. 177-192. Guadalajara, Mexico: Springer, Cham. https://doi.org/10.1007/978-3-031-04209-6_13

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., . . . Fei-Fei, L. (2015, January 30). ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575v3 [cs.CV]. https://doi.org/10.48550/arXiv.1409.0575

Takisawa, N., Yazaki, S., & Ishihata, H. (2020). Distributed Deep Learning of ResNet50 and VGG16 with Pipeline Parallelism. 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 130-136). Naha, Japan: IEEE. https://doi.org/10.1109/CANDARW51189.2020.00036

TensorFlow: Overview. (2023, July). https://www.tensorflow.org/

Yang, P., Zhang, X., Zhang, W., Yang, M., & Wei, H. (2022). Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training. International Conference on Learning Representations. https://openreview.net/forum?id=cw-EmNq5zfD

Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2016). Application-Level Optimization of Big Data Transfers through Pipelining, Parallelism and Concurrency. IEEE Transactions on Cloud Computing, 4(1), 63 - 75. https://doi.org/10.1109/TCC.2015.2415804

Zeng, Z., Liu, C., Tang, Z., Chang, W., & Li, K. (2021). Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy. 2021 58th ACM/IEEE Design Automation Conference (DAC) (pp. 1165-1170). San Francisco, CA, USA: IEEE. https://doi.org/10.1109/DAC18074.2021.9586300

Zhang, P., Lee, B., & Qiao, Y. (2023, October). Experimental evaluation of the performance of Gpipe parallelism. Future Generation Computer Systems, 147, 107-118. https://doi.org/10.1016/j.future.2023.04.033

How to cite
Núñez, G., Romero-Sandí, H., Rojas, E., & Meneses, E. (2024). A study of pipeline parallelism in deep neural networks. Revista Colombiana De Computación, 25(1), 48–59. Retrieved from https://revistas.unab.edu.co/index.php/rcc/article/view/5056

Published: 2024-06-30
Section: Scientific and technological research article
