Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG

Yan Zeng, Wei Wang, Yong Ding, Jilin Zhang (), Yongjian Ren () and Guangzheng Yi
Additional contact information
Yan Zeng: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China
Wei Wang: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China
Yong Ding: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China
Jilin Zhang: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China
Yongjian Ren: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China
Guangzheng Yi: School of Computing Science, Hangzhou Dianzi University, Hangzhou 310018, China

Mathematics, 2022, vol. 10, issue 24, 1-21

Abstract: AI provides a new approach to the massive simulation calculations required in molecular dynamics, materials science, and other scientific computing fields. However, the complex structures and large-scale parameters of neural network models make them difficult to develop and train. Automatic parallelization based on graph algorithms is one of the most promising ways to solve this problem, although designing, implementing, and executing distributed parallel strategies for large-scale neural network models remains inefficient. In this paper, we propose FD-DPS, an adaptive distributed parallel training method based on the dynamic generation of critical paths in a DAG (directed acyclic graph), to address this efficiency problem. First, the method splits operators along tensor dimensions, which enlarges the search space available for model parallelism. Second, a dynamic critical-path generation method tracks changes in node priority in the DAG of the neural network model. Finally, critical-path nodes are scheduled according to their priorities, improving the performance of the resulting parallel strategies. Our experiments show that FD-DPS achieves 12.76% and 11.78% faster training on the PnasNet_mobile and ResNet_200 models, respectively, compared with the MP-DPS and Fast methods.
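To make the scheduling idea in the abstract concrete, the sketch below computes node priorities and a critical (longest) path over a small operator DAG. It is a minimal illustration of the general dynamic-critical-path technique, not the authors' FD-DPS implementation; the operator names, edge structure, cost values, and function names are all hypothetical.

    # Minimal sketch of critical-path priorities on an operator DAG.
    # Illustrates the general dynamic-critical-path idea, not the
    # authors' FD-DPS code; operator names and costs are hypothetical.
    from collections import defaultdict

    def topological_order(nodes, edges):
        # Kahn's algorithm: repeatedly emit nodes with no remaining
        # unprocessed predecessors.
        indeg = {n: 0 for n in nodes}
        for _, v in edges:
            indeg[v] += 1
        ready = [n for n in nodes if indeg[n] == 0]
        order = []
        while ready:
            u = ready.pop()
            order.append(u)
            for x, v in edges:
                if x == u:
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        ready.append(v)
        return order

    def node_priorities(nodes, edges, cost):
        # Priority of a node = its "bottom level": the largest total
        # cost on any path from the node to a sink. Nodes on the
        # critical path carry the highest priorities and are placed
        # and scheduled first.
        succ = defaultdict(list)
        for u, v in edges:
            succ[u].append(v)
        prio = {}
        for u in reversed(topological_order(nodes, edges)):
            prio[u] = cost[u] + max((prio[v] for v in succ[u]), default=0)
        return prio

    # Toy operator DAG with per-operator compute costs (hypothetical).
    nodes = ["conv1", "conv2", "matmul", "add", "loss"]
    edges = [("conv1", "conv2"), ("conv1", "matmul"),
             ("conv2", "add"), ("matmul", "add"), ("add", "loss")]
    cost = {"conv1": 4, "conv2": 7, "matmul": 3, "add": 1, "loss": 2}

    prio = node_priorities(nodes, edges, cost)
    print(sorted(nodes, key=lambda n: -prio[n]))
    # -> ['conv1', 'conv2', 'matmul', 'add', 'loss']

In the paper's setting, such priorities would be recomputed as the DAG and its critical path change during scheduling (hence "dynamic"); here the computation is done once for simplicity.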

Keywords: deep learning; model parallel; auto-parallel; dynamic critical path; DAG
JEL-codes: C
Date: 2022

Downloads:
https://www.mdpi.com/2227-7390/10/24/4788/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/24/4788/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:24:p:4788-:d:1005332

Mathematics is currently edited by Ms. Emma He

Handle: RePEc:gam:jmathe:v:10:y:2022:i:24:p:4788-:d:1005332