Title: Training Large Neural Networks with Small Network Footprint
Time: 2018.10.25 Thursday 16:20
Place: Room 106, Software Building
Distributed machine learning (ML) systems built on parameter servers are widely used in industry. With the rapid advances in GPUs, training performance is now often bottlenecked by the network communication needed to exchange gradients and parameters.
In this talk, I will share our work on alleviating this communication bottleneck to speed up distributed ML training. First I will motivate the problem with measurements on GPU clusters in Azure and EC2. Then I will present the design and implementation of our solution, a system called Stanza, which separates the training of different layers in ML models by exploiting their distinct characteristics. A prototype of Stanza is implemented on PyTorch. Our evaluation on Azure and EC2 shows that Stanza provides 1.25x to 13x speedups over the parameter server architecture when training common CNNs on ImageNet with Nvidia V100 GPUs and a 10GbE network.
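The layer separation idea rests on a well-known asymmetry in CNNs: convolutional layers dominate computation but hold few parameters, while fully connected layers hold the vast majority of parameters. A minimal back-of-the-envelope sketch (not Stanza's implementation; the layer shapes below are illustrative, loosely AlexNet-like) counting parameters per layer type:

```python
# Illustrative parameter counts for an AlexNet-like CNN (shapes are
# assumptions for illustration, not taken from the talk).
# Conv layer params = k*k*in_ch*out_ch + out_ch biases;
# FC layer params   = in_dim*out_dim + out_dim biases.

conv_layers = [  # (kernel, in_channels, out_channels)
    (11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256),
]
fc_layers = [  # (in_dim, out_dim)
    (9216, 4096), (4096, 4096), (4096, 1000),
]

conv_params = sum(k * k * cin + 1 and k * k * cin * cout + cout
                  for k, cin, cout in conv_layers)
fc_params = sum(i * o + o for i, o in fc_layers)

total = conv_params + fc_params
print(f"conv params: {conv_params:,} ({conv_params / total:.1%})")
print(f"fc params:   {fc_params:,} ({fc_params / total:.1%})")
# FC layers account for the overwhelming share of parameters, so keeping
# their gradient exchange off the network saves most of the traffic.
```

Under these shapes the fully connected layers hold roughly 94% of all parameters, which is why exchanging only convolutional gradients (and activations at the boundary) can cut network traffic so drastically.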
Hong Xu is an assistant professor in the Department of Computer Science at City University of Hong Kong. His research area is computer networking, particularly data center networks and big data systems. He received the B.Eng. degree from The Chinese University of Hong Kong in 2007, and the M.A.Sc. and Ph.D. degrees from the University of Toronto in 2009 and 2013, respectively. He was the recipient of an Early Career Scheme Grant from the Hong Kong Research Grants Council in 2014, and has received several best paper awards, including at ACM TURC 2017 (SIGCOMM China) and IEEE ICNP 2015. He is a member of ACM and IEEE.