We would like to introduce Sony's achievements in very large-scale distributed GPU training, announced today.
While you can check the press release and the technical report for details, the main topics are as follows:
• Construction of a very large-scale distributed GPU training environment built with Neural Network Libraries and AIST's ABCI.
• Training ResNet-50 on ImageNet in a world-record time of 224 seconds, using up to 2,176 GPUs
• Technical details are available here
Neural Network Libraries' distributed GPU training functionality is provided in binary format, installable with the pip command. Check the documentation for further details (in this experiment, we used a version branched from v1.0.0 and customized for ABCI).
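The functionality above implements synchronous data-parallel training: each GPU computes gradients on its own shard of the mini-batch, and the gradients are then averaged across all workers (an all-reduce) before every weight update. A minimal sketch of that averaging step in plain Python (the function name is hypothetical and this is not the Neural Network Libraries API; it only illustrates the concept):

```python
def all_reduce_average(worker_grads):
    """Average per-worker gradient lists element-wise.

    Illustrates the synchronous all-reduce step of data-parallel
    training: after the call, every worker holds the same averaged
    gradients. (Hypothetical helper, not the NNabla API.)
    """
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    averaged = [
        sum(grads[i] for grads in worker_grads) / n_workers
        for i in range(n_params)
    ]
    # Broadcast the averaged result back to every worker.
    return [list(averaged) for _ in range(n_workers)]

# Two workers, each holding gradients for two parameters.
grads = [[1.0, 3.0], [3.0, 5.0]]
synced = all_reduce_average(grads)
# → each worker now holds [2.0, 4.0]
```

In an actual multi-GPU run, this averaging is performed by a collective communication library across processes rather than in a single Python loop, which is what allows the training to scale to thousands of GPUs.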
Also, check out the cloud version of Neural Network Console, our GUI deep learning development tool, which enables distributed training on 8 GPUs with Neural Network Libraries, without having to set up any infrastructure.