Last November, we set a world record by training ImageNet in just 224 seconds.
Only 4 months later, we have reached another milestone: training ImageNet in just 122 seconds, almost twice as fast as our previous record.
Our main technical contributions this time are as follows:
* we addressed the instability of large mini-batch training with batch-size control and label smoothing
* we addressed the overhead of gradient synchronization with a 2D-Torus all-reduce
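To illustrate the first point, here is a minimal sketch of label smoothing: instead of training against a hard one-hot target, a small amount of probability mass `epsilon` (the value 0.1 below is an illustrative assumption, not necessarily what we used) is spread uniformly over all classes, which keeps the network from being pushed toward extreme logits:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot target: the true class keeps (1 - epsilon) of the
    mass plus its uniform share, every class receives epsilon / k.
    `epsilon` here is an assumed illustrative value."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

# With 4 classes, the true class keeps most of the probability mass:
smoothed = smooth_labels([0.0, 0.0, 1.0, 0.0])
# true class: (1 - 0.1) * 1 + 0.1 / 4 = 0.925; others: 0.025 each
```

The smoothed target still sums to 1, so it can be used directly with the usual cross-entropy loss.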
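For the second point, the sketch below simulates the data movement of a 2D-Torus all-reduce on a `rows x cols` grid of workers (this is a plain-Python illustration of the communication pattern, not our actual GPU implementation): a reduce-scatter along each row ring, an all-reduce along each column ring, then an all-gather along each row ring. Splitting the reduction across the two torus dimensions keeps each ring short, which reduces the per-step synchronization overhead compared with one long ring over all workers:

```python
def torus_allreduce(grads, rows, cols):
    """Simulate a 2D-Torus all-reduce: grads holds one gradient vector
    (a list of floats, length divisible by cols) per worker, indexed
    row-major as worker (r, c) = grads[r * cols + c]."""

    def vec_sum(vectors):
        # Element-wise sum of equal-length vectors.
        return [sum(vals) for vals in zip(*vectors)]

    n = len(grads[0])
    chunk = n // cols

    # Phase 1: reduce-scatter within each row ring — worker (r, c) ends
    # up owning chunk c of the sum over its row.
    owned = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        row_sum = vec_sum(grads[r * cols:(r + 1) * cols])
        for c in range(cols):
            owned[r][c] = row_sum[c * chunk:(c + 1) * chunk]

    # Phase 2: all-reduce within each column ring — each owned chunk
    # becomes the sum over all rows, i.e. the global sum of that chunk.
    for c in range(cols):
        col_sum = vec_sum([owned[r][c] for r in range(rows)])
        for r in range(rows):
            owned[r][c] = col_sum

    # Phase 3: all-gather within each row ring — every worker reassembles
    # the full globally-summed gradient from the chunks along its row.
    out = []
    for r in range(rows):
        full = []
        for c in range(cols):
            full += owned[r][c]
        out.extend(list(full) for _ in range(cols))
    return out
```

For example, on a 2x2 grid where worker `i` holds the constant gradient `[i, i, i, i]`, every worker ends up with `[6, 6, 6, 6]`, the global sum.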
We will continue to make the training of neural networks faster and faster!