Distributed Training is supported

Friday, August 04, 2017


Posted by Kazuki Yoshiyama

As of version 0.9.3 of nnabla and nnabla-ext-cuda, NNabla supports distributed training (i.e., training across multiple GPUs). Two methods are supported:

  • Multi-process using mpirun/mpiexec
  • Multi-threading

With mpirun, NNabla can run distributed training using almost the same training script you already have:

$ mpirun -n 4 python your_training_script.py

See the tutorial and the Cifar-10 example for more details.
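To illustrate why the script stays almost unchanged under mpirun, here is a minimal sketch of the general multi-process pattern: every process runs the same file, learns its own rank, and works on its own slice of the data. This is not NNabla's API; the tutorial describes what NNabla itself provides. The sketch assumes Open MPI as the launcher, which exports the OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE environment variables; other MPI implementations use different names. The helpers rank_and_size and shard are hypothetical names chosen for this example.

```python
import os


def rank_and_size(env=os.environ):
    """Return (rank, size) for this process.

    Assumption: the launcher is Open MPI, whose mpirun exports
    OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE. When the variables
    are absent (e.g. a plain `python script.py` run), fall back to a
    single-process setup: rank 0 of 1.
    """
    rank = int(env.get("OMPI_COMM_WORLD_RANK", 0))
    size = int(env.get("OMPI_COMM_WORLD_SIZE", 1))
    return rank, size


def shard(samples, rank, size):
    """Hypothetical helper: give each process every size-th sample."""
    return samples[rank::size]


if __name__ == "__main__":
    rank, size = rank_and_size()
    # Stand-in for a real dataset: ten sample indices.
    my_samples = shard(list(range(10)), rank, size)
    print(f"process {rank} of {size} trains on samples {my_samples}")
```

Launched with mpirun -n 4 under Open MPI, each of the four processes would print a different slice; run directly with python, the script falls back to a single process that sees all samples.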

To enable distributed training, you must build and install NNabla from source. Please see the installation instructions.