With version 0.9.3 of nnabla and nnabla-ext-cuda, NNabla now supports distributed training (i.e., multi-GPU training). The following methods are supported:
- Multi-process using mpirun/mpiexec
- Multi-threading
With mpirun, NNabla can run distributed training using almost the same training script you already have:
$ mpirun -n 4 python your_training_script.py
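In practice, converting a single-GPU script mostly amounts to creating a communicator, binding each MPI process to its own GPU, and all-reducing the gradients before each solver update. Below is a minimal sketch of that pattern; the class and function names (`MultiProcessDataParalellCommunicator`, `get_extension_context`, `all_reduce`) are taken from NNabla's later distributed-training documentation and may differ slightly in 0.9.3, so treat them as assumptions rather than the exact 0.9.3 API.

```python
# Minimal data-parallel training sketch (one MPI process per GPU).
# API names follow NNabla's distributed-training docs and are
# assumptions here; check the tutorial for your installed version.
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

# Create the communicator, then bind this process to its local GPU.
ctx = get_extension_context("cudnn")
comm = C.MultiProcessDataParalellCommunicator(ctx)
comm.init()
ctx = get_extension_context("cudnn", device_id=str(comm.local_rank))
nn.set_default_context(ctx)

# A tiny model; each process computes on its own shard of the batch.
x = nn.Variable((32, 10))
t = nn.Variable((32, 1))
h = F.relu(PF.affine(x, 64, name="fc1"))
y = PF.affine(h, 1, name="fc2")
loss = F.mean(F.squared_error(y, t))

solver = S.Sgd(lr=0.01)
solver.set_parameters(nn.get_parameters())

for i in range(100):
    # In a real script, feed rank-specific data here.
    x.d = np.random.randn(*x.shape)
    t.d = np.random.randn(*t.shape)
    loss.forward()
    solver.zero_grad()
    loss.backward()
    # Average gradients across all processes before the update,
    # so every GPU applies the same gradient step.
    comm.all_reduce([v.grad for v in nn.get_parameters().values()],
                    division=True, inplace=True)
    solver.update()
```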
See the tutorial and the CIFAR-10 example for more details.
To enable distributed training, you need to build and install NNabla from source. Please see the installation instructions.