
Running Neural Network Libraries’ Example of ImageNet Dataset ~High-speed performance by distributed training with multi-GPUs~

Friday, October 19, 2018

News

Posted by gomi

The ImageNet training example script has been updated along with the release of Neural Network Libraries version 1.0. The example not only runs on a single GPU, but can also achieve high-speed performance through distributed training with multiple GPUs.
Anyone can run the training simply by executing the scripts explained below.
With distributed training, if you have 4 GPUs, for example, training will be nearly 4 times faster.

In this post, we will go over the steps to run the ImageNet training example script.
You can also refer to the README at the URL below, which contains the same instructions:
README of ImageNet Example

1. Obtaining the dataset

Please download the ImageNet dataset files and developer tools from the page below:
・ImageNet download page
ImageNet Dataset
・ILSVRC2012_img_train.tar (training images)
・ILSVRC2012_img_val.tar (validation images)
Since there are terms of use for the data, please also check the following site before downloading:
ImageNet main site
This example script uses the 2012 version of the dataset (containing 1,000 classes).
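For reference, the commands in the following steps assume that the downloaded archives are placed under a directory named ImageNet/ in the working directory. A minimal sketch (the directory layout itself is our assumption; only the archive file names come from the download page):

mkdir -p ImageNet
mv ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar ImageNet/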

2. Creating cache files

Cache files are created in order to reduce the disk I/O overhead when loading training and validation data. You only need to create the cache files once before training; the same cache files can be reused for all subsequent runs.
Depending on the environment, creating them can take roughly half a day to a day. When the example scripts are run, these cache files are used to load the data. For details of the script, refer to its usage below:

usage: create_cache.py [-h]
                       [-W WIDTH] [-H HEIGHT]
                       [-m {trimming,padding}]
                       [-S {True,False}]
                       [-N FILE_CACHE_SIZE]
                       [-C {h5,npy}]
                       [--thinning THINNING]
                       input [input ...] output

positional arguments:
  input                 Source file or directory.
  output                Destination directory.

optional arguments:
  -h, --help            show this help message and exit
  -W WIDTH, --width WIDTH
                        width of output image (default:320)
  -H HEIGHT, --height HEIGHT
                        height of output image (default:320)
  -m {trimming,padding}, --mode {trimming,padding}
                        shaping mode (trimming or padding) (default:trimming)
  -S {True,False}, --shuffle {True,False}
                        shuffle mode if not specified, train:True, val:False. Otherwise specified value will be used for both.
  -N FILE_CACHE_SIZE, --file-cache-size FILE_CACHE_SIZE
                        num of data in cache file (default:100)
  -C {h5,npy}, --cache-type {h5,npy}
                        cache format (h5 or npy) (default:npy)
  --thinning THINNING   Thinning rate

This script creates a cache directory from the .tar archives of the ImageNet dataset. The contents of the .tar archives are automatically recognized when creating the cache. By default, the script resizes images to 320×320 when creating cache files. Shuffle mode is enabled for the training data and disabled for the validation data.

python create_cache.py \
    ImageNet/ILSVRC2012_img_train.tar \
    ImageNet/ILSVRC2012_img_val.tar \
    ImageNet/imagenet-320-320-trimming-npy

After running the script, the following directories are created:
・ImageNet/imagenet-320-320-trimming-npy/train: Cache for the training dataset (shuffled).
・ImageNet/imagenet-320-320-trimming-npy/val: Cache for the validation dataset (not shuffled).
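If you want caches with different settings for your own experiments, the options documented above can be combined freely. For example, the following sketch would create an HDF5 cache instead of the default npy format (the output directory name imagenet-320-320-trimming-h5 is just our naming convention, not required by the script):

python create_cache.py \
    -C h5 \
    ImageNet/ILSVRC2012_img_train.tar \
    ImageNet/ILSVRC2012_img_val.tar \
    ImageNet/imagenet-320-320-trimming-h5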

3. Running with a single GPU

The following command runs the training. This example uses ResNet-34 with a batch size of 64 and gradient accumulation of 4, so parameters are updated with an effective batch size of 64×4=256.

python classification.py [-c device id] [-b batch size] [-a accumulate gradient] [-L number of layers] [-T directory of the training cache file] [-V directory of the validation cache file]

ex):

python classification.py -c cudnn -b64 -a4 -L34 -T ImageNet/imagenet-320-320-trimming-npy/train -V ImageNet/imagenet-320-320-trimming-npy/val

If you would like to perform validation and save parameters every epoch, the following options are needed:

python classification.py [-c device id] [-b batch size] [-a accumulate gradient] [-L number of layers] [-v validation interval] [-j mini-batch iteration of validation] [-s interval of saving model parameters] [-T directory of the training cache file] [-V directory of the validation cache file]

ex):

python classification.py -c cudnn -b64 -a4 -L34 -v 5000 -j 782 -s 5000 -T ImageNet/imagenet-320-320-trimming-npy/train -V ImageNet/imagenet-320-320-trimming-npy/val
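As a rough sanity check of the -j value above, here is a small Python sketch; it assumes the standard ILSVRC2012 validation set of 50,000 images, which is not stated in the example itself:

# Why -j 782 covers the whole validation set at batch size 64
# (assuming the standard 50,000-image ILSVRC2012 validation set).
import math

val_images = 50000
batch_size = 64
print(math.ceil(val_images / batch_size))  # -> 782 validation iterations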

4. Running with multiple GPUs

Multi-GPU support for Neural Network Libraries can be installed with pip. For details of the installation, refer to the following URL:
README of Multi GPU environment

The following command runs the training. This example uses ResNet-50 with a batch size of 32 and gradient accumulation of 2, so parameters are updated with an effective batch size of 32×2×4 (number of GPUs) = 256.

mpirun [-n number of GPUs] python multi_device_multi_process_classification.py [-b batch size] [-a accumulate gradient] [-L number of layers] [-l learning rate] [-i max iteration of training] [-v validation interval] [-j mini-batch iteration of validation] [-s interval of saving model parameters] [-D interval of learning rate decay] [-T directory of the training cache file] [-V directory of the validation cache file]

ex):

mpirun -n 4 python multi_device_multi_process_classification.py -b 32 -a 2 -L 50 -l 0.1 -i 2000000 -v 20004 -j 1563 -s 20004 -D 600000,1200000,1800000 -T ImageNet/imagenet-320-320-trimming-npy/train -V ImageNet/imagenet-320-320-trimming-npy/val
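If you run on a different number of GPUs, you may want to keep the effective batch size at 256 by adjusting -b and -a. The following Python sketch only restates the arithmetic above; the 8-GPU setting is our assumption, not part of the example:

# Effective batch size = batch size x accumulate gradient x number of GPUs.
def effective_batch(batch_size, accumulate, num_gpus):
    return batch_size * accumulate * num_gpus

print(effective_batch(32, 2, 4))  # 256, as in the command above
print(effective_batch(32, 1, 8))  # 256, a possible setting for 8 GPUs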

5. Checking the results

Training and validation results, along with the learned parameters, are saved under the tmp.montors.imagenet directory.
・Training-error.series.txt: Top-1 error for each iteration during training.
・Training-loss.series.txt: Loss for each iteration during training.
・Training-time.timer.txt: Time elapsed for each iteration during training.
・Validation-error.series.txt: Top-1 error for each iteration during validation.
・Validation-loss.series.txt: Loss for each iteration during validation.
・Validation-time.timer.txt: Time elapsed for each iteration during validation.
・param_xxxxxx.h5: Trained parameters at the specified iteration.
You can look up and review various aspects of the training/validation depending on your needs.
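For example, if you want to plot a learning curve, a minimal Python sketch like the following can be used; it assumes each line of the *.series.txt files contains an iteration number followed by a value, separated by whitespace:

import numpy as np
import matplotlib.pyplot as plt

# Assumes each line is "<iteration> <value>" (whitespace separated).
data = np.loadtxt("tmp.montors.imagenet/Validation-error.series.txt")
plt.plot(data[:, 0], data[:, 1])
plt.xlabel("iteration")
plt.ylabel("top-1 validation error")
plt.show()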

When we ran the example scripts ourselves, we obtained a top-1 validation error of 27.5% for step 3 (ResNet-34, single GPU) and 26.1% for step 4 (ResNet-50, multi-GPU). As for the speed-up from distributed training, ResNet-50 training on 4 GPUs (an AWS p3.8xlarge instance) took about 1.5 days, roughly a quarter of the time required with a single GPU.

We invite you to run and enjoy our ImageNet training example.