3rd Anniversary for Neural Network Libraries!

Tuesday, July 14, 2020

Posted by shin

Looking Back at the Year…

The 25th of June marked the 3rd anniversary for Neural Network Libraries!

Celebrating our 3rd anniversary, we have released NNabla NAS, our new neural architecture search framework! Check the corresponding blog post for more details!

Deep learning has continued to expand and evolve at an unprecedented speed over the past year, reaffirming the immense demand for a highly reliable, up-to-date framework that can be deployed for both research and development. We could therefore never stay content with what we already had. Continuing from the important milestones of our 2nd year, such as setting the world record for ResNet training on ImageNet, our 3rd year with Neural Network Libraries has seen a massive number of changes, many of which were challenging to make.

During the past year, we have put a special emphasis on strengthening our support for content creation. A wide array of generative models has been added, including Self-attention GAN, InstaGAN, StarGAN, and SPADE, just to name a few. We have also started adding interactive demos, where creators without a background in programming or machine learning can experience state-of-the-art models.

We have also been actively engaged in leading research efforts in a variety of fields, such as speech enhancement or neural network quantization.

We genuinely appreciate all the interest, feedback, advice, criticism, and contributions for Neural Network Libraries, and promise to keep doing our best to support our users.

Important Updates over the Year

For the rest of this post, we would like to look back and review some of the important features that have been added over the year.

First of all, we have finally made Japanese documentation available! With this release, we hope to further expand our user base and offer more convenience for non-English-speaking users.

We have continued to implement important functional layers, ranging from double backward to transformer and weight normalization. We have also sought to improve usability with tools such as NanInfTracer and DALI Iterator. As mentioned in the foreword, we have implemented many generative models, along with other important methods such as Mixup and MAML. Finally, our research efforts are available as both papers and code.

Here is a more detailed list of some of the important updates over the past year:

Layers

Double Backward

Double backward, i.e., the second-order gradients of outputs with respect to inputs, is critical for implementing many state-of-the-art deep learning techniques.

We have enabled double backward for more than 70 function layers, greatly broadening the applicability of Neural Network Libraries.

import nnabla as nn

grads = nn.grad(outputs, inputs)

# The returned grads are ordinary variables and can be used in further computation.
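To make the idea of double backward concrete, here is a minimal pure-Python sketch. It is a toy scalar autodiff, not how Neural Network Libraries implements it: the `Var` class and this `grad` function are hypothetical names introduced only for illustration. The key point it tries to show is that gradients are themselves built from differentiable operations, so they can be differentiated again.

```python
class Var:
    """Scalar node in a tiny computation graph (toy stand-in, not nn.Variable)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        # Each parent is (node, fn), where fn maps the upstream gradient (a Var)
        # to this node's contribution to the parent's gradient (also a Var).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value,
                   ((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, lambda g: g * other), (other, lambda g: g * self)))


def grad(output, inputs):
    """Return d(output)/d(input) for each input, as differentiable Vars."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    grads = {id(output): Var(1.0)}
    for node in reversed(order):  # children are processed before their parents
        g = grads.get(id(node))
        if g is None:
            continue
        for parent, fn in node.parents:
            contribution = fn(g)
            if id(parent) in grads:
                grads[id(parent)] = grads[id(parent)] + contribution
            else:
                grads[id(parent)] = contribution
    return [grads.get(id(x), Var(0.0)) for x in inputs]


x = Var(3.0)
y = x * x * x              # y = x^3
(dy,) = grad(y, [x])       # first backward:  3x^2 = 27 at x = 3
(d2y,) = grad(dy, [x])     # double backward: 6x   = 18 at x = 3
print(dy.value, d2y.value)
```

Because `grad` returns `Var` objects rather than plain floats, the second call can walk the graph that the first call built, which is the same property that makes gradient penalties and meta-learning objectives expressible.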

MultiHead Attention and Transformer

The multi-head attention layer is a building block of the transformer, which has become a state-of-the-art model for many tasks such as language modeling. It is also being actively applied to a variety of computer vision tasks.
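As a rough illustration of what a multi-head attention layer computes, the following pure-Python sketch splits the feature axis into heads, runs scaled dot-product attention per head, and concatenates the results. It deliberately omits the learned input and output projections that a real layer has, and the function names are hypothetical rather than the Neural Network Libraries API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head; inputs are lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append([sum(w * v[d] for w, v in zip(weights, values))
                       for d in range(len(values[0]))])
    return output


def multi_head_attention(queries, keys, values, num_heads):
    """Split the feature axis into heads, attend per head, then concatenate."""
    d_model = len(queries[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        heads.append(attention([q[cols] for q in queries],
                               [k[cols] for k in keys],
                               [v[cols] for v in values]))
    # Concatenate the per-head outputs back into d_model features per position.
    return [[x for head in heads for x in head[t]] for t in range(len(queries))]


queries = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
keys = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]   # identical keys give uniform weights
values = [[2.0, 4.0, 6.0, 8.0], [4.0, 8.0, 10.0, 12.0]]
out = multi_head_attention(queries, keys, values, num_heads=2)
print(out)
```

With identical keys every query attends uniformly, so each output row is simply the mean of the value rows, which makes the example easy to check by hand.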

Transformer (Vaswani et al., NIPS 2017)
Transformer is a highly popular model based on the multi-head attention mechanism, and has been shown to be powerful in a variety of machine learning tasks, particularly in natural language processing and speech recognition. It also provides the foundation upon which many recent state-of-the-art models, such as BERT, are built. A tutorial on how to use transformers with Neural Network Libraries is coming soon, so stay tuned!

Weight Normalization

Weight normalization (Salimans and Kingma, NIPS 2016)
Weight normalization is a reparametrization technique for the weight parameters in neural networks that decouples the length of each weight vector from its direction, which speeds up optimization.
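The reparametrization can be sketched in a few lines. Following Salimans and Kingma, a weight vector w is expressed as w = g * v / ||v||, where v carries the direction and the scalar g carries the length. The helper below is a hypothetical pure-Python illustration for a single weight vector, not the library's layer.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a single weight vector v and scalar gain g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


v = [3.0, 4.0]                # direction parameter, ||v|| = 5
w = weight_norm(v, g=2.0)     # the resulting weight has length g
print(w, math.sqrt(sum(x * x for x in w)))

# Rescaling v leaves w unchanged: only the direction of v matters.
print(weight_norm([30.0, 40.0], g=2.0))
```

In training, both g and v would be learned; the optimizer then adjusts length and direction separately.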

CPU implementation of RNN, GRU, and LSTM

RNN, GRU, and LSTM, which were previously available for GPU only, can now be used on CPUs too, making them feasible for lightweight, low-powered devices with limited resources!

Adaptive Separable Convolution

This layer implements 2D adaptive separable convolution for NCHW (channel-first) tensors, which is useful for video frame interpolation as demonstrated in this paper. Sample- and pixel-dependent vertical and horizontal kernels are dynamically generated and used to approximate a feature-independent 2D kernel. We support gradients with respect to all inputs: images, vertical kernels, and horizontal kernels.
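The computation can be sketched for a single-channel image as follows. This is a simplified pure-Python illustration under the assumption that each output pixel combines its own vertical and horizontal 1D kernels as an outer product; the function name and the list-based layout are hypothetical, and the actual layer operates on batched NCHW tensors.

```python
def adaptive_separable_conv(image, vertical, horizontal):
    """Apply per-pixel separable kernels to a single-channel H x W image.

    out[y][x] = sum_i sum_j vertical[y][x][i] * horizontal[y][x][j] * image[y+i][x+j]
    """
    k = len(vertical[0][0])
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            kv, kh = vertical[y][x], horizontal[y][x]
            row.append(sum(kv[i] * kh[j] * image[y + i][x + j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
box = [0.5, 0.5]        # averaging kernel: outer product is a 2x2 box filter
pick_last = [0.0, 1.0]  # one-hot kernel: selects the bottom-right pixel of the patch

# Kernels differ per output pixel: the top-left output uses the one-hot pair.
vertical = [[pick_last, box], [box, box]]
horizontal = [[pick_last, box], [box, box]]
out = adaptive_separable_conv(image, vertical, horizontal)
print(out)
```

Two length-K kernels per pixel stand in for a full K x K kernel, which is what keeps the per-pixel kernel prediction affordable.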

Utilities

NanInfTracer

We have added NanInfTracer, which tracks down the function in the network that is responsible for a NaN or Inf. It can be used simply by adding a few lines as below:

pred = model(...)

from nnabla.utils.inspection import NanInfTracer
nit = NanInfTracer(trace_inf=True, trace_nan=True, need_details=True)

with nit.trace():
    ...
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)
    ...

When a NaN or Inf occurs, the following message is printed.

Error during forward propagation
        Convolution
        Convolution
        Constant
        Reshape
        Div2 <-- ERROR

ValueError: The 0th output of the function 'Div2' (rank: 3) has nan or inf as its values.
        Function details:
            function type: Div2
            shapes of inputs: ['(2, 4, 10, 10)', '(1, 1, 1, 1)']
            shapes of outputs: ['(2, 4, 10, 10)']
            function args: {}

With NanInfTracer, debugging which layer leads to unanticipated computation is easier than ever!
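The idea behind such a tracer can be sketched without the library: a hook runs after every function, records its name, and raises as soon as an output contains NaN or Inf. The class and helper below are hypothetical and only mimic the behavior described above; they are not the NanInfTracer implementation.

```python
import math


class ToyNanInfTracer:
    """Records executed function names and raises at the first NaN/Inf output."""

    def __init__(self):
        self.history = []

    def post_hook(self, name, outputs):
        self.history.append(name)
        if any(math.isnan(v) or math.isinf(v) for v in outputs):
            trace = "\n".join("\t" + n for n in self.history)
            raise ValueError(
                f"Error during forward propagation\n{trace} <-- ERROR")


def run(functions, inputs, tracer):
    """Apply (name, fn) pairs in order, calling the tracer hook after each one."""
    values = inputs
    for name, fn in functions:
        values = [fn(v) for v in values]
        tracer.post_hook(name, values)
    return values


def div_zero_by_zero(v):
    # Python raises on float division by zero, so emulate IEEE 0/0 = NaN here.
    return float("nan") if v == 0.0 else 1.0 / v


pipeline = [
    ("Constant", lambda v: v),
    ("Sub2", lambda v: v - v),      # produces zeros
    ("Div2", div_zero_by_zero),     # turns zeros into NaN
]

tracer = ToyNanInfTracer()
message = ""
try:
    run(pipeline, [1.0, 2.0], tracer)
except ValueError as err:
    message = str(err)
    print(message)
```

Checking after each function, rather than only at the loss, is what lets the error point at the first offending function instead of at a downstream symptom.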

DALI Iterator

DALI stands for NVIDIA Data Loading Library, and enables accelerated pre-processing of input data. Note that in this version, the iterator works with DALI <= 0.14. Please install the specified DALI version as follows.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 \
    nvidia-dali==0.14

Documentation

Add Japanese Documentation

The Japanese version of the documentation for Neural Network Libraries is now available! While some parts have not been translated yet, we will work to complete it in future releases.

Models

MobileNet V1, V2, V3 for ImageNet example

MobileNet V1, V2, and V3 are now available in the ImageNet example! You can also download each model’s pre-trained weights.

EfficientNet B0, B1, B2, B3

EfficientNet, proposed by researchers at Google, has been added to the NNabla examples! This model achieves a compact yet high-performing network through a new method that jointly scales the input resolution, the number of feature maps, and the number of layers.
EfficientNet comes in several architectural variations depending on the number of parameters or FLOPs, and we have added versions B0, B1, B2, and B3. We have also released the parameters of these models trained on ImageNet, so make sure to check them out!
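The scaling method can be sketched with the compound-scaling rule from the EfficientNet paper, where a single coefficient phi grows depth, width, and resolution together using the bases 1.2, 1.1, and 1.15. The function below is a hypothetical illustration of that rule only; the released B1 to B3 configurations use their own rounded multipliers and resolutions, so the numbers printed here should not be read as the exact settings of the models in the examples.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases from the paper


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for a given phi."""
    depth = ALPHA ** phi                                 # more layers
    width = BETA ** phi                                  # more feature maps
    resolution = round(base_resolution * GAMMA ** phi)   # larger input images
    return depth, width, resolution


for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, resolution {resolution}")

# The bases are chosen so that FLOPs roughly double per unit of phi.
print(ALPHA * BETA ** 2 * GAMMA ** 2)
```

Scaling all three dimensions with one knob is what distinguishes this approach from widening or deepening a network in isolation.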

Add DeepLabV3+ model as a pretrained model API

DeepLabV3+ is a state-of-the-art model for semantic segmentation, and is now readily available through the pre-trained model API!

Examples

NNabla Tutorials on Colab

Tutorials can now be run directly on Colab by clicking the “Open in Colab” badge!
On top of our previous tutorials that show you the basic usage of NNabla, we have also added tutorials on CIFAR-10 image classification and image generation with DCGAN. Strongly recommended for NNabla beginners!

InstaGAN / StarGAN / ESRGAN

InstaGAN performs image-to-image translation, especially of target instances, e.g. pants to skirts, by incorporating object segmentation masks. StarGAN also performs image-to-image translation, enabling translation across multiple domains with a single model.

Below are examples of input images and the translated images generated by each model!

StarGAN: input (black hair/female), output (blond/female); input (black hair/male), output (blond/female)

InstaGAN: input image (jeans), output image (skirt)

ESRGAN (Wang et al., ECCV 2018 Workshop) performs enhanced single-image super-resolution. Inference can be performed with Neural Network Libraries by converting the pre-trained weights available for PyTorch, which can also be done using our code.

Deep Q-Learning

DQN is one of the most widely used deep reinforcement learning (RL) algorithms. Check this blog post for more details!

Pix2PixHD / SPADE

Pix2PixHD (Wang et al., CVPR 2018) performs high-resolution image-to-image translation. For example, you can convert semantic segmentation maps to photorealistic images as in the example below:

Input	Output

We have implemented SPADE, which allows users to synthesize images with semantic maps and styles! Rough drawings can now be converted to photorealistic street views or beautiful scenery!

Input	Output

MAML

MAML (Model-Agnostic Meta-Learning) (Finn et al., ICML 2017) is a seminal work in meta-learning, which seeks to train a model that learns how to learn, thus quickly adapting to new tasks with limited samples.
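To give a feel for how MAML trains for fast adaptation, here is a toy sketch on scalar tasks. Each task has the loss (w - target)^2, the inner loop takes one gradient step per task, and the outer loop differentiates the post-adaptation loss with respect to the initial w. Everything here (task definition, function names, learning rates) is an assumption made for illustration, and the second-order term is written out analytically instead of using automatic differentiation.

```python
def loss_grad(w, target):
    """Gradient of the per-task loss (w - target)^2 with respect to w."""
    return 2.0 * (w - target)


def maml_step(w, targets, inner_lr=0.1, outer_lr=0.1):
    """One meta-update of the shared initialization w over a batch of tasks."""
    meta_grad = 0.0
    for target in targets:
        adapted = w - inner_lr * loss_grad(w, target)   # inner-loop adaptation
        # Chain rule through the inner step: d(adapted)/dw = 1 - 2 * inner_lr.
        # In a real network this factor is where second-order gradients appear.
        meta_grad += loss_grad(adapted, target) * (1.0 - 2.0 * inner_lr)
    return w - outer_lr * meta_grad / len(targets)


targets = [1.0, 2.0, 6.0]   # three toy tasks
w = 0.0
for _ in range(200):
    w = maml_step(w, targets)
print(round(w, 4))          # approaches the mean of the targets
```

For these symmetric quadratic tasks the best initialization is the mean of the task targets; in a real model the outer gradient requires differentiating through the inner update, which is one reason double backward support matters.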