We have released Neural Network Libraries v1.24.0! Improved DDPM and NeRF models have been added, as well as speedups for several CUDA implementations. Also, we now support Python 3.9.
Spotlight
Improved DDPM
We have implemented an nnabla version of Improved Denoising Diffusion Probabilistic Models (ICML 2021)! Denoising Diffusion Implicit Models (ICLR 2021) is also included as a sampling method during generation.
Diffusion models are a branch of generative models that have recently gained attention, as they can generate high-quality images comparable to, if not better than, those generated by GANs, while allowing for stable training. For further details, please refer to the respective papers or our implementation in the nnabla-examples repository.
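As a quick refresher on the underlying idea (standard DDPM notation from the papers, not code from our implementation): the forward process gradually adds Gaussian noise to the data, and a network is trained to reverse it. With \alpha_t = 1 - \beta_t and \bar\alpha_t = \prod_{s=1}^{t} \alpha_s:

q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big), \qquad q(x_t \mid x_0) = \mathcal{N}\big(x_t;\, \sqrt{\bar\alpha_t}\, x_0,\, (1-\bar\alpha_t) I\big)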
NeRF / NeRF-W
We have implemented Neural Radiance Fields (NeRF), a deep learning model for novel view synthesis based on volumetric rendering! Two variants of NeRF have been implemented (the rendering equation they build on is shown after the list):
- Original NeRF: This has been verified across 20 scenes from the LLFF (realistic, forward-facing), DeepVoxels, and Blender (synthetic) datasets, and its test performance has been benchmarked against the original implementation.
- NeRF in the Wild (NeRF-W): This NeRF variant allows novel view synthesis of a scene from an unconstrained set of photos of that scene. It has been verified on a synthetic Lego scene with artificially added transient occluders and appearance variation, as well as on multiple scenes from the Phototourism dataset (Sacre Coeur, Brandenburg Gate, Taj Mahal, Hagia Sophia, Notre-Dame).
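For reference, this is the discretized volume rendering equation both variants build on (standard notation from the NeRF paper: \sigma_i is the density and c_i the color of sample i along a ray r, and \delta_i is the distance between adjacent samples):

\hat{C}(r) = \sum_{i=1}^{N} T_i \big(1 - e^{-\sigma_i \delta_i}\big) c_i, \qquad T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)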
Optimization of Instance Normalization (CPU / GPU)
We have implemented a kernel that broadcasts scale and bias at the same time, making the CUDA implementation of Instance Normalization faster. Both forward and backward propagation are now significantly faster than the previous nnabla implementation in all cases except when the batch size is 1. (The speedup depends on the input shape, but ranges from several tens of times to about 100 times faster.)
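A minimal sketch of calling Instance Normalization through nnabla's function API (the exact signature and argument order are an assumption here; the shapes are illustrative):

import numpy as np
import nnabla as nn
import nnabla.functions as F

# (N, C, H, W) input; scale (gamma) and bias (beta) are broadcast
# over the batch and spatial axes
x = nn.Variable.from_numpy_array(np.random.randn(8, 64, 32, 32).astype(np.float32))
gamma = nn.Variable.from_numpy_array(np.ones((1, 64, 1, 1), dtype=np.float32))
beta = nn.Variable.from_numpy_array(np.zeros((1, 64, 1, 1), dtype=np.float32))

y = F.instance_normalization(x, beta, gamma, channel_axis=1, batch_axis=0)
y.forward()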
Optimization of CumProd / CumSum (CPU / GPU)
We have implemented memory optimization and speedups for the CUDA implementations of CumProd and CumSum. For arbitrary input shapes, both forward and backward propagation are now significantly faster than the previous nnabla implementation. (The speedup depends on the input shape, but ranges from several tens to a thousand times faster.)
Enhanced recomputation API
We have enhanced our recomputation API, which discards the results of forward computation and re-computes them during backward computation to reduce memory usage during training. You can easily set a range for recomputation using Python's with statement, as shown below:
import nnabla as nn

x = nn.Variable(...)
with nn.recompute():
    h = net1(x)  # all intermediate variables in here will be set as recompute=True
y = net2(h)
y.forward()   # variables in net1 will be cleared from memory
y.backward()  # variables in net1 will be recomputed when required
StyleGAN2 Training
Following the release of StyleGAN2-CDC and StyleGAN2-EWC in v1.23, we now present a StyleGAN2 training implementation in nnabla. The implementation has been verified on the FFHQ dataset and supports additional inference operations such as latent space interpolation, latent space projection, and perceptual path length calculation.
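As an illustration of what latent space interpolation involves (a hypothetical, generator-agnostic sketch; the actual scripts and APIs in the example may differ):

import numpy as np

def lerp_latents(z0, z1, num_steps=8):
    # Linearly interpolate between two latent codes; feeding each result
    # through the generator yields a smooth transition between two images.
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - a) * z0 + a * z1 for a in alphas]

z0 = np.random.randn(1, 512).astype(np.float32)  # 512-D latents, as in StyleGAN2
z1 = np.random.randn(1, 512).astype(np.float32)
latents = lerp_latents(z0, z1)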
Support for Python 3.9 (CPU / GPU)
We have added support for Python 3.9. Along with this update, the TensorFlow version used by the file format converter has been updated to v2.5.1.
Build
- Add PYTEST_OPTS for pytest parallel execution (CPU / GPU / C-Runtime)
- Support Python 3.9 (CPU / GPU)
- Sync API level version from nnabla (1 / 2)
Format Converter
Layers
- Support broadcast in instance norm kernel to improve performance (CPU / GPU)
- Make ISTFT consistent with PyTorch implementation (NOLA condition) (CPU / GPU)
- Add linspace function (CPU / GPU)
- Optimize CumProd/CumSum (CPU / GPU)
Utilities
Examples
- StyleGAN2 training script
- Add improved DDPM example
- ImageNet classification fine-tuning sample
- SLE-GAN: Modified loss function with LPIPS and added scripts for interpolation and style mixing
Bugfix
- Do not remove identity layer if it is not created by expand control
- Remove the argument "output_mask" from Dropout (CPU / GPU)
- Fix potential multithreading issue (CPU / GPU)
- Fix recomputation for functions that require output data for backward computation