We have released Neural Network Libraries v1.22.0! This release adds HiFi-GAN for speech synthesis, inference code for deep exemplar-based video colorization, and an interactive demo for the quantized tflite converter!
Spotlight
HiFi-GAN
We have implemented HiFi-GAN (Kong et al., NeurIPS 2020) for high-fidelity speech synthesis! HiFi-GAN employs generative adversarial networks to produce raw waveforms, demonstrating that modeling the periodic patterns of audio is crucial for enhancing audio quality.
Interactive Demo for quantized tflite converter
In our previous release, we added a quantized tflite converter that exports `nnp` to int8 `tflite`. We have also made an interactive demo for the quantized tflite converter to help you understand the concept and illustrate how to actually use it, so please give it a try!
Name | Notebook | Task |
---|---|---|
Post-training Quantization | | Post-training Quantization |
Deep Exemplar-Based Video Colorization
(Example frames: input, reference, and colorized output.)
We have implemented inference code for Deep Exemplar-based Video Colorization (Zhang et al., CVPR 2019), an end-to-end network for exemplar-based video colorization. This model introduces a recurrent framework that unifies the semantic correspondence and color propagation steps to achieve temporal consistency while remaining faithful to the reference image.
We have also made a Colab demo for this model, so please give it a try!
Name | Notebook | Task |
---|---|---|
Deep-Exemplar-based-Video-Colorization | | Video Colorization |
Fix missing-sample problem in slice iterator
We have fixed a problem where some samples were missing when using the slice iterator.
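For reference, the slice iterator is the one obtained from `DataIterator.slice()`, typically used to split a dataset across multiple devices. Below is a minimal sketch assuming the `data_iterator_simple` and `DataIterator.slice` APIs; argument names may differ slightly depending on your nnabla version.

```python
# Minimal sketch (assumed API): splitting a data iterator into slices,
# e.g. one slice per device. The fix ensures samples are not dropped
# when the dataset does not divide evenly across slices.
import numpy as np
from nnabla.utils.data_iterator import data_iterator_simple

def load_func(index):
    # Toy dataset: 100 samples, each a single feature with an integer label.
    x = np.array([index], dtype=np.float32)
    y = np.array([index % 10], dtype=np.int32)
    return x, y

di = data_iterator_simple(load_func, 100, batch_size=8, shuffle=False)

# Take the first of 4 slices of the dataset (slice_pos = 0, 1, 2, 3).
sliced_di = di.slice(rng=None, num_of_slices=4, slice_pos=0)
x_batch, y_batch = sliced_di.next()
```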
Optimize normalization functions (CPU / GPU)
We have optimized the CUDA implementations of `layer_normalization`, `instance_normalization`, and `group_normalization` for faster processing. For example, with `group_normalization`, forward computation is up to 73 times faster than the previous implementation, and backward computation is up to 40 times faster!
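As a quick illustration, the following sketch calls one of the optimized functions through `nnabla.functions`. The adaptive gain/bias shapes used here are assumptions, so please check the function reference for the exact requirements, and run under the CUDA extension context to benefit from the optimized kernels.

```python
# Sketch (assumed parameter shapes): applying group normalization with
# nnabla.functions. Run with the CUDA/cuDNN extension context to use the
# optimized CUDA kernels mentioned above.
import numpy as np
import nnabla as nn
import nnabla.functions as F

# Input of shape (batch, channels, height, width).
x = nn.Variable.from_numpy_array(
    np.random.randn(8, 32, 16, 16).astype(np.float32))

# Adaptive bias/gain, assumed to be one value per channel: shape (1, C, 1, 1).
beta = nn.Variable.from_numpy_array(np.zeros((1, 32, 1, 1), dtype=np.float32))
gamma = nn.Variable.from_numpy_array(np.ones((1, 32, 1, 1), dtype=np.float32))

# Normalize over 4 groups of 8 channels each.
y = F.group_normalization(x, beta, gamma, num_groups=4, channel_axis=1)
y.forward()
```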
Fast reduction
We have also optimized the CUDA kernel for tensor reduction used by `nnabla.functions.max`, `nnabla.functions.sum`, etc. For example, `sum` is up to 5 times faster than the previous implementation in certain cases.
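For example, the sketch below uses the affected reductions; the snippet itself is only illustrative, and the faster kernels are used when running under the CUDA extension context.

```python
# Sketch: reductions whose CUDA kernels were optimized. Create the variables
# under a CUDA extension context to take the fast path.
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(
    np.random.randn(64, 1024).astype(np.float32))

s = F.sum(x, axis=1)  # sum over the last axis -> shape (64,)
m = F.max(x, axis=1)  # max over the last axis -> shape (64,)
s.forward()
m.forward()
```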
Fix global seed setter for multi-device settings
We have modified global seed setters such as `nnabla.functions.randn` so that the initial seed is randomly determined per process and for every execution of the script. With this change, under the default settings, reproducibility is not guaranteed when using layers with random number generators. If you need reproducibility, you can set the initial seed directly with `nnabla.random.set_function_seed(seed)`. For example, if you need a function to generate exactly the same random numbers on every run, call `nnabla.random.set_function_seed(313)` at the beginning of your script.
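A minimal sketch of this usage, with `nnabla.functions.randn` used purely for illustration:

```python
# Sketch: fixing the function seed so random-number-generating layers are
# reproducible across runs (by default the seed now differs per process/run).
import nnabla as nn
import nnabla.functions as F
import nnabla.random

nnabla.random.set_function_seed(313)  # call once at the beginning of the script

x = F.randn(shape=(2, 3))
x.forward()
print(x.d)  # identical values on every execution with the same seed
```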
Build
- Replace CentOS with Ubuntu and use PEP 600 tag (CPU / GPU)
- Fix problem by ignoring invalid text file
- Relax protobuf version constraints
Utilities
- Fix Pillow image saving with wrong array shape in grayscale mode
- Add extension and type_config arguments to create_communicator
- Prompt user to install the flatbuffers package
- Disable TF32 by default
- Improve ioctl time on A100 by using cudaDeviceGetAttribute
Examples
- [XAI] Representer Point
- [XAI] Kernel SHAP (tabular data)
- Add URL for pre-trained weights of deep exemplar-based video colorization
- [Fairness] Adversarial Debiasing (part1)
Bugfix
- Correct the number of elements and byte size display in debug message
- Fix a bug in the higher-order gradient of gather_nd (CPU / GPU)
- Use generic transpose kernel if the grid y-dimension exceeds the CUDA limit.
- Fix errors in nnabla examples