Released Neural Network Libraries v1.11.0!

Wednesday, September 23, 2020


Posted by Takuya Yashima



CUDA extension: If allReduce (e.g. MPI_Allreduce()) does not respond, the system raises an exception.

In multi-node training, it is often hard to tell whether training has actually hung or whether some calculation is simply taking a long time. The two cases can be distinguished by checking whether allReduce() returns within a certain time. Normally, barrier() is used to set a synchronization point. Before this synchronization point, different nodes may perform different calculation tasks whose processing times vary. After it, similar calculations are assumed to run on all nodes, so allReduce() should finish within a certain period of time. If it does not, some node has probably stopped responding. With this feature, training stops as soon as an exception occurs on any node.
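The timeout idea described above can be sketched in plain Python. This is only an illustration of the watchdog pattern, not the CUDA extension's actual implementation: `call_with_timeout`, `CollectiveTimeout`, and `fake_all_reduce` are hypothetical names, and the all-reduce is simulated with a sleep instead of a real MPI or NCCL call.

```python
import threading
import time


class CollectiveTimeout(RuntimeError):
    """Raised when a collective call does not return in time (hypothetical name)."""


def call_with_timeout(fn, timeout_sec, *args, **kwargs):
    """Run a blocking call in a daemon thread; raise if it exceeds timeout_sec."""
    result = {}

    def worker():
        try:
            result["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand worker errors back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_sec)  # wait at most timeout_sec for the call to return
    if thread.is_alive():
        raise CollectiveTimeout(f"no response within {timeout_sec} s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def fake_all_reduce(values, delay_sec=0.0):
    """Stand-in for a real all-reduce: waits, then sums the per-node values."""
    time.sleep(delay_sec)
    return sum(values)
```

A caller would place the synchronization point first and then wrap the collective, for example `call_with_timeout(fake_all_reduce, 1.0, [1, 2, 3])`. If the call does not return in time, the raised exception lets the training script stop instead of waiting indefinitely.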

C-runtime: Support fixed-point operators to avoid frequent conversion between float and fixed-point

Added fixed-point versions of the following functions:
– DepthwiseConvolution
– AddScalar, MulScalar
– Reshape
– Transpose
– Affine
– ReLU
– Add2, Sub2
– Mul2, Div2
– SumPooling
– MaxPooling
– AveragePooling
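To show why native fixed-point operators help, here is a minimal sketch of a chain of element-wise operations that stays in fixed point from start to finish. The Q-format with 8 fractional bits and the helper names (`to_fixed`, `add2`, `mul2`, `relu`) are assumptions made for illustration; they do not reflect the C-runtime's actual data layout or API.

```python
FRAC_BITS = 8            # assumed Q-format: 8 fractional bits (illustrative)
SCALE = 1 << FRAC_BITS   # 256


def to_fixed(x):
    """Quantize a float to a fixed-point integer."""
    return int(round(x * SCALE))


def to_float(q):
    """Convert a fixed-point integer back to float."""
    return q / SCALE


def add2(a, b):
    """Element-wise add: operands share the same scale, so add directly."""
    return a + b


def mul2(a, b):
    """Element-wise multiply: the product carries 2 * FRAC_BITS fractional
    bits, so shift right once to return to the common scale."""
    return (a * b) >> FRAC_BITS


def relu(a):
    """ReLU works on the integer representation unchanged."""
    return max(a, 0)


# Convert once on input, chain several operators in fixed point,
# and convert once on output.
a, b, c = to_fixed(1.5), to_fixed(2.0), to_fixed(0.25)
y = relu(add2(mul2(a, b), c))
print(to_float(y))  # 3.25
```

Without fixed-point versions of each operator, every step in such a chain would have to convert to float and back, which is the overhead this change is meant to avoid.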



Format Conversion