    Added more fusion and vectorized kernel for transducer (#1125) · 0c2c6eea
    Authored by Nan Zheng
    * Added support for fused ReLU and dropout in transducer joint
    
    * Reorganized code selection path in transducer joint fwd
    * Added support for fused ReLU+dropout in transducer joint
    
    * Vectorize transducer loss backward with fused softmax (#3)
    
    * Nanz/transducer loss (#4)
    
    * Vectorize transducer loss backward with fused softmax
    
    * Added a predicate to avoid potential IMA (illegal memory access)
    
    * Nanz/transducer loss (#5)
    
    * Vectorize transducer loss backward with fused softmax
    
    * Added a predicate to avoid potential IMA
    
    * Added more predicates to avoid IMAs
    
    * Updated documentation for newly added features.
    
    * Fixed an error in transducer.py