1. 08 Sep, 2021 1 commit
    • enable ninja (#1164) · 9ce0a10f
      Masaki Kozuki authored
      - passing include directories to `CUDAExtension`'s `include_dirs` argument
      - removing `-I/path/to/dir` arguments from `extra_compile_args`
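
The two bullets above can be illustrated with a minimal sketch. It is not the commit's actual `setup.py`. The helper `split_include_flags`, the extension name, the source file, and the flag values are all hypothetical and chosen only for illustration. The sketch moves `-I<dir>` entries out of a compiler flag list so they can be supplied through `include_dirs`, a keyword that `CUDAExtension` forwards to setuptools.

```python
from typing import List, Tuple


def split_include_flags(compile_args: List[str]) -> Tuple[List[str], List[str]]:
    """Separate `-I<dir>` flags from the remaining compiler flags.

    Hypothetical helper for illustration; not part of the repository.
    """
    include_dirs: List[str] = []
    remaining: List[str] = []
    for arg in compile_args:
        if arg.startswith("-I") and len(arg) > 2:
            # Keep only the directory part, dropping the "-I" prefix.
            include_dirs.append(arg[2:])
        else:
            remaining.append(arg)
    return include_dirs, remaining


# Hypothetical "before" state: an include path mixed into the nvcc flags.
old_nvcc_args = ["-O3", "-I/path/to/dir", "--use_fast_math"]
include_dirs, nvcc_args = split_include_flags(old_nvcc_args)

# Keyword arguments in the "after" shape described by the commit message.
extension_kwargs = {
    "name": "example_ext",               # hypothetical extension name
    "sources": ["example_kernel.cu"],    # hypothetical source file
    "include_dirs": include_dirs,        # include paths passed separately
    "extra_compile_args": {"cxx": ["-O3"], "nvcc": nvcc_args},
}

# With PyTorch installed, these would be passed on as:
#   from torch.utils.cpp_extension import CUDAExtension
#   ext = CUDAExtension(**extension_kwargs)
print(extension_kwargs["include_dirs"])
```

Building the keyword arguments as a plain dictionary keeps the sketch runnable without PyTorch installed. In a real `setup.py` the dictionary would be unpacked into `CUDAExtension` inside `ext_modules`.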
  2. 04 Sep, 2021 1 commit
    • fix CUBLAS guards (#1162) · 54b93919
      Burc Eryilmaz authored
      
      
      * support for fused dense layer with cublasLt, fusion in both fprop and bprop
      
      * fix typo causing syntax error
      
      * add fused GEMM+gelu+GEMM module
      
      * fix typo for workspace size
      
      * update cublas check for 11600
      
      * add tests for fused dense layer
      
      * fix CUDA 10.x path
      
      * safer guard around CUBLAS constants, remove unreferenced variable
      
      * more guard changes
      
      * guard against cublas version instead of cuda
      
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
  3. 03 Sep, 2021 2 commits
  4. 02 Sep, 2021 13 commits
  5. 01 Sep, 2021 3 commits
  6. 31 Aug, 2021 3 commits
  7. 30 Aug, 2021 1 commit
  8. 21 Aug, 2021 1 commit
  9. 17 Jul, 2021 3 commits
    • Added more fusion and vectorized kernel for transducer (#1125) · 0c2c6eea
      Nan Zheng authored
      * Added support for fused ReLU and dropout into transducer joint
      
      * Reorganized code selection path in transducer joint fwd
      * Added support for fused ReLU+dropout into transducer joint
      
      * Vectorize transducer loss backward with fused softmax (#3)
      
      * Nanz/transducer loss (#4)
      
      * Vectorize transducer loss backward with fused softmax
      
      * Added a predicate to avoid potential IMA
      
      * Nanz/transducer loss (#5)
      
      * Vectorize transducer loss backward with fused softmax
      
      * Added a predicate to avoid potential IMA
      
      * Added more predicates to avoid IMAs
      
      * Updated documentation for newly added features.
      
      * Fixed an error in transducer.py
    • Adds small-batch kernels (#1126) · ed719967
      yjk21 authored
    • local_rank fix (#1129) · c1378e6f
      X Wang authored
      * local_rank and install cuda version fix
  10. 15 Jun, 2021 2 commits
  11. 26 May, 2021 1 commit
  12. 17 May, 2021 1 commit
  13. 20 Apr, 2021 1 commit
  14. 17 Apr, 2021 3 commits
  15. 16 Apr, 2021 1 commit
  16. 15 Apr, 2021 3 commits