Seryilmaz/more cublas lt (#1147)
* support for fused dense layer with cublasLt, fusion in both fprop and bprop
* fix typo causing syntax error
* add fused GEMM+GELU+GEMM module
* fix typo for workspace size
* update cublas check for 11600
* add tests for fused dense layer
* fix CUDA 10.x path
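For reference, the math these fused kernels compute can be written out unfused. The following is a minimal plain-Python sketch for illustration only, not apex's API. It assumes the torch.nn.Linear convention y = x W^T + b, with W stored as out_features x in_features. The helper names (dense_forward, dense_backward, dense_gelu_dense_forward) are hypothetical. It uses the exact erf-based GELU; the fused cublasLt path may use a different GELU formulation.

```python
import math

def matmul(a, b):
    # Naive matrix product of lists-of-lists: (m x k) @ (k x n) -> (m x n).
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dense_forward(x, w, b):
    # Hypothetical helper: y = x @ W^T + b (bias added per output feature).
    y = matmul(x, transpose(w))
    return [[v + b[j] for j, v in enumerate(row)] for row in y]

def dense_backward(x, w, dy):
    # Hypothetical helper: the three gradients a fused bprop would produce.
    dx = matmul(dy, w)                    # grad wrt input:  dY @ W
    dw = matmul(transpose(dy), x)         # grad wrt weight: dY^T @ X
    db = [sum(col) for col in zip(*dy)]   # grad wrt bias: column sums of dY
    return dx, dw, db

def gelu(v):
    # Exact GELU: 0.5 * v * (1 + erf(v / sqrt(2))).
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def dense_gelu_dense_forward(x, w1, b1, w2, b2):
    # Hypothetical helper: GEMM + bias -> GELU -> GEMM + bias, unfused.
    h = dense_forward(x, w1, b1)
    g = [[gelu(v) for v in row] for row in h]
    return dense_forward(g, w2, b2)
```

A fused implementation aims to produce the same results while avoiding separate passes over memory for the bias add, the activation, and the bias-gradient reduction.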
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
apex/fused_dense/__init__.py (new file, mode 100644)
apex/fused_dense/fused_dense.py (new file, mode 100644)
csrc/fused_dense.cpp (new file, mode 100644)
csrc/fused_dense_cuda.cu (new file, mode 100644)