A high-throughput and memory-efficient inference and serving engine for LLMs
Mamba SSM architecture
The official repository of Quamba
Fast Hadamard transform in CUDA, with a PyTorch interface
CUDA Templates for Linear Algebra Subroutines