Use out-of-place subtract to avoid D2D copy in tensor parallel cross entropy (#1198)
* switch from clone to out-of-place subtract
* Update apex/mpu/cross_entropy.py
* Apply 1 suggestion(s) to 1 file(s)
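
A minimal single-process sketch of the idea, not the actual diff: the function names `shift_with_clone` and `shift_out_of_place` are hypothetical, and the all-reduce of the max across tensor-parallel ranks is left out. The first variant clones the logits (a device-to-device copy on GPU) and then subtracts in place; the second lets the subtraction allocate its own output, so no separate copy is issued and the caller's tensor is still left untouched.

```python
import torch


def shift_with_clone(logits: torch.Tensor) -> torch.Tensor:
    # Copy first so the caller's tensor is not modified, then subtract in place.
    shifted = logits.clone()
    logits_max = torch.max(shifted, dim=-1)[0]
    shifted.sub_(logits_max.unsqueeze(dim=-1))
    return shifted


def shift_out_of_place(logits: torch.Tensor) -> torch.Tensor:
    # The subtraction writes into a freshly allocated result, so no clone is needed.
    logits_max = torch.max(logits, dim=-1)[0]
    return logits - logits_max.unsqueeze(dim=-1)


x = torch.randn(4, 8)
x_before = x.clone()  # kept only to check that x is not mutated
a = shift_with_clone(x)
b = shift_out_of_place(x)
```

Both variants return the same max-shifted logits; the out-of-place one reads the input once and writes the output once instead of copying and then rewriting the copy.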
Co-authored-by: Eddie Yan <eddiey@nvidia.com>