Yuxin Wu authored
Summary: Make training much more stable (no NaNs in 7 runs). Results slightly better (average is around 78.8).

Reviewed By: alexander-kirillov

Differential Revision: D22427114

fbshipit-source-id: 3a3ea9895e5ac955a62454b73cff71f496e0fbfb
81d5a877