Inference time without output multiplier?

In the forward pass at inference time, why don't we apply the output multiplier (1/N) to the logits, as we do at training time?

Inference time:
https://github.com/EleutherAI/nanoGPT-mup/blob/b2a5e60948d2e11885dc76c1355af98705f54a70/model.py#L225

Training time:
https://github.com/EleutherAI/nanoGPT-mup/blob/b2a5e60948d2e11885dc76c1355af98705f54a70/model.py#L219
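Here is why the difference matters. Multiplying the logits by a constant 1/N works like a softmax temperature. Leaving it out does not change greedy (argmax) decoding, but it does change the distribution that sampling draws from. The sketch below illustrates this in plain Python. It uses made-up logits and a hypothetical width multiplier `N = 4`, and it is not taken from the nanoGPT-mup code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical raw lm_head outputs for a 4-token vocabulary (illustrative only).
raw_logits = [4.0, 2.0, 1.0, 0.0]
N = 4.0  # hypothetical width multiplier; the training path scales logits by 1/N

p_train = softmax([z / N for z in raw_logits])  # logits with the 1/N multiplier
p_infer = softmax(raw_logits)                   # logits without the multiplier

print("with 1/N:   ", [round(p, 3) for p in p_train])
print("without 1/N:", [round(p, 3) for p in p_infer])
print("same argmax:", argmax(p_train) == argmax(p_infer))
```

Both paths pick the same top token. Without the multiplier, however, the distribution is noticeably sharper, so sampled text would differ from what the training-time scaling implies.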