Inconsistent training results with the same random seed

Hi, I've observed that when running training multiple times with the **same random seed**, the results are not fully reproducible: the final performance metrics vary noticeably from run to run.

Have you seen this issue in your own experiments? Do you have any recommendations for achieving better reproducibility?

Thanks in advance for any insights!