Hi, I've noticed that when I run training multiple times with the same random seed, the results are not fully reproducible: the final performance metrics vary noticeably between runs.
Have you observed this issue in your experiments? Do you have recommendations for achieving better reproducibility?
Thanks in advance for any insights!