@mtaillefumier
Contributor

  • The coefficients are now read directly from global memory when lp_max > 4 instead of being stored in shared memory. This reduces shared memory usage for large l.
  • grid_miniapp seems unhappy and triggers a race condition when the hab coefficients are calculated, so an atomicAdd was added.
  • Minor changes in the CMake build system.

@oschuett
Member

Oh this is nice!

I also contemplated storing the Cab matrix in global memory instead of doing multiple passes. However, it would have been a major rewrite, and the performance gain was unclear because shared memory is much faster than global memory. Now we can actually benchmark and compare the two approaches :-)

Btw, this exception in the unittest should then no longer be needed.

Linking #1785 for posterity.

@mtaillefumier
Contributor Author

I only suppressed the cxyz coefficients from shared memory, which means that we still have to store cab and alpha there. alpha is not an issue, but cab might be: for instance, ncoset(l = 10) x ncoset(l = 10) is 286^2 doubles, which is more than the available shared memory.

One solution might be a hybrid scheme where we sort tasks into low-l and high-l groups and then use a different algorithm for each group.

I can certainly do this on my side (cab in global memory), since integrate/collocate are separated from the calculation of the coefficients (at the price of using more global memory).

I will lift the exception asap (tomorrow).

From experience, collocate/integrate will always dominate the timers, so if we lose a little with cab being in global memory, so be it. The overall gain of treating large l on the GPU instead of the CPU is worth the effort.

It would be worth triggering the HIP Pascal tests. The grid_miniapp passes, but still

@oschuett
Member

> I only supressed the cxyz coefficients from the shared memory which means that we still have to store cab and alpha.

I actually did the opposite and partitioned Cab while keeping the entire Cxyz in memory because it's much smaller. Take e.g. the case la = lb = 5:

  • Cab is of size ncoset(la) * ncoset(lb) = 56**2 = 3136
  • Cxyz is of size ncoset(la + lb) = 286

@mtaillefumier
Contributor Author

> > I only supressed the cxyz coefficients from the shared memory which means that we still have to store cab and alpha.
>
> I actually did the opposite and partitioned Cab while keeping the entire Cxyz in memory because it's much smaller. Take e.g. the case la = lb = 5:
>
> * `Cab` is of size `ncoset(la) * ncoset(lb) = 56**2 = 3136`
>
> * `Cxyz` is of size `ncoset(la + lb) = 286`

Indeed. I removed cab from shared memory entirely. Something is still puzzling me: I get the following when I run grid_unittest (same hardware in both cases).

GPU backend


Task: ../src/grid/sample_tasks/ortho_density_l3300.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 2.389972e-15   Time: 3.558520e-04 sec
forces[0, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[0, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
forces[1, 0] ref: 1.136793e-09 test: 1.125706e-09 diff:1.108688e-11 rel_diff: 1.108688e-11
forces[1, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[1, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.299203e-09 rel_diff: 2.178532e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:4.074536e-10 rel_diff: 2.947282e-11
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:1.164153e-10 rel_diff: 1.852484e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:5.748007e-10 rel_diff: 2.670024e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:7.821654e-10 rel_diff: 2.336538e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:1.611625e-09 rel_diff: 2.211155e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:5.587935e-09 rel_diff: 9.291552e-16
Task: ../src/grid/sample_tasks/ortho_density_l3333.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 2.178532e-14   Time: 6.466130e-04 sec
forces[0, 0] ref: 1.136793e-09 test: 4.452016e-10 diff:6.915917e-10 rel_diff: 6.915917e-10
forces[0, 1] ref: 1.311625e-09 test: 6.708094e-10 diff:6.408152e-10 rel_diff: 6.408152e-10
forces[0, 2] ref: 1.128143e-09 test: 4.962475e-10 diff:6.318955e-10 rel_diff: 6.318955e-10
forces[1, 0] ref: 1.136793e-09 test: 4.452016e-10 diff:6.915917e-10 rel_diff: 6.915917e-10
forces[1, 1] ref: 1.311625e-09 test: 6.708094e-10 diff:6.408152e-10 rel_diff: 6.408152e-10
forces[1, 2] ref: 1.128143e-09 test: 4.962475e-10 diff:6.318955e-10 rel_diff: 6.318955e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:1.303852e-08 rel_diff: 2.168271e-15
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.910383e-10 rel_diff: 2.757635e-11
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:5.238689e-09 rel_diff: 3.789363e-10
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:2.328306e-10 rel_diff: 3.704968e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:1.210719e-08 rel_diff: 2.013429e-15
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:1.535227e-09 rel_diff: 7.131329e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:9.968062e-10 rel_diff: 2.977728e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:8.440111e-10 rel_diff: 1.157986e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:3.259629e-08 rel_diff: 5.420072e-15

HIP backend

Task: ./src/grid/sample_tasks/ortho_density_l3300.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 2.573816e-15   Time: 5.322370e-04 sec
forces[0, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[0, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
forces[1, 0] ref: 1.136793e-09 test: 1.125706e-09 diff:1.108688e-11 rel_diff: 1.108688e-11
forces[1, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[1, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.299203e-09 rel_diff: 2.178532e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:4.074536e-10 rel_diff: 2.947282e-11
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:1.164153e-10 rel_diff: 1.852484e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:5.748007e-10 rel_diff: 2.670024e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:7.821654e-10 rel_diff: 2.336538e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:1.611625e-09 rel_diff: 2.211155e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:5.587935e-09 rel_diff: 9.291552e-16
Task: ./src/grid/sample_tasks/ortho_density_l3333.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 2.178532e-14   Time: 5.944550e-04 sec
forces[0, 0] ref: 1.136793e-09 test: 6.434798e-10 diff:4.933134e-10 rel_diff: 4.933134e-10
forces[0, 1] ref: 1.311625e-09 test: 6.021308e-10 diff:7.094938e-10 rel_diff: 7.094938e-10
forces[0, 2] ref: 1.128143e-09 test: 8.971696e-10 diff:2.309734e-10 rel_diff: 2.309734e-10
forces[1, 0] ref: 1.136793e-09 test: 6.434798e-10 diff:4.933134e-10 rel_diff: 4.933134e-10
forces[1, 1] ref: 1.311625e-09 test: 6.021308e-10 diff:7.094938e-10 rel_diff: 7.094938e-10
forces[1, 2] ref: 1.128143e-09 test: 8.971696e-10 diff:2.309734e-10 rel_diff: 2.309734e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:1.210719e-08 rel_diff: 2.013395e-15
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:1.280569e-09 rel_diff: 1.213359e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:3.550667e-09 rel_diff: 2.568346e-10
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:9.022187e-10 rel_diff: 1.435675e-10
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:1.583248e-08 rel_diff: 2.632946e-15
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:8.512870e-10 rel_diff: 3.954339e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:2.044544e-09 rel_diff: 6.107602e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:2.240995e-09 rel_diff: 3.074654e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:1.862645e-08 rel_diff: 3.097184e-15

@oschuett
Member

> Something is still puzzling me. I get this when I run grid_unittest (same hardware both cases).

That looks strangely identical. There should be at least some numerical noise. What do the statistics at the end say?
Note that only the tests with Batched actually use the different backends.

@mtaillefumier
Contributor Author

mtaillefumier commented May 25, 2023

GPU backend


Task: ../src/grid/sample_tasks/ortho_density_l0000.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 1.065814e-18   Time: 6.982400e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l0000.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 3.848136e-16   Time: 6.041870e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l0000.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.505772e+00   Max rel diff: 0.000000e+00   Time: 1.714400e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l0000.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 3.505772e+00   Max rel diff: 1.616032e-15   Time: 1.513650e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l0122.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.245375e-10   Max rel diff: 2.067952e-25   Time: 6.054900e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l0122.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 8.245375e-10   Max rel diff: 4.135903e-25   Time: 1.874039e-03 sec
Task: ../src/grid/sample_tasks/ortho_density_l0122.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.197319e-10   Max rel diff: 0.000000e+00   Time: 5.854000e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l0122.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 5.197319e-10   Max rel diff: 3.101927e-25   Time: 1.841950e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l2200.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.664842e-04   Max rel diff: 1.350309e-18   Time: 2.087600e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l2200.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 8.664842e-04   Max rel diff: 6.479539e-18   Time: 6.638940e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l2200.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.579830e+00   Max rel diff: 0.000000e+00   Time: 9.223000e-06 sec
Task: ../src/grid/sample_tasks/ortho_density_l2200.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 1.579830e+00   Max rel diff: 3.091403e-14   Time: 6.939300e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l3300.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 6.519054e+02   Max rel diff: 2.210284e-16   Time: 1.398360e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l3300.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 6.519054e+02   Max rel diff: 6.630853e-16   Time: 5.333688e-03 sec
Task: ../src/grid/sample_tasks/ortho_density_l3300.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 0.000000e+00   Time: 5.796200e-05 sec
Task: ../src/grid/sample_tasks/ortho_density_l3300.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 2.389972e-15   Time: 3.558520e-04 sec
forces[0, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[0, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
forces[1, 0] ref: 1.136793e-09 test: 1.125706e-09 diff:1.108688e-11 rel_diff: 1.108688e-11
forces[1, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[1, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.299203e-09 rel_diff: 2.178532e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:4.074536e-10 rel_diff: 2.947282e-11
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:1.164153e-10 rel_diff: 1.852484e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:5.748007e-10 rel_diff: 2.670024e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:7.821654e-10 rel_diff: 2.336538e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:1.611625e-09 rel_diff: 2.211155e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:5.587935e-09 rel_diff: 9.291552e-16
Task: ../src/grid/sample_tasks/ortho_density_l3333.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 2.178532e-14   Time: 6.466130e-04 sec
forces[0, 0] ref: 1.136793e-09 test: 4.452016e-10 diff:6.915917e-10 rel_diff: 6.915917e-10
forces[0, 1] ref: 1.311625e-09 test: 6.708094e-10 diff:6.408152e-10 rel_diff: 6.408152e-10
forces[0, 2] ref: 1.128143e-09 test: 4.962475e-10 diff:6.318955e-10 rel_diff: 6.318955e-10
forces[1, 0] ref: 1.136793e-09 test: 4.452016e-10 diff:6.915917e-10 rel_diff: 6.915917e-10
forces[1, 1] ref: 1.311625e-09 test: 6.708094e-10 diff:6.408152e-10 rel_diff: 6.408152e-10
forces[1, 2] ref: 1.128143e-09 test: 4.962475e-10 diff:6.318955e-10 rel_diff: 6.318955e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:1.303852e-08 rel_diff: 2.168271e-15
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.910383e-10 rel_diff: 2.757635e-11
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:5.238689e-09 rel_diff: 3.789363e-10
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:2.328306e-10 rel_diff: 3.704968e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:1.210719e-08 rel_diff: 2.013429e-15
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:1.535227e-09 rel_diff: 7.131329e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:9.968062e-10 rel_diff: 2.977728e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:8.440111e-10 rel_diff: 1.157986e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:3.259629e-08 rel_diff: 5.420072e-15
Task: ../src/grid/sample_tasks/ortho_density_l3333.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 6.915917e-14   Time: 8.169245e-02 sec
Task: ../src/grid/sample_tasks/ortho_density_l3333.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.957149e+02   Max rel diff: 2.220419e-16   Time: 1.863800e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l3333.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 1.957149e+02   Max rel diff: 2.425837e-14   Time: 1.433062e-03 sec
Task: ../src/grid/sample_tasks/ortho_density_l0505.task         Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.337700e-05   Max rel diff: 1.380358e-22   Time: 5.200737e-03 sec
Task: ../src/grid/sample_tasks/ortho_density_l0505.task         Integrate Batched   Cycles: 1.000000e+00   Max value: 1.337700e-05   Max rel diff: 8.470329e-21   Time: 5.319913e+00 sec
Task: ../src/grid/sample_tasks/ortho_density_l0505.task         Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.947394e-07   Max rel diff: 5.293956e-23   Time: 6.752830e-04 sec
Task: ../src/grid/sample_tasks/ortho_density_l0505.task         Collocate Batched   Cycles: 1.000000e+00   Max value: 2.947394e-07   Max rel diff: 4.499863e-22   Time: 3.630482e-02 sec
Task: ../src/grid/sample_tasks/ortho_non_periodic.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.349539e+00   Max rel diff: 2.428613e-17   Time: 4.326000e-05 sec
Task: ../src/grid/sample_tasks/ortho_non_periodic.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 2.349539e+00   Max rel diff: 5.670336e-16   Time: 1.521622e-03 sec
Task: ../src/grid/sample_tasks/ortho_non_periodic.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 4.483815e-01   Max rel diff: 0.000000e+00   Time: 2.213880e-04 sec
Task: ../src/grid/sample_tasks/ortho_non_periodic.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 4.483815e-01   Max rel diff: 2.220446e-16   Time: 5.039970e-04 sec
Task: ../src/grid/sample_tasks/ortho_tau.task                   Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.770995e-02   Max rel diff: 3.330669e-20   Time: 8.330700e-05 sec
Task: ../src/grid/sample_tasks/ortho_tau.task                   Integrate Batched   Cycles: 1.000000e+00   Max value: 8.770995e-02   Max rel diff: 2.775558e-17   Time: 1.339101e-02 sec
Task: ../src/grid/sample_tasks/ortho_tau.task                   Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.921986e-01   Max rel diff: 0.000000e+00   Time: 9.353000e-06 sec
Task: ../src/grid/sample_tasks/ortho_tau.task                   Collocate Batched   Cycles: 1.000000e+00   Max value: 2.921986e-01   Max rel diff: 1.665335e-16   Time: 1.833660e-04 sec
Task: ../src/grid/sample_tasks/general_density.task             Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 2.542300e-05 sec
Task: ../src/grid/sample_tasks/general_density.task             Integrate Batched   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 3.781240e-04 sec
Task: ../src/grid/sample_tasks/general_density.task             Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 3.885781e-16   Time: 1.602500e-05 sec
Task: ../src/grid/sample_tasks/general_density.task             Collocate Batched   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 2.775558e-16   Time: 7.976100e-05 sec
Task: ../src/grid/sample_tasks/general_tau.task                 Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.549947e+00   Max rel diff: 1.432595e-15   Time: 1.466280e-04 sec
Task: ../src/grid/sample_tasks/general_tau.task                 Integrate Batched   Cycles: 1.000000e+00   Max value: 1.549947e+00   Max rel diff: 8.595572e-16   Time: 2.636295e-03 sec
Task: ../src/grid/sample_tasks/general_tau.task                 Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.574332e-01   Max rel diff: 7.216450e-16   Time: 8.227800e-05 sec
Task: ../src/grid/sample_tasks/general_tau.task                 Collocate Batched   Cycles: 1.000000e+00   Max value: 3.574332e-01   Max rel diff: 8.881784e-16   Time: 3.365780e-04 sec
Task: ../src/grid/sample_tasks/general_subpatch0.task           Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 2.240400e-05 sec
Task: ../src/grid/sample_tasks/general_subpatch0.task           Integrate Batched   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 3.777570e-04 sec
Task: ../src/grid/sample_tasks/general_subpatch0.task           Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 3.885781e-16   Time: 1.323300e-05 sec
Task: ../src/grid/sample_tasks/general_subpatch0.task           Collocate Batched   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 2.775558e-16   Time: 8.003300e-05 sec
Task: ../src/grid/sample_tasks/general_subpatch16.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 6.361029e-04   Max rel diff: 4.119968e-18   Time: 1.155540e-04 sec
Task: ../src/grid/sample_tasks/general_subpatch16.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 6.361029e-04   Max rel diff: 4.336809e-18   Time: 1.624923e-03 sec
Task: ../src/grid/sample_tasks/general_subpatch16.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.690679e-03   Max rel diff: 5.160802e-17   Time: 2.825520e-04 sec
Task: ../src/grid/sample_tasks/general_subpatch16.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 3.690679e-03   Max rel diff: 5.117434e-17   Time: 8.980870e-04 sec
Task: ../src/grid/sample_tasks/general_overflow.task            Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.597303e+01   Max rel diff: 3.336294e-15   Time: 2.512200e-05 sec
Task: ../src/grid/sample_tasks/general_overflow.task            Integrate Batched   Cycles: 1.000000e+00   Max value: 1.597303e+01   Max rel diff: 3.558713e-15   Time: 2.696330e-04 sec
Task: ../src/grid/sample_tasks/general_overflow.task            Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 7.584822e+00   Max rel diff: 5.513005e-14   Time: 3.956000e-05 sec
Task: ../src/grid/sample_tasks/general_overflow.task            Collocate Batched   Cycles: 1.000000e+00   Max value: 7.584822e+00   Max rel diff: 5.513005e-14   Time: 1.088830e-04 sec

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                GRID STATISTICS                              -
 -                                                                             -
 -------------------------------------------------------------------------------
 LP    KERNEL             BACKEND                              COUNT     PERCENT
 0     collocate general  CPU                                      4       7.69%
 3     integrate general  CPU                                      4       7.69%
 0     collocate general  GPU                                      4       7.69%
 3     integrate general  GPU                                      4       7.69%
 3     collocate ortho    CPU                                      3       5.77%
 6     integrate ortho    CPU                                      3       5.77%
 3     collocate ortho    GPU                                      3       5.77%
 6     integrate ortho    GPU                                      3       5.77%
 2     collocate ortho    CPU                                      2       3.85%
 5     integrate ortho    CPU                                      2       3.85%
 2     collocate ortho    GPU                                      2       3.85%
 5     integrate ortho    GPU                                      2       3.85%
 0     collocate ortho    CPU                                      1       1.92%
 6     collocate ortho    CPU                                      1       1.92%
 10    collocate ortho    CPU                                      1       1.92%
 3     integrate ortho    CPU                                      1       1.92%
 9     integrate ortho    CPU                                      1       1.92%
 13    integrate ortho    CPU                                      1       1.92%
 2     collocate general  CPU                                      1       1.92%
 5     integrate general  CPU                                      1       1.92%
 0     collocate ortho    GPU                                      1       1.92%
 6     collocate ortho    GPU                                      1       1.92%
 10    collocate ortho    GPU                                      1       1.92%
 3     integrate ortho    GPU                                      1       1.92%
 9     integrate ortho    GPU                                      1       1.92%
 13    integrate ortho    GPU                                      1       1.92%
 2     collocate general  GPU                                      1       1.92%
 5     integrate general  GPU                                      1       1.92%
 -------------------------------------------------------------------------------

All tests have passed :-)

HIP backend

Task: ./src/grid/sample_tasks/ortho_density_l0000.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 1.065814e-18   Time: 6.884800e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l0000.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 1.181734e+03   Max rel diff: 5.772204e-16   Time: 3.952390e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l0000.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.505772e+00   Max rel diff: 0.000000e+00   Time: 2.553200e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l0000.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 3.505772e+00   Max rel diff: 1.797597e-15   Time: 2.672700e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l0122.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.245375e-10   Max rel diff: 2.067952e-25   Time: 7.017200e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l0122.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 8.245375e-10   Max rel diff: 3.101927e-25   Time: 8.682160e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l0122.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.197319e-10   Max rel diff: 0.000000e+00   Time: 5.977800e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l0122.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 5.197319e-10   Max rel diff: 3.101927e-25   Time: 2.369920e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l2200.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.664842e-04   Max rel diff: 1.350309e-18   Time: 2.036100e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l2200.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 8.664842e-04   Max rel diff: 4.458339e-18   Time: 2.234590e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l2200.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.579830e+00   Max rel diff: 0.000000e+00   Time: 1.036900e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l2200.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 1.579830e+00   Max rel diff: 7.835881e-14   Time: 7.682200e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l3300.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 6.519054e+02   Max rel diff: 2.210284e-16   Time: 1.333810e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l3300.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 6.519054e+02   Max rel diff: 7.705377e-16   Time: 3.705109e-03 sec
Task: ./src/grid/sample_tasks/ortho_density_l3300.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 0.000000e+00   Time: 6.417500e-05 sec
Task: ./src/grid/sample_tasks/ortho_density_l3300.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 8.606467e+00   Max rel diff: 2.573816e-15   Time: 5.325280e-04 sec
forces[0, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[0, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
forces[1, 0] ref: 1.136793e-09 test: 1.125706e-09 diff:1.108688e-11 rel_diff: 1.108688e-11
forces[1, 1] ref: 1.311625e-09 test: 1.205629e-09 diff:1.059952e-10 rel_diff: 1.059952e-10
forces[1, 2] ref: 1.128143e-09 test: 9.590056e-10 diff:1.691374e-10 rel_diff: 1.691374e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:2.299203e-09 rel_diff: 2.178532e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:4.074536e-10 rel_diff: 2.947282e-11
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:1.164153e-10 rel_diff: 1.852484e-11
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:0.000000e+00 rel_diff: 0.000000e+00
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:5.748007e-10 rel_diff: 2.670024e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:7.821654e-10 rel_diff: 2.336538e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:1.611625e-09 rel_diff: 2.211155e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:5.587935e-09 rel_diff: 9.291552e-16
Task: ./src/grid/sample_tasks/ortho_density_l3333.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 2.178532e-14   Time: 6.034160e-04 sec
forces[0, 0] ref: 1.136793e-09 test: 6.434798e-10 diff:4.933134e-10 rel_diff: 4.933134e-10
forces[0, 1] ref: 1.311625e-09 test: 6.021308e-10 diff:7.094938e-10 rel_diff: 7.094938e-10
forces[0, 2] ref: 1.128143e-09 test: 8.971696e-10 diff:2.309734e-10 rel_diff: 2.309734e-10
forces[1, 0] ref: 1.136793e-09 test: 6.434798e-10 diff:4.933134e-10 rel_diff: 4.933134e-10
forces[1, 1] ref: 1.311625e-09 test: 6.021308e-10 diff:7.094938e-10 rel_diff: 7.094938e-10
forces[1, 2] ref: 1.128143e-09 test: 8.971696e-10 diff:2.309734e-10 rel_diff: 2.309734e-10
virial[ 0, 0] ref: 6.013323e+06 test: 6.013323e+06 diff:1.210719e-08 rel_diff: 2.013395e-15
virial[ 0, 1] ref: 1.055391e+01 test: 1.055391e+01 diff:1.280569e-09 rel_diff: 1.213359e-10
virial[ 0, 2] ref: 1.382472e+01 test: 1.382472e+01 diff:3.550667e-09 rel_diff: 2.568346e-10
virial[ 1, 0] ref: 6.284282e+00 test: 6.284282e+00 diff:9.022187e-10 rel_diff: 1.435675e-10
virial[ 1, 1] ref: 6.013220e+06 test: 6.013220e+06 diff:1.583248e-08 rel_diff: 2.632946e-15
virial[ 1, 2] ref: 2.152792e+01 test: 2.152792e+01 diff:8.512870e-10 rel_diff: 3.954339e-11
virial[ 2, 0] ref: -3.347540e+01 test: -3.347540e+01 diff:2.044544e-09 rel_diff: 6.107602e-11
virial[ 2, 1] ref: 7.288609e+01 test: 7.288609e+01 diff:2.240995e-09 rel_diff: 3.074654e-11
virial[ 2, 2] ref: 6.013995e+06 test: 6.013995e+06 diff:1.862645e-08 rel_diff: 3.097184e-15
Task: ./src/grid/sample_tasks/ortho_density_l3333.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 1.230013e+06   Max rel diff: 7.094938e-14   Time: 7.670461e-02 sec
Task: ./src/grid/sample_tasks/ortho_density_l3333.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.957149e+02   Max rel diff: 2.220419e-16   Time: 2.036780e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l3333.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 1.957149e+02   Max rel diff: 4.826695e-14   Time: 1.821013e-02 sec
Task: ./src/grid/sample_tasks/ortho_density_l0505.task          Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.337700e-05   Max rel diff: 1.380358e-22   Time: 5.211712e-03 sec
Task: ./src/grid/sample_tasks/ortho_density_l0505.task          Integrate Batched   Cycles: 1.000000e+00   Max value: 1.337700e-05   Max rel diff: 8.470329e-21   Time: 7.079433e-01 sec
Task: ./src/grid/sample_tasks/ortho_density_l0505.task          Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.947394e-07   Max rel diff: 5.293956e-23   Time: 6.467670e-04 sec
Task: ./src/grid/sample_tasks/ortho_density_l0505.task          Collocate Batched   Cycles: 1.000000e+00   Max value: 2.947394e-07   Max rel diff: 4.499863e-22   Time: 5.112278e-02 sec
Task: ./src/grid/sample_tasks/ortho_non_periodic.task           Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.349539e+00   Max rel diff: 2.428613e-17   Time: 4.203000e-05 sec
Task: ./src/grid/sample_tasks/ortho_non_periodic.task           Integrate Batched   Cycles: 1.000000e+00   Max value: 2.349539e+00   Max rel diff: 3.780224e-16   Time: 9.335640e-04 sec
Task: ./src/grid/sample_tasks/ortho_non_periodic.task           Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 4.483815e-01   Max rel diff: 0.000000e+00   Time: 2.225200e-04 sec
Task: ./src/grid/sample_tasks/ortho_non_periodic.task           Collocate Batched   Cycles: 1.000000e+00   Max value: 4.483815e-01   Max rel diff: 2.220446e-16   Time: 5.692620e-04 sec
Task: ./src/grid/sample_tasks/ortho_tau.task                    Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 8.770995e-02   Max rel diff: 3.330669e-20   Time: 6.622400e-05 sec
Task: ./src/grid/sample_tasks/ortho_tau.task                    Integrate Batched   Cycles: 1.000000e+00   Max value: 8.770995e-02   Max rel diff: 1.387779e-17   Time: 3.755328e-03 sec
Task: ./src/grid/sample_tasks/ortho_tau.task                    Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.921986e-01   Max rel diff: 0.000000e+00   Time: 9.259000e-06 sec
Task: ./src/grid/sample_tasks/ortho_tau.task                    Collocate Batched   Cycles: 1.000000e+00   Max value: 2.921986e-01   Max rel diff: 2.498002e-16   Time: 4.846450e-04 sec
Task: ./src/grid/sample_tasks/general_density.task              Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 2.532800e-05 sec
Task: ./src/grid/sample_tasks/general_density.task              Integrate Batched   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 1.745710e-04 sec
Task: ./src/grid/sample_tasks/general_density.task              Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 3.885781e-16   Time: 1.553000e-05 sec
Task: ./src/grid/sample_tasks/general_density.task              Collocate Batched   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 2.775558e-16   Time: 1.511700e-04 sec
Task: ./src/grid/sample_tasks/general_tau.task                  Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.549947e+00   Max rel diff: 1.432595e-15   Time: 1.413200e-04 sec
Task: ./src/grid/sample_tasks/general_tau.task                  Integrate Batched   Cycles: 1.000000e+00   Max value: 1.549947e+00   Max rel diff: 1.002817e-15   Time: 1.431923e-03 sec
Task: ./src/grid/sample_tasks/general_tau.task                  Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.574332e-01   Max rel diff: 7.216450e-16   Time: 8.671700e-05 sec
Task: ./src/grid/sample_tasks/general_tau.task                  Collocate Batched   Cycles: 1.000000e+00   Max value: 3.574332e-01   Max rel diff: 8.881784e-16   Time: 2.145640e-04 sec
Task: ./src/grid/sample_tasks/general_subpatch0.task            Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 2.291600e-05 sec
Task: ./src/grid/sample_tasks/general_subpatch0.task            Integrate Batched   Cycles: 1.000000e+00   Max value: 2.642039e+01   Max rel diff: 1.344686e-16   Time: 1.483590e-04 sec
Task: ./src/grid/sample_tasks/general_subpatch0.task            Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 3.885781e-16   Time: 1.307200e-05 sec
Task: ./src/grid/sample_tasks/general_subpatch0.task            Collocate Batched   Cycles: 1.000000e+00   Max value: 5.560563e-01   Max rel diff: 2.775558e-16   Time: 1.487060e-04 sec
Task: ./src/grid/sample_tasks/general_subpatch16.task           Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 6.361029e-04   Max rel diff: 4.119968e-18   Time: 1.124520e-04 sec
Task: ./src/grid/sample_tasks/general_subpatch16.task           Integrate Batched   Cycles: 1.000000e+00   Max value: 6.361029e-04   Max rel diff: 4.011548e-18   Time: 1.540413e-03 sec
Task: ./src/grid/sample_tasks/general_subpatch16.task           Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 3.690679e-03   Max rel diff: 5.160802e-17   Time: 2.851540e-04 sec
Task: ./src/grid/sample_tasks/general_subpatch16.task           Collocate Batched   Cycles: 1.000000e+00   Max value: 3.690679e-03   Max rel diff: 5.117434e-17   Time: 8.207350e-04 sec
Task: ./src/grid/sample_tasks/general_overflow.task             Integrate PGF-Ref   Cycles: 1.000000e+00   Max value: 1.597303e+01   Max rel diff: 3.336294e-15   Time: 2.672600e-05 sec
Task: ./src/grid/sample_tasks/general_overflow.task             Integrate Batched   Cycles: 1.000000e+00   Max value: 1.597303e+01   Max rel diff: 3.336294e-15   Time: 1.615260e-04 sec
Task: ./src/grid/sample_tasks/general_overflow.task             Collocate PGF-Ref   Cycles: 1.000000e+00   Max value: 7.584822e+00   Max rel diff: 5.513005e-14   Time: 3.464700e-05 sec
Task: ./src/grid/sample_tasks/general_overflow.task             Collocate Batched   Cycles: 1.000000e+00   Max value: 7.584822e+00   Max rel diff: 5.513005e-14   Time: 1.087460e-04 sec

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                GRID STATISTICS                              -
 -                                                                             -
 -------------------------------------------------------------------------------
 LP    KERNEL             BACKEND                              COUNT     PERCENT
 0     collocate general  CPU                                      4       7.69%
 3     integrate general  CPU                                      4       7.69%
 0     collocate general  HIP                                      4       7.69%
 3     integrate general  HIP                                      4       7.69%
 3     collocate ortho    CPU                                      3       5.77%
 6     integrate ortho    CPU                                      3       5.77%
 3     collocate ortho    HIP                                      3       5.77%
 6     integrate ortho    HIP                                      3       5.77%
 2     collocate ortho    CPU                                      2       3.85%
 5     integrate ortho    CPU                                      2       3.85%
 2     collocate ortho    HIP                                      2       3.85%
 5     integrate ortho    HIP                                      2       3.85%
 0     collocate ortho    CPU                                      1       1.92%
 6     collocate ortho    CPU                                      1       1.92%
 10    collocate ortho    CPU                                      1       1.92%
 3     integrate ortho    CPU                                      1       1.92%
 9     integrate ortho    CPU                                      1       1.92%
 13    integrate ortho    CPU                                      1       1.92%
 2     collocate general  CPU                                      1       1.92%
 5     integrate general  CPU                                      1       1.92%
 0     collocate ortho    HIP                                      1       1.92%
 6     collocate ortho    HIP                                      1       1.92%
 10    collocate ortho    HIP                                      1       1.92%
 3     integrate ortho    HIP                                      1       1.92%
 9     integrate ortho    HIP                                      1       1.92%
 13    integrate ortho    HIP                                      1       1.92%
 2     collocate general  HIP                                      1       1.92%
 5     integrate general  HIP                                      1       1.92%
 -------------------------------------------------------------------------------

All tests have passed :-)
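In the virial lines of the output above, `rel_diff` appears to be `diff` divided by the magnitude of the reference value. For example, 9.022187e-10 / 6.284282e+00 ≈ 1.435675e-10 for `virial[ 1, 0]`. The following is a minimal C sketch of that arithmetic. The function name `rel_diff` is hypothetical, and the formula is inferred from the printed numbers rather than taken from the miniapp's source.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: relative difference as inferred from the printed
   virial lines, rel_diff = |ref - test| / |ref|. */
double rel_diff(double ref, double test) {
  assert(ref != 0.0); /* undefined for a zero reference value */
  return fabs(ref - test) / fabs(ref);
}
```

Because it is scaled by `|ref|`, the large diagonal entries (about 6.0e+06) report tiny relative diffs even when their absolute diff (about 1.6e-08) is the largest in the table.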

@mtaillefumier
Contributor Author

They are different, and that is not where my trouble comes from. It is more about the differences ref/GPU and ref/HIP.

@oschuett
Member

Those differences should be OK because they are below our tolerances of 1e-12 for matrix elements and 1e-8 for forces.

These "warning lines" are already printed when the diffs exceed 0.01 times our thresholds. While this was useful during development, it is now confusing. Hence, I've opened #2797 to fix this inconsistency.
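The tolerance check described above can be sketched roughly as follows. This is a minimal C illustration, not CP2K's actual test code. The names `check_diff`, `diff_status`, and the macros are hypothetical, and the sketch assumes the comparison is made on the absolute difference. Only the tolerances (1e-12 for matrix elements, 1e-8 for forces) and the 0.01 warning fraction come from the comment.

```c
#include <assert.h>
#include <math.h>

/* Tolerances quoted in the comment; the macro names are hypothetical. */
#define TOL_MATRIX_ELEMENTS 1e-12
#define TOL_FORCES 1e-8
/* A warning line is printed once a diff exceeds this fraction of its tolerance. */
#define WARN_FRACTION 0.01

typedef enum { DIFF_OK, DIFF_WARN, DIFF_FAIL } diff_status;

/* Hypothetical helper: classifies |ref - test| against a tolerance.
   Assumes an absolute difference; the real check may differ. */
diff_status check_diff(double ref, double test, double tol) {
  assert(tol > 0.0);
  const double diff = fabs(ref - test);
  if (diff > tol) {
    return DIFF_FAIL; /* above tolerance: the test fails */
  }
  if (diff > WARN_FRACTION * tol) {
    return DIFF_WARN; /* within tolerance, but a warning line is printed */
  }
  return DIFF_OK;
}
```

Under this reading, a matrix-element diff of 5e-13 passes the 1e-12 tolerance yet still produces a warning line. That gap between "passes" and "warns" is the inconsistency #2797 is meant to fix.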

@mtaillefumier
Contributor Author

Perfect, thanks for the clarification; I was searching for something wrong in the code. #2797 can be merged first, then I will update this PR. If there is no conflict, you can merge both. You can squash all commits into one if you wish.

@oschuett oschuett merged commit b314dfa into cp2k:master May 25, 2023
@mtaillefumier mtaillefumier deleted the shared_mem_fix branch January 30, 2025 03:09