Skip to content

Update mpas_kd_tree.F to break ties on equidistant queries#630

Merged
mgduda merged 1 commit intoMPAS-Dev:developfrom
MiCurry:init_atmosphere/kd_tree_ties
Jul 21, 2020
Merged

Update mpas_kd_tree.F to break ties on equidistant queries#630
mgduda merged 1 commit intoMPAS-Dev:developfrom
MiCurry:init_atmosphere/kd_tree_ties

Conversation

@MiCurry
Copy link
Contributor

@MiCurry MiCurry commented Jul 13, 2020

Previous to this commit if a query in is equidistant from two points of the KD-Tree, the resulting 'closet' point (from mpas_kd_search) would be the last point that the KD-Tree visited in its search.

This is fine for single processors, but if two tasks share the same point (along the boundary of two decompositions) they each can decide that a single query lies closer to two different points. For the static field interpolation this would mean that a single data value can be counted twice, which produces differences static-fields across task counts.

This commit breaks these ties so that each task will choose the same node that a query belongs too.

@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 13, 2020

I have added an additional test to the mpas_init_atm_core.F Github gist that I used to test the original KD-Tree, which can be found here: https://gist.github.com/MiCurry/b5744ef36eec3b3ba40a8de806201ed1

@mgduda
Copy link
Contributor

mgduda commented Jul 13, 2020

It seems like having a module variable, kd_tree_ndims, to store the number of dimensions in a kd-tree would prevent us from creating multiple trees of differing dimensionality. Could the dimensionality of a tree be moved to a member of the mpas_kd_type? Alternatively, where kd_tree_ndims is used, would it not be possible to instead use, e.g., size(p1 % point(:))?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assignment necessary? L.240 should guarantee that the l.h.s here is the same as the r.h.s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct! Good point!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this assignment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like I missed it in the last update.

@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 13, 2020

It seems like having a module variable, kd_tree_ndims, to store the number of dimensions in a kd-tree would prevent us from creating multiple trees of differing dimensionality. Could the dimensionality of a tree be moved to a member of the mpas_kd_type? Alternatively, where kd_tree_ndims is used, would it not be possible to instead use, e.g., size(p1 % point(:))?

Ah that is right. I forgot that mpas_kd_tree is a module and not a class. My thought, which was probably incorrect, was that using a module variable would be quicker to access than using size, but with more thought, I'm guessing that the size intrinsic is pretty efficient, since Fortran size and shape information is stored with the variable.

I'll update it to use size(p1 % point(:)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it's a little awkward to have a cycle statement here. Wouldn't the following loop without a cycle statement work?

      do i = 1, size(p1 % point(:))                                                                                                  
          if (p1 % point(i) < p2 % point(i)) then                                                                                    
              tie = -1                                                                                                               
              return                                                                                                                 
          else if (p1 % point(i) > p2 % point(i)) then                                                                               
              tie = 1                                                                                                                
              return                                                                                                                 
          end if                                                                                                                     
      end do 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure whether the loop as written (with the cycle statement) works as expected. For example, if I define two points:

      p1 % point(:) = [1.0, 1.0, 0.0]                                            
      p2 % point(:) = [0.0, 0.0, 1.0]    

the break_tie function returns -1 (i.e., p1 < p2), which isn't what I was expecting. However, it could be that I've misunderstood the expected behavior. Ultimately, as long as the function works deterministically, we should be fine; but, it's always nice when the function behavior matches what most people would intuitively expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgduda I think the portion of code is also a bit awkward, particularly I don't like the double if statements. I like your implementation better and I'll switch it over to that.

Too your second comment, I think -1 was being returned as there were a few return statements missing. When I wrote my unit test, I wrote a test and figured it was enough, but in the back of my mind I knew I needed a few more, and it appears that thought was right, so I'll add a few more tests to make sure we've got good coverage.

@mgduda
Copy link
Contributor

mgduda commented Jul 14, 2020

There are a couple of places where the commit messages or code comments make references to interior/exterior cells, or to different MPI tasks, and I think the problem that this PR is addressing is not inherently tied to either of these. For example, a serial program could construct two different kd-trees that share points and be affected by the issue of searches in one tree or the other returning different "nearest" points. As such, I'd suggest we keep the descriptions general.

@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 15, 2020

There are a couple of places where the commit messages or code comments make references to interior/exterior cells, or to different MPI tasks, and I think the problem that this PR is addressing is not inherently tied to either of these. For example, a serial program could construct two different kd-trees that share points and be affected by the issue of searches in one tree or the other returning different "nearest" points. As such, I'd suggest we keep the descriptions general.

That sounds great. Thanks for the note on this. I was actually wondering how exactly the commit message should be worded. I agree, I'll update it to be more generalized.

@MiCurry MiCurry force-pushed the init_atmosphere/kd_tree_ties branch from baa0e30 to 26b4a27 Compare July 16, 2020 18:48
@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 16, 2020

Okay, @mgduda. I have updated the comparison loop (so that it no longer contains cycle), removed the distance = current_distance assignment and generalized the commit message, and comments. Let me know if you think they could still be improved at all.

@mgduda
Copy link
Contributor

mgduda commented Jul 16, 2020

The commit message mentions 'closet' in a couple of places; I think this should be 'closest'.

Also, in the commit message, it's stated that "the resulting 'closet' point (from mpas_kd_search) would be the last
point that the KD-Tree visited in its search." Is this correct? I admit that I haven't reviewed the full logic of the search function, but from this block it looks like the point that is returned would be the first one that was encountered:

      if (current_distance < distance) then
         distance = current_distance
         res => kdtree

@MiCurry MiCurry force-pushed the init_atmosphere/kd_tree_ties branch from 26b4a27 to c652356 Compare July 20, 2020 17:44
@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 20, 2020

The commit message mentions 'closet' in a couple of places; I think this should be 'closest'.

That's embarrassing! Thanks for the catch.

Also, in the commit message, it's stated that "the resulting 'closet' point (from mpas_kd_search) would be the last
point that the KD-Tree visited in its search." Is this correct? I admit that I haven't reviewed the full logic of the search function, but from this block it looks like the point that is returned would be the first one that was encountered:

You are correct! I was wrong. I just checked with my small toy program and it point chosen (originally) on equidistant queries is the first point encountered.

I have just pushed an update that fixes the spelling mistakes and the above error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"than" -> "then"

@mgduda
Copy link
Contributor

mgduda commented Jul 20, 2020

@MiCurry Other than one other minor typo, I think this PR is ready to be merged. Since we have the opportunity to easily do so, though, it seems worth changing "than" to "then" as noted in the preceding comment.

Previous to this commit if a query is equidistant from two points of the
KD-Tree, the resulting 'closest' point (from mpas_kd_search) would be the first
point that the KD-Tree visited in its search. This produces different results
across different trees that contain the same ties, but have a
different tree structure.

This commit adds a function to break these ties, so that different trees will
predictively select the same point on equidistant queries.
@MiCurry MiCurry force-pushed the init_atmosphere/kd_tree_ties branch from c652356 to ddd137d Compare July 21, 2020 16:10
@MiCurry
Copy link
Contributor Author

MiCurry commented Jul 21, 2020

Okay, @mgduda I have just pushed a fix for that typo, so we should be good to merge!

@mgduda mgduda self-requested a review July 21, 2020 18:22
@mgduda mgduda merged commit 9cb179c into MPAS-Dev:develop Jul 21, 2020
@mgduda
Copy link
Contributor

mgduda commented Jul 21, 2020

@MiCurry It seems GitHub doesn't allow me to add a review if I've already pushed the merge commit. Needless to say, I've approved this PR.

@MiCurry MiCurry deleted the init_atmosphere/kd_tree_ties branch July 27, 2020 16:57
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 27, 2020
#3737)

Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 29, 2020
Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants