Skip to content

Change name of missing value attribute from missing_value to _FillValue#616

Merged
mark-petersen merged 1 commit intoMPAS-Dev:developfrom
mark-petersen:framework/add_FillValue
Jul 13, 2020
Merged

Change name of missing value attribute from missing_value to _FillValue#616
mark-petersen merged 1 commit intoMPAS-Dev:developfrom
mark-petersen:framework/add_FillValue

Conversation

@mark-petersen
Copy link
Contributor

@mark-petersen mark-petersen commented Jun 30, 2020

This PR changes the name of the missing value attribute in netCDF output files
from missing_value to _FillValue. The _FillValue attribute can be set by adding
a missing_value to individual variables and var_arrays in the Registry file
(See #571 for an example).

This change is needed because some analysis software, notably NCO, does not
support the missing_value attribute by default and expects _FillValue instead.

@mark-petersen
Copy link
Contributor Author

mark-petersen commented Jun 30, 2020

I conducted some performance tests to see if missing_value versus _FillValue had any impact on write times. I used the MPAS-Ocean core with each, and the changes in #571. Writing 1GB on 4 nodes, I get identical times between missing_value versus _FillValue. Look across yellow cells, which is PIO2. This shows the mean and std dev for 10 tests.
Numbers are write time in seconds:
image
By both (right columns) I mean I added both lines:

src/tools/registry/gen_inc.c
1407 fortprintf(fd, "      call mpas_add_att(%s %% attLists(const_index) %% attList, 'missing_value', %s)\n", pointer_name_arr, missing_value);
1408 fortprintf(fd, "      call mpas_add_att(%s %% attLists(const_index) %% attList, '_FillValue', %s)\n", pointer_name_arr, missing_value);
1635 fortprintf(fd, "      call mpas_add_att(%s %% attLists(1) %% attList, 'missing_value', %s)\n", pointer_name_arr, missing_value);
1636 fortprintf(fd, "      call mpas_add_att(%s %% attLists(1) %% attList, '_FillValue', %s)\n", pointer_name_arr, missing_value);

The main message is that PIO2 is much faster than PIO1! But for this PR, the attribute name makes no difference with PIO2, but does make a slight difference with PIO1.

@xylar
Copy link
Collaborator

xylar commented Jul 1, 2020

@mark-petersen, thank you for doing this timing. Based on the results, I would advocate for just using _FillValue and not missing_value, since both seems like a bad option for PIO1.

I will do some testing of my own shortly and hopefully approve the PR.

@mgduda and @MiCurry, are you and your collaborators using both PIO1 and PIO2 on a regular basis?

@mgduda
Copy link
Contributor

mgduda commented Jul 7, 2020

@xylar I'm using PIO2 on almost every machine where I regularly build MPAS. But, I suspect there are a number of our collaborators who use PIO1 (if only because it can be easier to install). I don't think we should break compatibility with PIO1, though I don't see any issues in that regard with this PR. I agree that we should use just the _FillValue attribute, though, and not both _FillValue and missing_value.

@mgduda
Copy link
Contributor

mgduda commented Jul 7, 2020

@mark-petersen In the description of this PR, would it be more accurate to say that this merge "changes the name of the missing value attribute from missing_value to _FillValue" rather saying that this merge "sets missing value attribute to _FillValue"? I think the former would make it clearer that there already was a missing value attribute, and that we're changing its name.

It might also be helpful to summarize why we're making this change. Given the amount of time we all have spent in arriving at to this PR, I think a few sentences are in order. I can even add these if you're busy this week.

@MiCurry
Copy link
Contributor

MiCurry commented Jul 8, 2020

While these changes do add a _FIllValue attribute to a field, it, unfortunately, undoes the changes from PR #562. i.e. it doesn't fill the field with missing_value. Perhaps I am wrong, but wasn't the intent of these changes to have 'missing_value' in the registry add a _FillValue attribute to a field and fill the field with missing_value?

Checking out 6525f33 (the commit before), and running the same test the field is filled with 'missing_value'.

@xylar
Copy link
Collaborator

xylar commented Jul 8, 2020

@MiCurry, could you explain a bit more how you tested? Looking at the code, I don't understand why #562 would be affected. We are only changing the name of the attribute in the Fortran data structure, not the name in the registry. My reading of #562 is that it gets missing value from parsing the XML from the registry so it should still have missing_value as before. But I have not tested this yet myself.

@xylar
Copy link
Collaborator

xylar commented Jul 8, 2020

Following up, I reran my test from before in which I created a variable that is all zeros about the MPAS-ocean bathymetry an fill values (filled using @MiCurry's changes in #562) below it. I created a variable array that is all ones above the bathymetry and all fill values below. That worked out just fine for me. So I don't think this PR is having any negative effect on the changes in #562. @MiCurry, can you please provide steps to reproduce the problem you're seeing?

Copy link
Collaborator

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked great in my testing (see above)! I'm seeing _FillValue attributes on my test variables, as expected:

$ ncdump -h output.nc
...
	double testMissingVar(Time, nCells, nVertLevels) ;
		testMissingVar:units = "unitless" ;
		testMissingVar:long_name = "A variable that is zero where valid and missing_value where not" ;
		testMissingVar:_FillValue = 9.96920996838687e+36 ;
	double testMissingArrayVar(Time, nCells, nVertLevels) ;
		testMissingArrayVar:long_name = "A variable that is one where valid and missing_value where not" ;
		testMissingArrayVar:units = "unitless" ;
		testMissingArrayVar:_FillValue = 9.96920996838687e+36 ;
...

and corresponding dashes where the variable is invalid:

....
 testMissingArrayVar =
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _, _,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _, _, _, _,
...

@MiCurry
Copy link
Contributor

MiCurry commented Jul 8, 2020

Sorry about the delay. @xylar, I was trying to see if I could get a little more information on what I was experiencing. I'm quite confused myself as to what is happening. I created a nCells dimensioned integer field with missing_value="-1":

<var name="int_field" type="integer" dimensions="nCells" missing_value="-1"/>

And ran the init_atmosphere_model to interpolate real-data conditions and did not alter the code, so int_field is not being accessed or modified. After running the field correctly has the _FillValue attribute:

int int_field(nCells) ;
     int_field:_FillValue = -1 ;

But the field was not set to -1 and had '-'.

Switching the field to type=real though did set the array to -1.

<var name="real_field" type="real" dimensions="nCells" missing_value="-1"/>

But then it doesn't set the _FillValue attribute for real_field.

Perhaps I am setting things up incorrectly. I'll try another test on a different machine and see how that goes.

@xylar
Copy link
Collaborator

xylar commented Jul 8, 2020

@MiCurry, the first example with int_field sounds like the expected behavior to me. ncdump will display - instead of the numerical value for values equal to the _FillValue attribute. So your array is presumably full of -1, but you won't see it.

@xylar
Copy link
Collaborator

xylar commented Jul 8, 2020

The second case sounds odd. The _FillValue should have propagated through fine for a real just like for an int.

@MiCurry
Copy link
Contributor

MiCurry commented Jul 8, 2020

@MiCurry, the first example with int_field sounds like the expected behavior to me. ncdump will display - instead of the numerical value for values equal to the _FillValue attribute. So your array is presumably full of -1, but you won't see it.

Ah that makes sense! My mistake!

The second case sounds odd. The _FillValue should have propagated through fine for a real just like for an int.

I think I've discovered it. Looks like missing_value="-1" does not work with floats/reals, instead it needs to be: missing_value="-1.0". Switching it to missing_value="-1.0" propagated _FillValue down to the field.

@xylar
Copy link
Collaborator

xylar commented Jul 8, 2020

Hmm, maybe we should fix that. Seems like it will happen to others, too.

@MiCurry
Copy link
Contributor

MiCurry commented Jul 8, 2020

@mark-petersen In the description of this PR, would it be more accurate to say that this merge "changes the name of the missing value attribute from missing_value to _FillValue" rather saying that this merge "sets missing value attribute to _FillValue"? I think the former would make it clearer that there already was a missing value attribute, and that we're changing its name.

It might also be helpful to summarize why we're making this change. Given the amount of time we all have spent in arriving at to this PR, I think a few sentences are in order. I can even add these if you're busy this week.

@mark-petersen everything looks good to me, but I agree with @mgduda on the above. I also think it would be nice to be more explicit in the PR message e.g. 'The _Fillvalue attribute can be set by using missing_value in the registry'. I think that would be a clearer example than PR #571.

@mark-petersen mark-petersen changed the title Set missing value attribute to _FillValue Change the name of the missing value attribute from missing_value to _FillValue Jul 9, 2020
@mark-petersen mark-petersen changed the title Change the name of the missing value attribute from missing_value to _FillValue Change name of missing value attribute from missing_value to _FillValue Jul 9, 2020
@xylar
Copy link
Collaborator

xylar commented Jul 9, 2020

@mgduda and @MiCurry, is the new description adequate for your needs?

Copy link
Contributor

@MiCurry MiCurry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@mgduda
Copy link
Contributor

mgduda commented Jul 10, 2020

I just added the words "in netCDF output files" to distinguish between XML attributes in the Registry and netCDF attributes. I also wrapped the description so that in plain text the lines won't be longer than 80 columns.

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed timing numbers, @mark-petersen . This looks good to me.

@mark-petersen mark-petersen merged commit 30e08d6 into MPAS-Dev:develop Jul 13, 2020
@mark-petersen mark-petersen deleted the framework/add_FillValue branch July 13, 2020 12:20
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 27, 2020
#3737)

Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 29, 2020
Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants