Skip to content

Conversation

@divine7022
Copy link
Collaborator

@divine7022 divine7022 commented Dec 7, 2025

Description

fixes #3692

  • updated run.write.configs to generate a runs_manifest.csv file containing run metadata (run_id, site_id, pft_name, trait, quantile, type) instead of appending to samples.Rdata.
  • samples.Rdata is now treated as a read-only "Master parameter definition" file.
  • updated write.sa.configs and write.ensemble.configs to return manifest dataframes.

fixes #3693

  • refactored read.sa.output and read.ensemble.output to pass the full vector of requested variables to read.output in a single call.
  • eliminated the inefficient inner loop that opened/closed netcdf files for every individual variable.

Additionally this PR separates input design generation for SA and ensemble runs, fixing dimension mismatch errors in multisite workflows.
previously both SA and ensemble used a single input_design matrix, causing failures when:

  • Ensemble needed N runs (e.g. 50)
  • SA needed 1 (median) + (traits * quantiles) runs (e.g. 145)

causes in dimension mismatch errors in input design coordination.

runModule.run.write.configs generates separate ens_input_design and sa_input_design matrices.
run.write.configs accepts both input_design_ens and input_design_sa parameters.
Old input_design parameter still works with deprecation warning.

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Comment on lines +15 to +17
#' @param input_design DEPRECATED. Use input_design_ens and input_design_sa instead
#' @param input_design_ens Input design matrix for ensemble analysis
#' @param input_design_sa Input design matrix for sensitivity analysis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having this be three separate arguments. What if we kept input_design, expecting a list with entries ensemble and sensitivity and also accepting a single design object (interpreting it as being for the ensemble)?

(We discussed live that it'd be nice to encode SA and ensemble setup into the design such that run.write.configs doesn't even need to know which one it's working with, but that's still well in the future)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 100% agree with @infotroph that we shouldn't have multiple input_design arguments. Where I disagree is that I don't think it makes sense to have ensemble and sensitivity elements. I think we should move to a place where a single call to run.write.configs runs a single input_design. If you want to run both an ensemble projection and a SA then you should call run.write.configs twice with different input_designs (one for SA one for ensemble). In other words, I think the SA/ensemble distinction should be deprecated at this point w.r.t. run.write.configs. For example, a Sobol SA is already actually executed via the "ensemble" pathway.

invisible(settings)
return(settings)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please don't remove terminating newlines. These days the rule is mostly just to avoid needless extra changes, but there do exist some (mostly ancient 😅 ) tools that complain if it isn't there.


manifest.file <- file.path(settings$outdir, "runs_manifest.csv")

if (nrow(run_manifest_df) > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK for the manifest to be empty? If so I'd favor writing a zero-row file so it's easy to tell that afterwards; if not OK then should we throw an error here instead of skipping?

Comment on lines +236 to +245
if (overwrite && file.exists(manifest.file)) {
unlink(manifest.file)
}

utils::write.table(run_manifest_df,
file = manifest.file,
sep = ",",
row.names = FALSE,
col.names = !file.exists(manifest.file),
append = file.exists(manifest.file))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write.table removes existing files unless append=TRUE, so this unlink isn't needed

Suggested change
if (overwrite && file.exists(manifest.file)) {
unlink(manifest.file)
}
utils::write.table(run_manifest_df,
file = manifest.file,
sep = ",",
row.names = FALSE,
col.names = !file.exists(manifest.file),
append = file.exists(manifest.file))
utils::write.table(run_manifest_df,
file = manifest.file,
sep = ",",
row.names = FALSE,
col.names = !file.exists(manifest.file),
append = !overwrite)

Comment on lines 251 to 252
invisible(settings)
return(settings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault but fixing this while I see it -- invisible() was doing nothing here and usually only makes sense as a return value

Suggested change
invisible(settings)
return(settings)
return(invisible(settings))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should know this, but what's actually been changed in the returned settings object. I'm wondering whether the settings is still the right thing to return at this point (e.g., are we moving to a place where the manifest might be the thing a user actually wants back?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed: at least ensemble IDs and PFT outdirs (if they weren't there in the input settings).

Return value: My gut is settings is still the right thing to return. I'm saying this without checking, but I think at the moment every runModule.*() takes a MultiSettings and returns a possibly modified MultiSettings, so changing that would be an even bigger overhaul. But maybe this is an argument for putting (a copy of) the manifest into the returned settings?

Comment on lines +32 to +35
for (i in seq_len(nrow(ens.run.ids))) {

# Handle column name difference (manifest has 'run_id', legacy 'ens.run.ids' might have 'id')
run.id <- if ("run_id" %in% names(ens.run.ids)) ens.run.ids$run_id[i] else ens.run.ids$id[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to iterate over the IDs directly instead of by index?

Suggested change
for (i in seq_len(nrow(ens.run.ids))) {
# Handle column name difference (manifest has 'run_id', legacy 'ens.run.ids' might have 'id')
run.id <- if ("run_id" %in% names(ens.run.ids)) ens.run.ids$run_id[i] else ens.run.ids$id[i]
# Handle column name difference (manifest has 'run_id', legacy 'ens.run.ids' might have 'id')
run_ids <- ens.run.ids$run_id %||% ens.run.ids$id
for (run_id in run_ids) {

if (nrow(subset_df) == 1) {
run.id <- subset_df$run_id
} else if (nrow(subset_df) > 1) {
PEcAn.logger::logger.warn("Multiple runs found for", trait, quantile, "- using the last one.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this happen and should it be an error instead of a warning?

next
}

pass_pft <- switch(per.pft + 1, NULL, pft.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weirdly phrased -- I see this was existing code but I wonder if I'm missing a reason it wasn't pass_pft <- if(isTRUE(per.pft)) pft.name else NULL, which I'd find easier to read.

pass_pft <- switch(per.pft + 1, NULL, pft.name)

# Pass ALL variables at once to avoid repeated file opening. And call read.output once
out.tmp <- PEcAn.utils::read.output(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be for a different PR, but since this is getting converted to data frame a few lines down it might be worth benchmarking the existing approach against passing dataframe = TRUE into read.output

ens_input_design <- design_result$X
}

# handle SA design - check if already provided before generating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now turning into a fair amount of code repeated in both legs of the settings/multisettings if/else. Could we do some of this unconditionally outside the if? Or define a helper function to encapsulate it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants