@AritraDey-Dev AritraDey-Dev commented Dec 7, 2025

Description

This PR implements a Quarto notebook for the meta-analysis. It uses the posterior files and trait.data.Rdata to run the Meta Analysis demo. A pecan.xml file is also included to run the workflow.

Motivation and Context

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Signed-off-by: Aritra Dey <adey01027@gmail.com>
Add Demo 03 notebook to run meta-analysis with pre-generated data.

Signed-off-by: Aritra Dey <adey01027@gmail.com>

Meta-analysis in PEcAn is a hierarchical Bayesian statistical approach that synthesizes plant trait data from literature to constrain ecosystem model parameters. The **PEcAn.MA** module implements this functionality to combine prior information with observational data, generating posterior distributions for model parameters.

In a standard PEcAn workflow, this step queries the BETYdb database for trait data and priors. For this demonstration, we will use pre-generated data files to simulate the workflow without requiring an active database connection during the notebook execution.
Member

I'd rephrase this. The MA runs on tabular data in a specific format. One easy way to get data in that format is to query BETYdb, but you can also generate that format manually if you have other trait data. Indeed, a great new Issue for first-time PEcAn developers would be to create helper function(s) that reformat trait data from common trait databases (e.g. TRY) into the tabular format this module expects. Given that no one is actively updating BETY, this approach is probably going to be the de facto norm for most users, and this demo should be updated once we have functions that enable it.

Member Author

Created here: #3717
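
To make the reformatting idea above concrete, here is a minimal base-R sketch of what such a helper could look like. Everything in it is an assumption for illustration: the function name `reformat_traits_sketch`, the input columns (`TraitName`, `Value`, `Dataset`), and the output columns are placeholders, not the schema PEcAn.MA actually expects and not the helper proposed in the linked Issue.

```r
# Hypothetical raw export: one row per observation (column names are made up).
raw <- data.frame(
  TraitName = c("SLA", "SLA", "SLA", "leaf_turnover_rate"),
  Value     = c(10.2, 11.5, 9.8, 0.31),
  Dataset   = c("study_A", "study_A", "study_B", "study_B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise observations per trait and study.
# Output column names are placeholders, NOT the real PEcAn.MA input format.
reformat_traits_sketch <- function(raw) {
  groups <- split(raw, list(raw$TraitName, raw$Dataset), drop = TRUE)
  rows <- lapply(groups, function(g) {
    n <- nrow(g)
    data.frame(
      trait    = g$TraitName[1],
      citation = g$Dataset[1],
      mean     = mean(g$Value),
      n        = n,
      # A standard error is only defined when there is more than one observation.
      se       = if (n > 1) sd(g$Value) / sqrt(n) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

trait_table <- reformat_traits_sketch(raw)
print(trait_table)
```

A real helper would also need to map trait names and units onto the ones the module uses, which this sketch deliberately leaves out.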


**Context & modeling scenario:**

We simulate plant and ecosystem carbon balance (Net Primary Productivity and Net Ecosystem Exchange) at the AmeriFlux Niwot Ridge Forest site ([US‑NR1](https://ameriflux.lbl.gov/sites/siteinfo/US-NR1)) during the year 2004. We use SIPNET parameterized as a temperate conifer PFT and driven by AmeriFlux meteorology following the analysis in [Moore et al. (2007)](https://doi.org/10.1016/j.agrformet.2008.04.013). This notebook also provides a compact template that can be extended to more years, locations, and PFTs.
Member

The language here is a bit rough. Is this something you're telling the user they want to do? An example? A reference back to Demo 1 and Demo 2? Also note that one can use the MA without then feeding the posteriors into a model (though we definitely want to highlight the latter).

Member Author

Actually, this was taken from Demo 1 and Demo 2 to show the scenario we were considering for this notebook. However, in our case the scenario is incorrect because we are not using site information or met data in the settings file. I need to update the scenario to match the minimal settings file used in this demo, i.e. #3707 (comment).

Member Author

Adjusted this in ac4ece1.


In this specific demo, we are performing a meta-analysis for the **temperate coniferous** Plant Functional Type (PFT). The goal is to estimate the probability distributions for key model parameters (e.g., SLA, leaf turnover rate) by combining:
* **Priors**: Existing knowledge about the parameters.
* **Data**: Observed trait data (simulated for this demo).
Member

Is the demo really using simulated data? Why? We've got plenty of real data.

Member Author

Not simulated; it should say pre-generated. I queried the trait data from the database (a local Postgres setup) using these settings and saved it as trait.data.Rdata specifically for this demo.

Member

I'd hoped that was the case. Just make sure the text matches what you actually did.
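
For readers new to the "priors plus data" idea in the quoted notebook text, the following toy example shows how a prior and observations combine into a narrower posterior. It is a deliberately simplified sketch using a conjugate normal update with a known observation SD; it is not the hierarchical model PEcAn.MA fits, and all numbers are invented for illustration.

```r
# Prior belief about a trait mean (illustrative values, not from the demo).
prior_mean <- 12
prior_sd   <- 4

# Toy observations with an assumed, known observation SD.
obs    <- c(9.5, 10.1, 10.8, 9.9, 10.4)
obs_sd <- 1.5

# Work with precisions (1 / variance): they add under the conjugate update.
prior_prec <- 1 / prior_sd^2
data_prec  <- length(obs) / obs_sd^2

post_prec <- prior_prec + data_prec
# The posterior mean is a precision-weighted average of prior mean and data mean.
post_mean <- (prior_prec * prior_mean + data_prec * mean(obs)) / post_prec
post_sd   <- sqrt(1 / post_prec)

cat(sprintf("prior:     mean = %.2f, sd = %.2f\n", prior_mean, prior_sd))
cat(sprintf("posterior: mean = %.2f, sd = %.2f\n", post_mean, post_sd))
```

The posterior mean lands between the prior mean and the sample mean, and the posterior SD is smaller than the prior SD, which is the constraining effect the meta-analysis is meant to deliver.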

pecanproject = 'https://pecanproject.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
# Download and install PEcAn.all in R
install.packages('PEcAn.all')
Member

I'd recommend a bit more nuance here. If you just want to run a trait meta-analysis you don't need to install PEcAn.all, but rather just a subset of packages (and it would be good to show which subset). If you want to then run a model using those posteriors you probably will end up installing PEcAn.all.

Member Author

Yes, I just copied this part from Demo 1 and Demo 2. Fixed it now.

install.packages('PEcAn.all')
```

* **A valid `pecan.xml` configuration file**: Start with the example at `pecan/documentation/tutorials/Demo_03_Meta_Analysis/pecan.xml`.
Member

Issue to open: one should be able to run the MA itself without a full settings object

Member Author

<?xml version="1.0" encoding="UTF-8"?>
<pecan>
 <pfts>
  <pft>
   <name>temperate.coniferous</name>
   <posterior.files>pft/temperate.coniferous</posterior.files>
   <outdir>pft/temperate.coniferous</outdir>
  </pft>
 </pfts>
 <meta.analysis>
  <iter>3000</iter>
  <random.effects>
   <on>FALSE</on>
   <use_ghs>TRUE</use_ghs>
  </random.effects>
  <threshold>1.2</threshold>
 </meta.analysis>
</pecan>

This configuration alone is enough to run the meta-analysis; no additional model or run blocks are required.
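
As a rough illustration of what the XML above turns into once it is read into R, here is the same configuration written by hand as a nested list. This is a sketch of the shape only: it assumes the settings object mirrors the XML nesting, and the values are typed here for readability, whereas values parsed from XML may arrive as character strings.

```r
# Hand-built stand-in for the settings object; mirrors the XML nesting above.
settings <- list(
  pfts = list(
    pft = list(
      name            = "temperate.coniferous",
      posterior.files = "pft/temperate.coniferous",
      outdir          = "pft/temperate.coniferous"
    )
  ),
  meta.analysis = list(
    iter           = 3000,
    random.effects = list(on = FALSE, use_ghs = TRUE),
    threshold      = 1.2
  )
)

# Inspect only the part the meta-analysis is configured by.
str(settings$meta.analysis)
```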

Member

Cool. So in the Issue I'm recommending, I think it would be fine to propose that the MA module be refactored to take the following as arguments:

  1. trait dataframe
  2. prior dataframe
  3. output directory
  4. list containing MA configs (iter, random.effects, use_ghs, threshold, etc.) with some sensible defaults

Here in the demo one could elect to grab those things from a settings object, but one could also build a demo based on just specifying those things.
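
The four-argument interface proposed above could be sketched roughly as follows. This is not existing PEcAn code: the function name `check_ma_inputs_sketch`, the toy column names, and the choice of defaults (copied from the XML earlier in this thread) are all assumptions, and the function only validates inputs and merges options rather than running any MCMC.

```r
# Assumed defaults, taken from the minimal pecan.xml shown in this thread.
ma_defaults <- list(iter = 3000, random.effects = FALSE, use_ghs = TRUE, threshold = 1.2)

# Hypothetical entry point: validates inputs and fills in default options.
check_ma_inputs_sketch <- function(trait_data, prior_distns, outdir, ma_config = list()) {
  stopifnot(
    is.data.frame(trait_data),
    is.data.frame(prior_distns),
    is.character(outdir), length(outdir) == 1,
    is.list(ma_config)
  )
  unknown <- setdiff(names(ma_config), names(ma_defaults))
  if (length(unknown) > 0) {
    stop("Unknown MA option(s): ", paste(unknown, collapse = ", "))
  }
  # User-supplied options override the defaults; everything else is kept.
  config <- modifyList(ma_defaults, ma_config)
  list(n_obs = nrow(trait_data), n_priors = nrow(prior_distns),
       outdir = outdir, config = config)
}

# Toy inputs; column names are illustrative only.
toy_traits <- data.frame(trait = c("SLA", "SLA"), mean = c(10.2, 11.5), n = c(5, 8))
toy_priors <- data.frame(distn = "gamma", parama = 4, paramb = 0.4, row.names = "SLA")

res <- check_ma_inputs_sketch(toy_traits, toy_priors, tempdir(), list(iter = 500))
str(res$config)
```

Merging with `modifyList` keeps the call short for the common case while still letting a caller override a single option such as `iter`.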

* **SIPNET binary**: While not strictly used for the meta-analysis calculation itself, it is part of the broader workflow context.
* **Pre-generated Data**: This demo relies on `trait.data.Rdata` and `prior.distns.Rdata` files which are included in the `pft/temperate.coniferous` directory.

## Install SIPNET and Meteorological Data
Member

If this was covered in Demo 1 or Demo 2, send users there rather than duplicating text. If you duplicate, then there's twice as much text to keep up-to-date if anything changes.

Member Author

Yes, for this meta-analysis the posterior files are not needed. Will remove this block.

Member Author

A follow-up question: in this case, the model block in the settings file (pecan.xml) isn’t needed, and run$input isn’t needed either. Basically, this whole section:

 <model>
  <type>SIPNET</type>
  <revision>git</revision>
  <delete.raw>FALSE</delete.raw>
  <binary>demo_outdir/sipnet</binary>
 </model>
 <run>
  <site>
   <met.start>2004/01/01</met.start>
   <met.end>2004/12/31</met.end>
   <name>Niwot Ridge Forest/LTER NWT1 (US-NR1)</name>
   <lat>40.0329</lat>
   <lon>-105.546</lon>
  </site>
  <inputs>
   <met>
    <source>AmerifluxLBL</source>
    <output>SIPNET</output>
    <username>Aritra_2004</username>
    <path>
     <path1>dbfiles/AMF_US-NR1_BASE_HH_23-5.2004-01-01.2004-12-31.clim</path1>
    </path>
   </met>
  </inputs>
  <start.date>2004/01/01</start.date>
  <end.date>2004/12/31</end.date>
 </run>

Should we remove this, or keep it so that users still get a clear idea of what the settings file normally looks like (Demo 1 and Demo 2 do this, though)?

Member

I'd vote against keeping anything that you don't need. It's fine to say that the settings for using these posteriors in forward model simulations are more complicated (see Demo 1 and Demo 2), and that in practice you can write one settings file that contains both the model run and MA settings and run it all as a single workflow. Making the settings here minimal is an asset for making the module more accessible and for ultimately moving towards dropping settings as an argument and instead passing the function just the info it needs.


See Demo 1 Section 6 for details on what these functions do. Briefly, they read the XML file, convert it into an R list object that PEcAn can use, check that settings are valid, fill in defaults, and create the output directory.

## Explore the Settings Object
Member

Not needed except for the parts related to the MA, which you've already covered.

# Run Meta Analysis

We now run the meta-analysis. The `runModule.run.meta.analysis` function will:
1. Read the `trait.data.Rdata` and `prior.distns.Rdata` from the PFT output directory.
Member

Issue to open: it would be great to be able to pass data into the MA directly, rather than it having to come from a file with a very specific file name.

Member Author

Yes, I believe this needs a separate function to implement it. We just need to pass trait.data.Rdata and prior.distns.Rdata to make it work.

Member Author

Instead, can we do this within this notebook by adding a section where the user only needs to provide the paths to trait.data.Rdata and prior.distns.Rdata? If a user already has these two files, they can run the meta-analysis directly.

Member

Follow-up: I'm not requesting a change to this PR to implement what I'm suggesting. I'm instead asking that you open a new Issue to improve what we're doing in the future. Specifically, I'd recommend that the MA module take the trait dataframe and prior dataframe as arguments to the function itself, rather than relying on the functions knowing to load those specific files from paths provided within an overly complex settings object. This will push a tiny bit of work into the demo (load the example files, look at them to see how they are formatted, pass them into the MA function) but IMHO will greatly increase the usability of the MA module as a stand-alone tool. Right now, it's functionally easy to use the MA outside the PEcAn workflow, but it's CONCEPTUALLY hard to do so because there's so much mystery in what it's doing. No one can actually run this as a stand-alone module in practice without a whole lot of diving into the code to see what the module actually does and what it actually needs to work. Actually getting this working and documented might be a good place for a new GSoC student.
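
The "load the example files and look at them" step described above might look something like this in the demo. The sketch is self-contained, so it first writes toy stand-in files to a temporary directory; the object names stored inside the files (`trait.data`, `prior.distns`) and their columns are assumptions here, and `load_rdata_sketch` is a hypothetical helper, not a PEcAn function.

```r
demo_dir <- file.path(tempdir(), "ma_demo_sketch")
dir.create(demo_dir, showWarnings = FALSE, recursive = TRUE)

# Toy stand-ins for the demo files (contents are invented for illustration).
trait.data   <- list(SLA = data.frame(mean = c(10.2, 11.5), n = c(5, 8)))
prior.distns <- data.frame(distn = "gamma", parama = 4, paramb = 0.4, row.names = "SLA")
save(trait.data,   file = file.path(demo_dir, "trait.data.Rdata"))
save(prior.distns, file = file.path(demo_dir, "prior.distns.Rdata"))

# Load an .Rdata file into its own environment so nothing in the
# workspace is silently overwritten, then return its contents as a list.
load_rdata_sketch <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

traits <- load_rdata_sketch(file.path(demo_dir, "trait.data.Rdata"))
priors <- load_rdata_sketch(file.path(demo_dir, "prior.distns.Rdata"))

# Look at what each file actually contains before passing it on.
print(names(traits))
str(traits[[1]])
str(priors[[1]])
```

Loading into a separate environment makes it explicit which objects a file provides, which addresses some of the "mystery" about what the module reads.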

Member Author

Done here: #3718

@AritraDey-Dev Thanks for creating issue #3717! I've implemented the format_try_for_ma() function in PR #3720.

This connects directly to @mdietze's suggestion about using external data. Once PR #3720 (TRY formatter) and issue #3718 (MA refactoring) are both implemented, this tutorial could show how to use TRY data with PEcAn's meta-analysis.


# Visualize Meta Analysis Results {#sec-visualize}

It is important to check the MCMC chains for convergence. We can visualize the trace plots and density plots for each trait.
Member

It might be nice to provide additional explanation about what these figures are showing and how to interpret them.

Member Author

Will add them.

Member Author

Added in eb4c2f4.
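
As background for interpreting the convergence plots discussed above, here is a small base-R sketch of the Gelman-Rubin potential scale reduction factor computed on simulated chains. It uses the basic formula without the sampling-variability correction that dedicated MCMC packages apply, the chains are synthetic, and reading the `<threshold>1.2</threshold>` setting as a cutoff on this kind of diagnostic is an assumption on my part.

```r
set.seed(42)
n_iter   <- 1000
n_chains <- 4

# Four well-mixed chains: all sample the same distribution.
chains <- matrix(rnorm(n_iter * n_chains, mean = 10, sd = 1),
                 nrow = n_iter, ncol = n_chains)

# Basic potential scale reduction factor (one column per chain).
gelman_rubin_sketch <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # mean within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  var_hat <- (n - 1) / n * W + B / n # pooled variance estimate
  sqrt(var_hat / W)
}

rhat_good <- gelman_rubin_sketch(chains)

# Shift one chain to mimic chains that have not converged to the same value.
stuck <- chains
stuck[, 1] <- stuck[, 1] + 5
rhat_bad <- gelman_rubin_sketch(stuck)

cat(sprintf("well-mixed chains: R-hat = %.3f\n", rhat_good))
cat(sprintf("one shifted chain: R-hat = %.3f\n", rhat_bad))
```

In a trace plot, the well-mixed case looks like overlapping noise around a common level, while the shifted chain sits visibly apart from the others; the diagnostic summarizes that same contrast as a single number near 1 versus well above it.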

# }
```

# Conclusion
Member

  1. Notebook doesn't go on to use these posteriors in a set of model runs so there's definitely no need to include the text and code for installing the model, installing all PEcAn packages, etc.
  2. Notebook doesn't explain HOW one goes on to use these posteriors in a set of model runs. Could be as simple as "now edit your pecan.xml to point <posterior.files> to these outputs and rerun Demo 02. How did your results change in terms of the width of the CI and the uncertainty analyses?"

[Explore](https://github.com/PecanProject/pecan/blob/main/documentation/tutorials/sensitivity/PEcAn_sensitivity_tutorial_v1.0.Rmd) how model error changes as a function of parameter value (i.e. data assimilation ‘by hand’)


**MCMC Concepts**
Member

Both of these modules are fairly pedagogical about calibration concepts, but neither really shows how to run the PEcAn calibration or SDA code. I'd include those modules too.

AritraDey-Dev and others added 14 commits December 9, 2025 00:07
Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev
Member Author

@mdietze Thanks for the review and the detailed comments. I’ve tried to address and fix the suggested items in the subsequent commits.

@Mayanknishad9

@mdietze This is a great suggestion. My format_try_for_ma() function (PR #3720) creates the trait_data dataframe from TRY exports, which could feed directly into such a simplified MA function.

If the MA module is refactored as you suggest, users could:

  1. Use format_try_for_ma() to convert TRY data to the proper format
  2. Pass that dataframe directly to a simplified run_meta_analysis() function
  3. Bypass the complex settings object entirely

This would make PEcAn's meta-analysis much more accessible to researchers with external data sources.
