We develop a genome-wide rare variant association test designed for identifying trait-associated loci and functional annotations. This repository accompanies our recent preprint: Leveraging functional annotations to map rare variants associated with Alzheimer’s disease with gruyere.
gruyere is written in Python. You can install gruyere and its required dependencies with the following:
git clone https://github.com/daklab/gruyere.git
cd gruyere
pip install -r requirements.txt
OR
conda create --name gruyere --file requirements.txt
Model Inputs
G: Genotypes for N individuals and P variants [P x N]. The index should contain the name of the gene that each variant maps to. It can optionally include the variant ID, as "gene_variantID".
Z: Functional annotations for P variants and Q annotations [P x Q]. The index should contain the name of the gene that each variant maps to. It can optionally include the variant ID, as "gene_variantID".
XY: Individual-level covariates for N individuals and C covariates [N x C], plus a "Diagnosis" column for binary or continuous phenotypes.
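The shape and index conventions above can be checked with a small sketch like the following. It is only an illustration: the CSV layout, the toy gene, variant, and covariate names, and the helpers `read_matrix` and `check_inputs` are assumptions, not part of gruyere. It also assumes that the G column headers are individual IDs matching the XY rows, and that gene names contain no underscore.

```python
import csv
import io

# Hypothetical toy inputs; the real on-disk layout expected by gruyere may differ.
G_CSV = """variant,ind1,ind2,ind3
GENE1_var1,0,1,0
GENE1_var2,0,0,2
GENE2_var1,1,0,0
"""
Z_CSV = """variant,annot1,annot2
GENE1_var1,0.9,1
GENE1_var2,0.1,0
GENE2_var1,0.5,1
"""
XY_CSV = """individual,age,sex,Diagnosis
ind1,71,0,1
ind2,65,1,0
ind3,80,1,1
"""


def read_matrix(text):
    """Parse CSV text with a header row and a leading label column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    labels, rows = [], []
    for record in reader:
        labels.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return labels, header[1:], rows


def check_inputs(g_text, z_text, xy_text):
    """Check that G [P x N], Z [P x Q], and XY [N x C + Diagnosis] agree."""
    g_labels, g_cols, _ = read_matrix(g_text)
    z_labels, z_cols, _ = read_matrix(z_text)
    xy_labels, xy_cols, _ = read_matrix(xy_text)
    if g_labels != z_labels:
        raise ValueError("G and Z must index the same P variants in the same order")
    if g_cols != xy_labels:
        raise ValueError("G columns and XY rows must list the same N individuals")
    if "Diagnosis" not in xy_cols:
        raise ValueError('XY needs a "Diagnosis" column')
    # Assumed label format "gene_variantID": the gene is the part before the first "_".
    genes = sorted({label.split("_", 1)[0] for label in g_labels})
    return {
        "P": len(g_labels),
        "N": len(g_cols),
        "Q": len(z_cols),
        "C": len(xy_cols) - 1,  # covariates, excluding Diagnosis
        "genes": genes,
    }


summary = check_inputs(G_CSV, Z_CSV, XY_CSV)
print(summary)
```

Running a check of this kind before fitting can catch mismatched variant or individual ordering early.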
Model Outputs (Joint analysis)
alpha.csv: Learned covariate weights by gene
tau.csv: Learned genome-wide annotation weights
wg.csv: Learned gene weights (mean and standard deviation)
losses.txt: Loss per epoch
train_performance.csv: AUC and accuracy of predictions by gene on training set
test_performance.csv: AUC and accuracy of predictions by gene on held-out test set (optional; if using test set)
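For readers unfamiliar with the two performance metrics, here is a minimal pure-Python sketch of AUC (area under the ROC curve) and accuracy for a binary phenotype. It is not the code in performance.py: the function names, the pairwise AUC formulation, and the 0.5 classification threshold are assumptions made for illustration.

```python
def auroc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of individuals whose thresholded prediction matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy predictions for four individuals (made-up values).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.4, 0.6]
print(auroc(toy_labels, toy_scores), accuracy(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of individuals, so it is suited to small checks rather than large cohorts.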
Model Outputs (Per-gene analysis)
pvals_chr{chromosome}.csv: gene p-values and coefficients for all genes in chromosome
preds_chr{chromosome}.csv: individual-level predictions for all genes in chromosome
example_data/inputs.yaml contains example inputs:
---
output: '../example_outputs/' # Path where outputs are saved
XY: '../example_data/XY.csv' # File with covariates (X) and phenotypes (Y)
G: '../example_data/genotypes/' # Path to genotypes, per chromosome
Z: '../example_data/annotations/' # Path to annotations, per chromosome
epochs: 300
n_samples: 50 # Number of times to sample the posterior to determine mean/standard deviation estimates
test_prop: 0.2 # Test set proportion
lr: 0.1
genes: '../example_data/joint_analysis_genes.txt' # List of genes to perform joint analysis on (we use FST-significant genes)
simulate: False
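Before a long run, it can help to sanity-check the configuration. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The `validate_config` helper, the choice to treat every key in the example as required, and the value ranges are assumptions, not checks that gruyere itself performs.

```python
# Keys taken from the example inputs.yaml; treating all of them as required is an assumption.
EXPECTED_KEYS = {
    "output", "XY", "G", "Z", "epochs",
    "n_samples", "test_prop", "lr", "genes", "simulate",
}


def validate_config(cfg):
    """Raise ValueError if the parsed config is missing keys or has implausible values."""
    missing = sorted(EXPECTED_KEYS - set(cfg))
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    for key in ("epochs", "n_samples"):
        value = cfg[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")
    if not 0 <= cfg["test_prop"] < 1:
        raise ValueError("test_prop must be in [0, 1)")
    if cfg["lr"] <= 0:
        raise ValueError("lr must be positive")
    if not isinstance(cfg["simulate"], bool):
        raise ValueError("simulate must be a boolean")
    return cfg


# The example configuration above, written as the dict a YAML parser would return.
EXAMPLE_CONFIG = {
    "output": "../example_outputs/",
    "XY": "../example_data/XY.csv",
    "G": "../example_data/genotypes/",
    "Z": "../example_data/annotations/",
    "epochs": 300,
    "n_samples": 50,
    "test_prop": 0.2,
    "lr": 0.1,
    "genes": "../example_data/joint_analysis_genes.txt",
    "simulate": False,
}

validate_config(EXAMPLE_CONFIG)
```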
Scripts
models.py: contains gruyere model class
data_class: processes input data and stores as dataclass object
load_data: functions to load input data
utils.py: utility functions
performance.py: calculates AUROC and accuracy on gruyere predictions
gruyere_joint.py: fits joint gruyere model
gruyere_pergene.py: fits per-gene gruyere regression
Run gruyere
python src/gruyere_joint.py example_data/inputs.yaml
python src/gruyere_pergene.py example_data/inputs.yaml $CHR # For each chromosome
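Because the per-gene script is run once per chromosome, a small driver can build and launch the commands. This is a sketch under assumptions: the helper names are hypothetical, and passing the autosomes as the integers 1 through 22 is a guess at the expected $CHR format. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def pergene_commands(config_path, chromosomes=range(1, 23)):
    """Build one per-gene command per chromosome (autosomes 1-22 assumed)."""
    return [
        [sys.executable, "src/gruyere_pergene.py", config_path, str(chrom)]
        for chrom in chromosomes
    ]


def run_all(config_path, dry_run=True):
    """Print the commands, or execute them one after another when dry_run is False."""
    commands = pergene_commands(config_path)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first chromosome whose run fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_all("example_data/inputs.yaml", dry_run=True)
```

Calling `run_all("example_data/inputs.yaml", dry_run=False)` from the repository root would execute the runs sequentially.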
