
This repository contains a fully reproducible pipeline for prioritising candidate genes and pathways from GWAS summary statistics using integrative pGWAS functional annotation to identify the most compelling targets for asthma


Naila-Srivastava/GWAS-SNPs-Annotation-Prioritization


GWAS SNPs Annotation, Prioritization and Interpretation associated with Asthma

Overview

Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with complex diseases such as asthma, but not all of these SNPs are functionally relevant. This project implements a reproducible pipeline to filter, annotate, and prioritize asthma-associated SNPs.

Asthma is a genetically and biologically complex disease, so identifying functional variants and druggable gene targets is key for translational impact. The pipeline takes a multi-omics, systems biology approach and prioritizes GWAS signals using:

  • Multi-mapping strategies (positional, eQTL, chromatin)
  • Pathway and tissue-enrichment analysis
  • Integration with the Open Targets and Pharos databases for clinical and drug candidacy

The workflow integrates Python (Jupyter Notebooks), R (biomaRt and visualization), and online bioinformatics resources (NHGRI GWAS Catalog, FUMA and Pharos) to identify biologically meaningful variants. The ultimate goal is to highlight candidate SNPs and genes with potential roles in asthma pathogenesis, providing a foundation for downstream functional studies and personalized medicine approaches.

Tools & Technologies

  • Languages: Python (Jupyter Notebook) and R
  • Databases & Resources: NHGRI GWAS Catalog, FUMA GWAS, Open Targets and Pharos
  • Libraries: Python (pandas, os and csv); R (biomaRt, ggplot2, ggthemes, readxl, tidyr, data.table, dplyr, stringr and igraph)
  • Reproducibility: RMarkdown for reporting and GitHub for version control

Data Source

  • NHGRI GWAS Catalog: GCST010042 (Han Y. et al.), containing asthma-associated SNPs and their metadata.

  • Additional data integration from:

    • FUMA GWAS for regulatory annotation and deleteriousness prediction
    • Pharos for scoring and prioritising eQTL genes
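
As a rough illustration of how Pharos information might feed a gene score, the sketch below combines a gene's Pharos Target Development Level (Tclin, Tchem, Tbio, Tdark) with a count of supporting eQTL SNPs. The weights, the `priority_score` and `rank_genes` helpers, and the toy gene records are all hypothetical and are not taken from this repository.

```python
# Hypothetical weights: the Pharos Target Development Level (TDL) categories
# are real, but the numeric values here are illustrative assumptions.
TDL_WEIGHT = {"Tclin": 3, "Tchem": 2, "Tbio": 1, "Tdark": 0}


def priority_score(tdl, n_eqtl_snps, eqtl_weight=0.5):
    """Combine druggability (TDL) with the amount of eQTL support for a gene."""
    return TDL_WEIGHT.get(tdl, 0) + eqtl_weight * n_eqtl_snps


def rank_genes(genes):
    """Return gene records sorted from highest to lowest priority score."""
    scored = [dict(g, score=priority_score(g["tdl"], g["n_eqtl_snps"])) for g in genes]
    return sorted(scored, key=lambda g: g["score"], reverse=True)


# Placeholder genes; names, TDLs and counts are made up for the example.
toy_genes = [
    {"gene": "GENE_A", "tdl": "Tbio", "n_eqtl_snps": 4},
    {"gene": "GENE_B", "tdl": "Tclin", "n_eqtl_snps": 1},
    {"gene": "GENE_C", "tdl": "Tdark", "n_eqtl_snps": 2},
]

ranked = rank_genes(toy_genes)
for g in ranked:
    print(g["gene"], g["tdl"], g["score"])
```

A simple additive score keeps each evidence source's contribution easy to inspect; the actual scoring used in the pipeline may differ.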

How to Run

  1. Clone this repository and enter it: git clone https://github.com/Naila-Srivastava/GWAS-SNPs-Annotation-Prioritization && cd GWAS-SNPs-Annotation-Prioritization
  2. Install dependencies: pip install -r requirements.txt
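  3. Run the analysis: execute the Jupyter notebook GWAS_data_cleaning_&_preprocessing.ipynb, then the R scripts in R/
  4. Generate the final report with RMarkdown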

Methodology

[Figure: methodology diagram]

Features

  • Automated SNP filtering by p-value and trait relevance.
  • Functional annotation using Ensembl BioMart.
  • Prediction of regulatory effects using FUMA.
  • eQTL mapping for gene expression association.
  • Scoring & prioritization of SNPs integrating multiple evidence sources.
  • Multi-level visualization: Manhattan plots, graphs and networks.
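
The filtering step listed above could look roughly like the following sketch, which uses only Python's `csv` module. The column names (`SNPS`, `DISEASE/TRAIT`, `P-VALUE`) are assumed to follow a GWAS Catalog associations export, the 5e-8 cutoff is the conventional genome-wide significance threshold rather than a value confirmed by this repository, and the rows are made-up placeholders.

```python
import csv
import io

# Toy rows in the shape of a GWAS Catalog associations export.
# Column names and all values are illustrative assumptions, not real results.
SAMPLE_TSV = (
    "SNPS\tDISEASE/TRAIT\tP-VALUE\n"
    "rs_demo1\tAsthma\t3E-12\n"
    "rs_demo2\tAsthma\t4E-6\n"
    "rs_demo3\tHeight\t1E-20\n"
    "rs_demo4\tChildhood onset asthma\t5E-9\n"
    "rs_demo5\tAsthma\tNR\n"
)


def filter_snps(handle, trait_keyword="asthma", p_threshold=5e-8):
    """Keep rows whose trait mentions the keyword and whose p-value is below the threshold."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p_value = float(row["P-VALUE"])
        except ValueError:
            continue  # skip rows with a missing or non-numeric p-value
        if trait_keyword.lower() in row["DISEASE/TRAIT"].lower() and p_value < p_threshold:
            kept.append({"snp": row["SNPS"], "trait": row["DISEASE/TRAIT"], "p": p_value})
    # Most significant associations first.
    return sorted(kept, key=lambda r: r["p"])


significant = filter_snps(io.StringIO(SAMPLE_TSV))
for record in significant:
    print(record["snp"], record["trait"], record["p"])
```

To run it on a real export, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to match the file.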

Visualizations

  • Manhattan plot for genome-wide SNP significance
  • Network graph showing relationships among the top genes
  • Minor allele frequency distribution
  • Bar plot of SNP consequences
  • Top genes associated with asthma
  • Genes associated with the most frequent KEGG Pathways
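
For readers unfamiliar with how a Manhattan plot is laid out, here is a minimal sketch of the coordinate calculation: each SNP's height is -log10(p), and its horizontal position is its base-pair position shifted by the combined span of the preceding chromosomes. The `manhattan_coordinates` helper and the toy records are hypothetical, chromosomes are assumed to be numeric, and the repository's own plots are produced separately (the plotting call itself is omitted here).

```python
import math
from collections import defaultdict


def manhattan_coordinates(snps):
    """Map (chromosome, position, p) records to (x, y) plotting coordinates.

    x is a cumulative genomic position so chromosomes sit side by side;
    y is -log10(p), the usual Manhattan plot height.
    """
    # The largest observed position per chromosome stands in for its length.
    chrom_span = defaultdict(int)
    for s in snps:
        chrom_span[s["chrom"]] = max(chrom_span[s["chrom"]], s["pos"])

    # Offset of each chromosome = sum of the spans of the chromosomes before it.
    offsets, running = {}, 0
    for chrom in sorted(chrom_span):
        offsets[chrom] = running
        running += chrom_span[chrom]

    return [
        {"snp": s["snp"], "x": offsets[s["chrom"]] + s["pos"], "y": -math.log10(s["p"])}
        for s in snps
    ]


# Made-up records for illustration only.
toy = [
    {"snp": "rs_demo1", "chrom": 1, "pos": 100, "p": 1e-8},
    {"snp": "rs_demo2", "chrom": 1, "pos": 250, "p": 1e-3},
    {"snp": "rs_demo3", "chrom": 2, "pos": 50, "p": 1e-10},
]

points = manhattan_coordinates(toy)
for pt in points:
    print(pt["snp"], pt["x"], round(pt["y"], 2))
```

The resulting `x` and `y` values can be handed to any scatter-plot function, typically with alternating colours per chromosome.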

Results

  1. IL13, IL4, IL4R, IL2RA, ORMDL3, GSDMB, ZPBP2, IKZF3, KIF3A, SMAD3, TLR1, RORA, RUNX3, LRRC32, C11orf30/EMSY, RAD50, TNFSF4 and other genes show strong associations with asthma.
  2. The majority of these genes are enriched in pathways such as Systemic Lupus Erythematosus, the JAK-STAT Signalling Pathway and Hematopoietic Cell Lineage.
  3. The selected SNPs fall in intronic and intergenic regions.
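
Pathway enrichment of the kind reported above is commonly assessed with a hypergeometric (one-sided Fisher) test: given a background of N genes, a pathway of K genes and a prioritized set of n genes, how likely is an overlap of at least k genes by chance? The sketch below is one standard way to compute that tail probability; it is not necessarily how the enrichment in this project was calculated, and the counts in the example call are invented.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway_size, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from `background` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, n_selected)
    total = comb(background, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Invented counts: 20,000 background genes, a 150-gene pathway,
# 50 prioritized genes, 5 of which fall in the pathway.
p = hypergeom_enrichment_p(background=20000, pathway_size=150, n_selected=50, n_overlap=5)
print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested, the resulting p-values should be corrected for multiple testing before interpretation.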

Key Takeaways

  • Successfully implemented a bioinformatics pipeline to prioritize asthma SNPs.
  • Identified candidate SNPs with high regulatory potential and disease association.
  • Integrated multiple datasets for a systems-level perspective of asthma genetics.
  • Established a reproducible framework for GWAS SNP annotation and visualization.

What’s Next?

  • Extend prioritization to other immune-related traits (e.g., atopy, allergic rhinitis).
  • Incorporate machine learning models for SNP classification.
  • Expand to multi-omics integration (epigenetics, transcriptomics).
  • Publish the pipeline as a ready-to-use workflow (Nextflow/Snakemake).

References

  1. NHGRI GWAS Catalog, study GCST010042 (Han Y. et al.): https://www.ebi.ac.uk/gwas/
  2. FUMA GWAS: https://fuma.ctglab.nl/
  3. Open Targets: https://www.opentargets.org/
  4. Pharos: https://pharos.nih.gov/

Project Structure

Asthma pGWAS SNP Prioritization and Interpretation/  
│
├── README.md                                                  # You're reading this now   
├── requirements.txt                                           # Python dependencies   
│
├── GWAS_data_cleaning_&_preprocessing.ipynb                   # Jupyter notebook (Data cleaning & preprocessing)  
├── R/                                                         # R scripts     
├── Results/                                                   # Processed files and tables
├── Visuals/                                                   # Processed figures and plots  
└── data/                                                      # Input datasets  
