Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with complex diseases such as asthma. However, not all SNPs are functionally relevant. This project implements a reproducible pipeline to filter, annotate, and prioritize asthma-associated SNPs. Asthma is a genetically and biologically complex disease. Identifying functional variants and druggable gene targets is key for translational impact. This pipeline implements a multi-omics, systems biology approach to prioritize GWAS signals using:
- Multi-mapping strategies (positional, eQTL, chromatin)
- Pathway and tissue-enrichment analysis
- Integration with the Open Targets and Pharos databases to assess clinical relevance and drug candidacy

The workflow integrates Python (Jupyter Notebooks), R (biomaRt and visualization), and online bioinformatics resources (NHGRI GWAS Catalog, FUMA and Pharos) to identify biologically meaningful variants. The ultimate goal is to highlight candidate SNPs and genes with potential roles in asthma pathogenesis, providing a foundation for downstream functional studies and personalized medicine approaches.
- Languages: Python (Jupyter Notebook) and R
- Databases & Resources: NHGRI GWAS Catalog, FUMA GWAS, Open Targets and Pharos
- Libraries:
  - Python: pandas, os and csv
  - R: biomaRt, ggplot2, ggthemes, readxl, tidyr, data.table, dplyr, stringr and igraph
- Reproducibility: RMarkdown for reporting and GitHub for version control
- NHGRI GWAS Catalog: GCST010042 (Han Y. et al.), containing asthma-associated SNPs and their metadata.
- Additional data integration from:
  - FUMA GWAS for regulatory annotation and deleteriousness prediction
  - Pharos for scoring and prioritising eQTL genes
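
How the Pharos-based scoring is computed is not specified here. As a minimal sketch of one plausible approach, the example below ranks genes by their Pharos Target Development Level (Tclin, Tchem, Tbio, Tdark) and breaks ties by the number of supporting eQTL SNPs. The weights, the `rank_genes` helper, the field names and the placeholder genes are all assumptions made for illustration, not the project's actual scoring scheme.

```python
# Illustrative weights for Pharos Target Development Levels.
# These numbers are assumed for the sketch, not taken from the project.
TDL_WEIGHT = {"Tclin": 3, "Tchem": 2, "Tbio": 1, "Tdark": 0}

# Placeholder genes with made-up TDL labels and eQTL SNP counts.
genes = [
    {"gene": "GENE_A", "tdl": "Tbio", "eqtl_snps": 4},
    {"gene": "GENE_B", "tdl": "Tclin", "eqtl_snps": 1},
    {"gene": "GENE_C", "tdl": "Tchem", "eqtl_snps": 2},
    {"gene": "GENE_D", "tdl": "Tclin", "eqtl_snps": 3},
]


def rank_genes(rows):
    """Sort genes by TDL weight, then by eQTL SNP count, then by name."""
    return sorted(
        rows,
        key=lambda r: (-TDL_WEIGHT.get(r["tdl"], 0), -r["eqtl_snps"], r["gene"]),
    )


ranked = rank_genes(genes)
for row in ranked:
    print(row["gene"], row["tdl"], row["eqtl_snps"])
```

Sorting on a tuple keeps the ranking rule explicit and easy to change, for example by adding a FUMA-derived annotation as a further tie-breaker.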
- Clone this repository:

      git clone https://github.com/Naila-Srivastava/GWAS-SNPs-Annotation-Prioritization
      cd GWAS-SNPs-Annotation-Prioritization

- Install dependencies:

      pip install -r requirements.txt

- Run analysis
- Generate final report
- Automated SNP filtering by p-value and trait relevance.
- Functional annotation using Ensembl BioMart.
- Prediction of regulatory effects using FUMA.
- eQTL mapping for gene expression association.
- Scoring & prioritization of SNPs integrating multiple evidence sources.
- Multi-level visualization: Manhattan plots, graphs and networks.
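
The filtering step listed above can be sketched with the standard-library `csv` module. This is a minimal illustration, not the notebook's actual code: the column names (`SNPS`, `P-VALUE`, `DISEASE/TRAIT`) are assumed to follow a GWAS Catalog association export, the sample rows and rsIDs are invented, and the 5e-8 cut-off is the conventional genome-wide significance threshold rather than a value stated by this project.

```python
import csv
import io

# Hypothetical rows in the style of a GWAS Catalog association export.
# Column names and values are assumptions for illustration only.
SAMPLE_TSV = (
    "SNPS\tP-VALUE\tDISEASE/TRAIT\n"
    "rs0000001\t3E-12\tAsthma\n"
    "rs0000002\t2E-5\tAsthma\n"
    "rs0000003\t1E-20\tHeight\n"
    "rs0000004\t4E-9\tAsthma (childhood onset)\n"
)

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def filter_snps(handle, trait_keyword="asthma", p_max=GENOME_WIDE_P):
    """Keep rows whose trait mentions the keyword and whose p-value is below p_max."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p_value = float(row["P-VALUE"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or unparsable p-value
        trait = (row.get("DISEASE/TRAIT") or "").lower()
        if p_value < p_max and trait_keyword.lower() in trait:
            kept.append(row)
    return kept


filtered = filter_snps(io.StringIO(SAMPLE_TSV))
print([row["SNPS"] for row in filtered])
```

To run this on a real export, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to match the file.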
- Manhattan plot for genome-wide SNP significance
- Network graph showing relationships among the top genes
- Minor allele frequency distribution graph
- SNP consequences bar plot
- Top genes associated with asthma
- Genes associated with the most frequent KEGG Pathways
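
For the Manhattan plot, each SNP needs an x coordinate (its position along the concatenated chromosomes) and a y coordinate (-log10 of its p-value). The sketch below shows that transformation in plain Python. The records are invented, the helper names are hypothetical, and each chromosome's length is approximated by its largest observed position, which is a simplification rather than the project's method.

```python
import math

# Hypothetical (chromosome, base-pair position, p-value) records.
records = [
    ("1", 1_000, 1e-3),
    ("1", 5_000, 1e-9),
    ("2", 2_000, 1e-6),
    ("2", 8_000, 5e-8),
]


def chrom_sort_key(chrom):
    """Order numeric chromosomes first (1, 2, ...), then named ones (X, Y, MT)."""
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def manhattan_coordinates(rows):
    """Return (x, y) pairs: cumulative genomic position and -log10(p)."""
    # Use the largest observed position as a stand-in for chromosome length.
    chrom_max = {}
    for chrom, pos, _ in rows:
        chrom_max[chrom] = max(chrom_max.get(chrom, 0), pos)

    # Offset each chromosome by the total length of those before it.
    offsets = {}
    running = 0
    for chrom in sorted(chrom_max, key=chrom_sort_key):
        offsets[chrom] = running
        running += chrom_max[chrom]

    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in rows]


coords = manhattan_coordinates(records)
for x, y in coords:
    print(x, round(y, 2))
```

The resulting pairs can be passed to any scatter-plot function, with a horizontal line at -log10(5e-8) marking the conventional genome-wide significance level.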
- Genes including IL13, IL4, IL4R, IL2RA, ORMDL3, GSDMB, ZPBP2, IKZF3, KIF3A, SMAD3, TLR1, RORA, RUNX3, LRRC32, C11orf30/EMSY, RAD50 and TNFSF4 show strong associations with asthma.
- Most of these genes are enriched in pathways such as Systemic Lupus Erythematosus, JAK-STAT Signalling Pathway and Hematopoietic Cell Lineage.
- The selected SNPs are intronic or intergenic.
- Successfully implemented a bioinformatics pipeline to prioritize asthma SNPs.
- Identified candidate SNPs with high regulatory potential and disease association.
- Integrated multiple datasets for a systems-level perspective of asthma genetics.
- Established a reproducible framework for GWAS SNP annotation and visualization.
- Extend prioritization to other immune-related traits (e.g., atopy, allergic rhinitis).
- Incorporate machine learning models for SNP classification.
- Expand to multi-omics integration (epigenetics, transcriptomics).
- Publish the pipeline as a ready-to-use workflow (Nextflow/Snakemake).
- NHGRI GWAS Catalog: https://www.ebi.ac.uk/gwas/ [Han Y. et al. (GCST010042)]
- FUMA GWAS: https://fuma.ctglab.nl/
- Open Targets: https://www.opentargets.org/
- Pharos: https://pharos.nih.gov/
Asthma pGWAS SNP Prioritization and Interpretation/
│
├── README.md # You're reading this now
├── requirements.txt # Python dependencies
│
├── GWAS_data_cleaning_&_preprocessing.ipynb # Jupyter notebook (Data cleaning & preprocessing)
├── R/ # R scripts
├── Results/ # Processed files and tables
├── Visuals/ # Processed figures and plots
└── data/ # Input datasets