codeByShan/ScholarAI

🔬 ScholarAI: Intelligent Research Paper Assistant

An AI-powered web application that helps students and researchers understand research papers quickly, using the Google Gemini LLM and a RAG architecture.


🚀 Live Demo

https://scholar-research-ai.streamlit.app


📌 What is ScholarAI?

Reading and understanding research papers is one of the most time-consuming tasks for students and researchers. A single paper can take hours to fully comprehend.

ScholarAI solves this by:

  • Instantly summarizing any research paper
  • Extracting key insights and findings
  • Generating proper academic citations
  • Answering any question about the paper in natural language

✨ Features

| Feature | Description |
| --- | --- |
| 📄 Paper Summary | Automatically generates a comprehensive summary covering the objective, methodology, findings, and conclusion |
| 🔍 Key Insights | Extracts the problem statement, proposed solution, dataset, results, limitations, and future work |
| 📚 Citation Generation | Generates academic citations in APA, MLA, and Chicago formats |
| 💬 Q&A Chat | Ask any question about the paper and get accurate, context-aware answers with conversation memory |
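
This README does not show how the Q&A chat keeps its conversation memory. One common approach, sketched here purely as an assumption, is to put the last few question/answer turns into the prompt alongside the retrieved paper excerpts. The names `build_prompt` and `max_turns` are hypothetical and are not taken from the project's code.

```python
from typing import List, Tuple


def build_prompt(question: str,
                 context_chunks: List[str],
                 history: List[Tuple[str, str]],
                 max_turns: int = 3) -> str:
    """Assemble an LLM prompt from retrieved chunks and recent chat turns."""
    # Keep only the most recent turns so the prompt stays small.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = [
        "Answer using only the paper excerpts below.",
        "",
        "Paper excerpts:",
        "\n\n".join(context_chunks),
        "",
        "Conversation so far:",
    ]
    for user_msg, assistant_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    # The new question goes last, followed by a cue for the model to answer.
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)


# Illustrative values only; in the app these would come from the chat session.
history = [("What is the paper about?", "It proposes a new retrieval method.")]
prompt = build_prompt("Which dataset was used?",
                      ["Excerpt text from the uploaded paper."],
                      history)
```

Capping the history at a few turns is a design choice in this sketch: it keeps follow-up questions answerable without letting the prompt grow with every message.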

🧠 How It Works

ScholarAI is built on a RAG (Retrieval-Augmented Generation) architecture:

User uploads PDF
        ↓
PyPDF2 extracts text from PDF
        ↓
LangChain splits text into chunks
        ↓
HuggingFace converts chunks to embeddings
        ↓
FAISS stores embeddings in vector database
        ↓
User asks question / requests summary
        ↓
FAISS retrieves most relevant chunks
        ↓
Google Gemini LLM generates accurate answer
        ↓
Result displayed in Streamlit UI
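
The chunking and retrieval steps in the flow above can be illustrated with a small, dependency-free sketch. It is not the project's code. Fixed-size overlapping character chunks stand in for LangChain's text splitter, word-count vectors stand in for the all-MiniLM-L6-v2 embeddings, and a linear cosine-similarity scan stands in for the FAISS index. All function names and default sizes are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question (linear scan, no index)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Illustrative chunks; in the app they would come from the uploaded PDF.
chunks = [
    "The model is evaluated on a public image dataset.",
    "The proposed method uses a transformer encoder.",
    "Future work includes multilingual support.",
]
top = retrieve("Which dataset was used?", chunks, k=1)
```

The retrieved chunks would then be passed to the LLM as context. A real embedding model matches on meaning rather than shared words, which is why the app uses one instead of word counts.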

🛠️ Tech Stack

| Tool | Purpose |
| --- | --- |
| Streamlit | Web application UI |
| Google Gemini LLM | Core language model for understanding and generating answers |
| LangChain | RAG pipeline orchestration |
| FAISS | Vector database for storing and searching embeddings |
| HuggingFace | Sentence embeddings model (all-MiniLM-L6-v2) |
| PyPDF2 | PDF text extraction |
| Python | Core programming language |

📁 Project Structure

ScholarAI/
    ├── app.py              ← Main Streamlit web application
    ├── pdf_processor.py    ← PDF text extraction module
    ├── rag_pipeline.py     ← RAG pipeline and vector database
    ├── gemini_handler.py   ← Google Gemini LLM integration
    ├── requirements.txt    ← Project dependencies
    ├── .gitignore          ← Git ignore file
    └── README.md           ← Project documentation

⚙️ Installation and Setup

1. Clone the repository

git clone https://github.com/codeByShan/ScholarAI.git
cd ScholarAI

2. Create a virtual environment

python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

3. Install dependencies

pip install -r requirements.txt

4. Set up API key

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_api_key_here

Get your free Gemini API key from Google AI Studio
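
How the app reads the `.env` file is not shown in this README. Many Python projects use the python-dotenv package for that, but whether ScholarAI does is an assumption. As a dependency-free sketch of the same idea, the hypothetical helpers below parse simple `KEY=value` lines and look up `GEMINI_API_KEY`, raising a clear error when it is missing.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing later from the API call. The `.env` file should stay out of version control, which is what the `.gitignore` entry in the project structure is for.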

5. Run the app

streamlit run app.py

🌐 Deployment

This app is deployed on Streamlit Cloud for free.

To deploy your own version:

  1. Fork this repository
  2. Go to streamlit.io/cloud
  3. Connect your GitHub repository
  4. Add GEMINI_API_KEY in Streamlit Secrets
  5. Deploy!
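
On Streamlit Cloud the key comes from the app's secrets rather than a local `.env` file, and Streamlit exposes those secrets through the dict-like `st.secrets` object. The sketch below shows one way a single code path could serve both local and deployed runs. The helper `resolve_api_key` is hypothetical, and it takes plain mappings so that it runs without Streamlit installed. In a deployed app, `st.secrets` would be passed as `secrets`.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(secrets: Optional[Mapping[str, str]] = None,
                    environ: Optional[Mapping[str, str]] = None,
                    name: str = "GEMINI_API_KEY") -> str:
    """Look up the key in deployment secrets first, then in environment variables."""
    environ = os.environ if environ is None else environ
    if secrets is not None and name in secrets:
        return secrets[name]
    if environ.get(name):
        return environ[name]
    raise KeyError(f"{name} not found in secrets or environment")


# Plain dicts stand in for st.secrets and os.environ here.
key = resolve_api_key(secrets={"GEMINI_API_KEY": "from-secrets"},
                      environ={"GEMINI_API_KEY": "from-env"})
```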

📸 Screenshots

Home Screen

Upload any research paper PDF to get started.

Paper Summary

Get a comprehensive summary covering all key aspects of the paper.

Key Insights

Extract structured insights including problem, solution, dataset and results.

Academic Citations

Generate proper citations in APA, MLA and Chicago formats instantly.

Q&A Chat

Ask any question about the paper and get accurate answers with conversation memory.


🎯 Use Cases

  • Students: Understand research papers quickly without reading every page
  • Researchers: Extract key findings and insights efficiently
  • Academics: Generate proper citations automatically
  • Professionals: Stay up to date with the latest research in their field

⚠️ Limitations

  • Works best with text-based PDFs (not scanned images)
  • The Gemini free tier has rate limits, so the app may show a busy message during peak hours
  • The summary is based on the first 5000 characters of the paper
  • Q&A answers are limited to content within the uploaded paper
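
Two of these limitations are easy to picture in code. The sketch below is an assumption about how they might look, not the project's implementation: `summary_input` takes the leading 5000 characters, and `call_with_backoff` retries a failing call with exponentially growing delays, which is one common way to soften rate-limit errors. Both names are hypothetical. Catching every `Exception` is a simplification, and a real app would catch only the rate-limit error raised by its Gemini client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
SUMMARY_CHAR_LIMIT = 5000  # limit stated in the Limitations list


def summary_input(paper_text: str, limit: int = SUMMARY_CHAR_LIMIT) -> str:
    """Return the leading slice of the paper that a summary would be based on."""
    return paper_text[:limit]


def call_with_backoff(call: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() with exponentially growing delays; re-raise after the last attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:  # simplification: narrow this to the client's rate-limit error
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise ValueError("retries must be >= 0")


# Demo with a fake model call that fails twice before succeeding.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("model busy")
    return "summary text"


result = call_with_backoff(flaky, sleep=lambda seconds: None)  # no real waiting in the demo
```

Passing `sleep` as a parameter is a design choice that makes the retry logic testable without actually waiting.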

🔮 Future Improvements

  • Support for multiple PDF uploads simultaneously
  • Search across 200M+ academic papers (like real ScholarAI)
  • Export summary and insights as Word/PDF document
  • Multilingual support for Urdu and other languages
  • Study guide and flashcard generation

👨‍💻 Developer

Zeeshan Ali (codeByShan)

Aspiring AI Engineer

Built as Final Year Project for AI Bootcamp


📄 License

This project is open source and available under the MIT License.


Built with ❤️ using Google Gemini, LangChain and Streamlit
