An AI-powered web application that helps students and researchers understand research papers instantly, using the Google Gemini LLM and a RAG architecture.
https://scholar-research-ai.streamlit.app
Reading and understanding research papers is one of the most time-consuming tasks for students and researchers; a single paper can take hours to fully comprehend.
ScholarAI solves this by:
- Instantly summarizing any research paper
- Extracting key insights and findings
- Generating proper academic citations
- Answering any question about the paper in natural language
| Feature | Description |
|---|---|
| Paper Summary | Automatically generates a comprehensive summary covering the objective, methodology, findings and conclusion |
| Key Insights | Extracts the problem statement, proposed solution, dataset, results, limitations and future work |
| Citation Generation | Generates proper academic citations in APA, MLA and Chicago formats |
| Q&A Chat | Answers any question about the paper with accurate, context-aware responses and conversation memory |
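The README does not describe how conversation memory works. A minimal sketch, assuming it means replaying the last few question/answer turns together with the retrieved paper chunks in each prompt sent to Gemini; `ChatMemory`, `build_qa_prompt`, the `max_turns` window and the prompt wording are all hypothetical, not ScholarAI's actual code.

```python
from __future__ import annotations

from collections import deque


class ChatMemory:
    """Hypothetical rolling window of the most recent (question, answer) turns."""

    def __init__(self, max_turns: int = 3) -> None:
        # A bounded deque discards the oldest turn automatically once full.
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))


def build_qa_prompt(question: str, context_chunks: list[str], memory: ChatMemory) -> str:
    """Combine retrieved paper excerpts, prior turns and the new question."""
    context = "\n\n".join(context_chunks)
    history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in memory.turns)
    return (
        "Answer using only the paper excerpts below.\n\n"
        f"Paper excerpts:\n{context}\n\n"
        f"Conversation so far:\n{history or '(none)'}\n\n"
        f"User: {question}\nAssistant:"
    )


# Example: the follow-up question can rely on the earlier turn for context.
memory = ChatMemory(max_turns=2)
memory.add("What dataset is used?", "The excerpts mention an arXiv abstract corpus.")
prompt = build_qa_prompt("How large is it?", ["(retrieved chunk text)"], memory)
```

Keeping only a fixed number of turns bounds the prompt size, at the cost of forgetting older exchanges.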
ScholarAI is built on a RAG (Retrieval-Augmented Generation) architecture:
1. The user uploads a PDF
2. PyPDF2 extracts the text from the PDF
3. LangChain splits the text into chunks
4. A HuggingFace model converts the chunks into embeddings
5. FAISS stores the embeddings in a vector database
6. The user asks a question or requests a summary
7. FAISS retrieves the most relevant chunks
8. The Google Gemini LLM generates an answer from those chunks
9. The result is displayed in the Streamlit UI
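The pipeline above relies on LangChain, HuggingFace embeddings and FAISS. The sketch below shows the same split, embed and retrieve steps using only the standard library so the mechanics are visible, and it swaps in simpler techniques: fixed-size overlapping character windows stand in for LangChain's text splitter, word-count vectors stand in for all-MiniLM-L6-v2 embeddings, and a brute-force cosine search stands in for a FAISS index. The function names and the `chunk_size`/`overlap` defaults are assumptions, not values taken from ScholarAI.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows; neighbours share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Start a new window only while the previous one has not reached the end.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter[str]:
    """Toy embedding: lowercase word counts (stand-in for all-MiniLM-L6-v2)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Exact brute-force nearest-neighbour search, conceptually like a flat FAISS index."""
    vectors = [embed(chunk) for chunk in chunks]
    q = embed(query)
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


chunks = [
    "The model uses a transformer encoder.",
    "We evaluate on the arXiv dataset.",
    "Future work includes multilingual support.",
]
print(retrieve("Which dataset did they evaluate on?", chunks, k=1))
# ['We evaluate on the arXiv dataset.']
```

The overlap keeps a sentence that straddles a chunk boundary visible in at least one chunk; real embedding models also match paraphrases, which word counts cannot.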
| Tool | Purpose |
|---|---|
| Streamlit | Web application UI |
| Google Gemini LLM | Core language model for understanding and generating answers |
| LangChain | RAG pipeline orchestration |
| FAISS | Vector database for storing and searching embeddings |
| HuggingFace | Sentence embeddings model (all-MiniLM-L6-v2) |
| PyPDF2 | PDF text extraction |
| Python | Core programming language |
`ScholarAI/`
- `app.py`: main Streamlit web application
- `pdf_processor.py`: PDF text extraction module
- `rag_pipeline.py`: RAG pipeline and vector database
- `gemini_handler.py`: Google Gemini LLM integration
- `requirements.txt`: project dependencies
- `.gitignore`: Git ignore file
- `README.md`: project documentation
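1. Clone the repository: `git clone https://github.com/codeByShan/ScholarAI.git`, then `cd ScholarAI`
2. Create and activate a virtual environment: `python -m venv venv`, then `venv\Scripts\activate` on Windows (or `source venv/bin/activate` on macOS/Linux)
3. Install the dependencies: `pip install -r requirements.txt`
4. Create a `.env` file in the root directory containing `GEMINI_API_KEY=your_gemini_api_key_here` (you can get a free Gemini API key from Google AI Studio)
5. Start the app: `streamlit run app.py`

This app is deployed on Streamlit Cloud for free.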
To deploy your own version:
- Fork this repository
- Go to streamlit.io/cloud
- Connect your GitHub repository
- Add `GEMINI_API_KEY` in Streamlit Secrets
- Deploy!
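Locally the key comes from a `.env` file; on Streamlit Cloud the same value would be read from `st.secrets["GEMINI_API_KEY"]`, which is not shown here. A minimal standard-library sketch of the local side, assuming a plain `KEY=VALUE` file; `load_env_file` and `get_gemini_api_key` are hypothetical helpers (projects commonly use the python-dotenv package's `load_dotenv()` instead), and an already-set environment variable takes precedence over the file.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments; strips surrounding quotes."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(env_file: str | Path = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(env_file).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or Streamlit Secrets")
    return key
```

Failing fast with a clear message is friendlier than letting the first Gemini request fail with an authentication error.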
Upload any research paper PDF to get started.

Get a comprehensive summary covering all key aspects of the paper.

Extract structured insights including problem, solution, dataset and results.

Generate proper citations in APA, MLA and Chicago formats instantly.

Ask any question about the paper and get accurate answers with conversation memory.

- Students: understand research papers quickly without reading every page
- Researchers: extract key findings and insights efficiently
- Academics: generate proper citations automatically
- Professionals: stay up to date with the latest research in their field
- Works best with text-based PDFs (not scanned images)
- The Gemini free tier has rate limits, so the app may show a "busy" message during peak hours
- Summaries are based on the first 5,000 characters of the paper
- Q&A answers are limited to content within the uploaded paper
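Because only the first 5,000 characters reach the model, later sections such as results or conclusions can be missed in long papers. A minimal sketch of that truncation step, assuming the extracted PDF text arrives as a single string; `build_summary_prompt` and its prompt wording are hypothetical (the requested sections mirror the Paper Summary feature), and the empty-text check shows one way to surface the scanned-PDF limitation.

```python
SUMMARY_CHAR_LIMIT = 5000  # from the README: summaries use the first 5,000 characters


def build_summary_prompt(paper_text: str, limit: int = SUMMARY_CHAR_LIMIT) -> str:
    """Hypothetical summary prompt built from the start of the extracted text."""
    excerpt = paper_text.strip()[:limit]
    if not excerpt:
        # Image-only (scanned) PDFs have no text layer, so extraction typically returns nothing.
        raise ValueError("No text could be extracted; the PDF may be a scanned image.")
    return (
        "Summarize this research paper, covering its objective, methodology, "
        "findings and conclusion.\n\n"
        f"Paper text (first {limit} characters):\n{excerpt}"
    )
```

Summarizing from the retrieved chunks instead of a fixed prefix would be one way to lift this limit.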
- Support for multiple PDF uploads simultaneously
- Search across 200M+ academic papers (like the real ScholarAI platform)
- Export summary and insights as Word/PDF document
- Multilingual support for Urdu and other languages
- Study guide and flashcard generation
Zeeshan Ali (codeByShan)
Aspiring AI Engineer
Built as the Final Year Project for an AI Bootcamp
This project is open source and available under the MIT License.
Built with ❤️ using Google Gemini, LangChain and Streamlit