Lightweight local chatbot using Ollama for CPU-only setups. This repository contains a FastAPI backend (backend/) and a Next.js frontend (frontend/).
Author: Akash Pandey akashdeep9226@gmail.com
Recommended model
- qwen3:0.6b: small, efficient, and well-suited to CPU-only use with Ollama. It offers a good balance of latency and quality for local development without a GPU. (You can change the model in backend/app/model.py.)
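If you switch models often, one option is to make the model name configurable instead of editing the source each time. The sketch below assumes a hypothetical OLLAMA_MODEL environment variable and a hypothetical resolve_model_name helper. The actual backend/app/model.py may simply hard-code the name.

```python
import os

# Default model recommended above for CPU-only machines.
DEFAULT_MODEL = "qwen3:0.6b"


def resolve_model_name(env=None):
    """Return the Ollama model tag to use.

    OLLAMA_MODEL is a hypothetical override variable, not something the
    repository is known to read. Falls back to DEFAULT_MODEL when the
    variable is unset or blank.
    """
    env = os.environ if env is None else env
    name = env.get("OLLAMA_MODEL", "").strip()
    return name or DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_model_name())
```

With this approach, a different model can be tried by setting the variable before starting the backend, with no code change needed.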
Backend (Python + FastAPI)
- Create a virtual environment and install dependencies (Windows activation shown; on macOS/Linux use source backend/venv/bin/activate instead):
python -m venv backend/venv
backend\venv\Scripts\activate
pip install -r backend/requirements.txt
- Start Ollama and make sure the model is pulled (ollama serve runs in the foreground, so run the pull from a second terminal):
ollama serve
ollama pull qwen3:0.6b
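- Run the backend from the backend/ directory, since the app.main module path resolves to backend/app/main.py:
cd backend
uvicorn app.main:app --reload --port 8000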
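To confirm that Ollama is running and that the model has actually been pulled, you can query Ollama's /api/tags endpoint, which lists locally available models. This is a minimal standard-library sketch that assumes Ollama is listening on its default address, http://localhost:11434. The helper names fetch_tags and model_is_pulled are illustrative and not part of this repository.

```python
import json
import urllib.request

# Ollama's default local address; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def model_is_pulled(tags, model):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {m.get("name") for m in tags.get("models", [])}
    # An untagged pull is reported with the implicit ":latest" tag.
    return model in names or f"{model}:latest" in names


def fetch_tags(base_url=OLLAMA_URL, timeout=5.0):
    """GET /api/tags from a running Ollama server and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        tags = fetch_tags()
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        raise SystemExit(f"Ollama is not reachable at {OLLAMA_URL}: {exc}")
    print("qwen3:0.6b pulled:", model_is_pulled(tags, "qwen3:0.6b"))
```

If the script reports that Ollama is unreachable, the backend will not be able to generate replies either, so this check is a quick way to separate Ollama problems from backend problems.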
Frontend (Next.js)
- In a separate terminal, from the repository root, install and run the frontend:
cd frontend
npm install
npm run dev
- Open http://localhost:3000 in your browser.
CPU-only notes
- Expect higher latency than with GPU inference. Use smaller models (like qwen3:0.6b) and lower num_predict / max_length to reduce response time.
- If you need faster responses, consider running Ollama on a machine with a GPU or using a hosted API.
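To see the effect of num_predict directly, the sketch below calls Ollama's /api/generate endpoint with the Python standard library. It assumes Ollama is listening on its default port 11434, and it bypasses the FastAPI backend entirely. The function names and the default of 128 tokens are illustrative choices, not values taken from backend/app/model.py.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="qwen3:0.6b", num_predict=128):
    """Build a non-streaming /api/generate request body.

    num_predict caps the number of generated tokens; a lower value shortens
    CPU-bound responses. 128 is an illustrative default, not a tuned value.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_predict},
    }


def generate(prompt, **kwargs):
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: CPU-only inference can take a while.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    print(generate("Say hello in five words.", num_predict=32))
```

Timing the same prompt with a few different num_predict values gives a rough sense of how much the cap helps on your hardware.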
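Pushing to GitHub
Before running git add ., make sure backend/venv and frontend/node_modules are listed in .gitignore so they are not committed.
git init
git add .
git commit -m "Initial import: backend + frontend"
git branch -M main
gh repo create <your-username>/<repo-name> --public --source=. --remote=origin
git push -u origin main
Replace <your-username>/<repo-name> with your GitHub repo. If you don't have the gh CLI, create the repo on GitHub and then add the remote: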
git remote add origin https://github.com/<your-username>/<repo-name>.git
git branch -M main
git push -u origin main
Key files
- backend/app/model.py: Ollama integration and model selection
- backend/app/main.py: FastAPI server
- frontend/app/chat/page.tsx: chat UI