A powerful web scraping application built with Angular and Node.js that allows users to extract structured data from any website.
- 🎯 Smart Field Selection: Choose from 20 pre-defined fields to scrape
- 🔍 Auto Field Discovery: Automatically discovers additional fields from the website
- 🎨 Beautiful UI: Modern, responsive design with Angular Material
- 📊 Real-time Results: View scraped data immediately
- 💾 CSV Export: Export scraped data to CSV format
- ⚡ Fast & Efficient: Built with performance in mind
The app comes with 20 default fields (10 auto-selected):
- Address, City, Name, Full Name, Location
- Description, Note, About, Contact, Email
- Phone, Title, Company, Website, Street
- State, Zip Code, Country, Price, Date
- Angular 18 (SSR enabled)
- Angular Material
- TypeScript
- SCSS
- Node.js
- Express
- Cheerio (HTML parsing)
- Axios (HTTP requests)
- json2csv (CSV generation)
npm run install:all
cd backend
npm install
npm run dev
The backend server will start on http://localhost:3000
cd frontend
npm install
npm start
The frontend will start on http://localhost:4200
- Enter URL: Paste the website URL you want to scrape
- Select Fields: Choose which data fields to extract (10 are pre-selected)
- Start Scraping: Click "Start Scraping" button
- Review Results: View extracted data and discovered fields
- Select Extra Fields: Choose any additional fields found
- Export: Download results as CSV file
Scrape a website URL and extract data
Request:
{
  "url": "https://example.com",
  "selectedFields": ["name", "email", "phone"]
}
Response:
{
  "success": true,
  "url": "https://example.com",
  "pageInfo": {
    "title": "Example Page",
    "description": "...",
    "keywords": "..."
  },
  "scrapedData": {
    "name": ["John Doe"],
    "email": ["john@example.com"]
  },
  "extraFields": ["username", "bio"],
  "timestamp": "2025-11-11T..."
}
Export scraped data to CSV
Request:
{
  "data": {
    "name": ["John Doe"],
    "email": ["john@example.com"]
  },
  "fields": ["name", "email"]
}
Response: CSV file download
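The request and response shapes above can be exercised from a small client. The following is a minimal sketch, not the project's own code: the route `/api/scrape` and the use of `POST` are assumptions, because the README does not state them, and `buildScrapeRequest` and `scrape` are hypothetical helper names.

```javascript
// ASSUMPTION: the README does not name the route or HTTP method; adjust both to match the backend.
const SCRAPE_PATH = '/api/scrape';

// Build the JSON body in the shape shown in the request example.
function buildScrapeRequest(url, selectedFields) {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  const parsed = new URL(url);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return { url, selectedFields: [...selectedFields] };
}

// Send the request. fetchImpl is injectable so the function can be tested offline.
async function scrape(baseUrl, url, selectedFields, fetchImpl = fetch) {
  const response = await fetchImpl(baseUrl + SCRAPE_PATH, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildScrapeRequest(url, selectedFields)),
  });
  if (!response.ok) {
    throw new Error(`Scrape request failed with HTTP ${response.status}`);
  }
  const result = await response.json();
  if (!result.success) {
    throw new Error('Scrape response reported failure');
  }
  return result;
}
```

With the backend running locally, `scrape('http://localhost:3000', 'https://example.com', ['name', 'email'])` should resolve to an object containing `scrapedData` and `extraFields`, provided the assumed route is correct.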
data-scraper/
├── backend/
│ ├── controllers/
│ │ └── scrapeController.js
│ ├── routes/
│ │ └── scrape.js
│ ├── server.js
│ ├── package.json
│ └── README.md
├── frontend/
│ ├── src/
│ │ ├── app/
│ │ │ ├── services/
│ │ │ │ └── scraper.service.ts
│ │ │ ├── app.component.ts
│ │ │ ├── app.component.html
│ │ │ ├── app.component.scss
│ │ │ └── app.config.ts
│ │ ├── styles.scss
│ │ └── index.html
│ └── package.json
└── README.md
cd backend
npm run dev # Runs with nodemon for auto-restart
cd frontend
npm start # Runs Angular dev server
cd backend
npm start
cd frontend
npm run build
The built files will be in the frontend/dist/ directory.
This project can be deployed to various platforms:
Backend:
- Heroku: Deploy the backend/ folder as a Node.js app
- Railway: Connect your GitHub repo and set the root to backend/
- Render: Deploy as a Web Service with Node.js
- Vercel: Use serverless functions or deploy as a Node.js app
Frontend:
- Vercel: Connect the repo and set build command to cd frontend && npm run build
- Netlify: Set build command to cd frontend && npm run build and publish directory to frontend/dist
- GitHub Pages: Build and deploy the frontend/dist/ folder
- Firebase Hosting: Deploy the frontend/dist/ folder
For production, make sure to set appropriate environment variables:
- PORT: Backend server port (default: 3000)
- NODE_ENV: Set to production for production builds
If deploying frontend and backend separately, update CORS settings in backend/server.js to allow requests from your frontend domain.
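One way to keep these settings in a single place is to read them once at startup. This is a rough sketch under stated assumptions: `loadConfig` and `isOriginAllowed` are hypothetical helpers, and `FRONTEND_ORIGINS` is an invented variable name that the project does not define. Only `PORT` and `NODE_ENV` come from this README.

```javascript
// Read runtime settings from an environment object (process.env in a real server).
function loadConfig(env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // ASSUMPTION: FRONTEND_ORIGINS is a hypothetical comma-separated list of allowed origins.
  const allowedOrigins = (env.FRONTEND_ORIGINS ?? 'http://localhost:4200')
    .split(',')
    .map((origin) => origin.trim())
    .filter(Boolean);
  return { port, isProduction: env.NODE_ENV === 'production', allowedOrigins };
}

// Decide whether a request's Origin header should be granted CORS access.
function isOriginAllowed(origin, config) {
  return config.allowedOrigins.includes(origin);
}
```

If the backend uses the `cors` middleware for Express, the `allowedOrigins` array could be passed as its `origin` option. Whether backend/server.js is set up that way is not stated here.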
The scraper uses multiple strategies to find data:
- Searches for matching IDs, classes, and names
- Looks for label-value pairs
- Extracts meta tag content
- Identifies semantic patterns
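To illustrate two of these strategies, here is a simplified sketch that works on plain text rather than on a parsed DOM. It is not the project's Cheerio-based implementation. The function names (`findLabelValues`, `findEmails`, `extractField`) are hypothetical, and the email pattern is deliberately loose.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Strategy: label-value pairs such as "Email: john@example.com" on a single line.
function findLabelValues(text, label) {
  const pattern = new RegExp(
    `^[ \\t]*${escapeRegExp(label)}[ \\t]*[:\\-][ \\t]*(.+?)[ \\t]*$`,
    'gim'
  );
  return [...text.matchAll(pattern)].map((match) => match[1]);
}

// Strategy: semantic patterns, shown here with a simple email shape.
function findEmails(text) {
  return text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [];
}

// Try the label strategy first, then fall back to the pattern strategy for emails.
function extractField(text, field) {
  const values = findLabelValues(text, field);
  if (values.length === 0 && field.toLowerCase() === 'email') {
    values.push(...findEmails(text));
  }
  // Remove duplicates while keeping first-seen order.
  return [...new Set(values)];
}
```

Returning an array per field mirrors the `scrapedData` shape in the API response, where each field maps to a list of values.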
Automatically discovers potential fields by analyzing:
- HTML attributes (id, class, name)
- Meta tags
- Form labels
- Table headers
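The attribute-based part of discovery can be sketched without any dependencies. This is an illustrative approximation that scans raw HTML with a regular expression, whereas the project itself parses pages with Cheerio. `discoverFields` is a hypothetical name, and the sketch only handles double-quoted attribute values.

```javascript
// Collect candidate field names from id, class, and name attributes in raw HTML.
// Simplification: only double-quoted values are recognised.
function discoverFields(html, knownFields = []) {
  const known = new Set(knownFields.map((field) => field.toLowerCase()));
  const found = new Set();
  // The lookbehind skips attributes such as data-id or data-name.
  const attrPattern = /(?<![\w-])(?:id|class|name)\s*=\s*"([^"]*)"/gi;
  for (const match of html.matchAll(attrPattern)) {
    // A class attribute may hold several space-separated tokens.
    for (const token of match[1].split(/\s+/)) {
      const candidate = token.toLowerCase();
      if (candidate && !known.has(candidate)) {
        found.add(candidate);
      }
    }
  }
  return [...found];
}
```

Passing the already selected fields as `knownFields` keeps them out of the result, which is roughly what `extraFields` in the API response represents.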
Transforms scraped data into CSV format with:
- Proper column headers
- Multiple values per field
- Clean formatting
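The project relies on json2csv for this step. The hand-rolled sketch below only illustrates the idea and makes one assumption about layout: when a field has several values, each value goes on its own row. `escapeCell` and `toCsv` are hypothetical names.

```javascript
// Quote a cell when it contains a comma, double quote, or line break.
function escapeCell(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn { field: [values...] } into CSV text: a header row, then one row per value index.
function toCsv(data, fields) {
  const columns = fields.map((field) => data[field] ?? []);
  const rowCount = Math.max(0, ...columns.map((values) => values.length));
  const lines = [fields.map(escapeCell).join(',')];
  for (let row = 0; row < rowCount; row += 1) {
    // Fields with fewer values leave their remaining cells empty.
    lines.push(columns.map((values) => escapeCell(values[row])).join(','));
  }
  return lines.join('\n');
}
```

The `data` and `fields` arguments match the export request body shown in the API section, so `toCsv(request.data, request.fields)` would produce the file contents under this layout assumption.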
- Chrome (recommended)
- Firefox
- Safari
- Edge
ISC
Akash Pandey
📧 Hire Me - akashdeep9226@gmail.com
Feel free to submit issues and enhancement requests!