Data Scraper

A web scraping application built with Angular and Node.js that lets users extract structured data from a web page by URL.

Features

🎯 Smart Field Selection: Choose from 20 pre-defined fields to scrape
🔍 Auto Field Discovery: Automatically discovers additional fields from the website
🎨 Beautiful UI: Modern, responsive design with Angular Material
📊 Real-time Results: View scraped data immediately
💾 CSV Export: Export scraped data to CSV format
⚡ Fast & Efficient: Built with performance in mind

Pre-configured Fields

The app comes with 20 default fields (10 auto-selected):

Address, City, Name, Full Name, Location
Description, Note, About, Contact, Email
Phone, Title, Company, Website, Street
State, Zip Code, Country, Price, Date

Tech Stack

Frontend

Angular 18 (SSR enabled)
Angular Material
TypeScript
SCSS

Backend

Node.js
Express
Cheerio (HTML parsing)
Axios (HTTP requests)
json2csv (CSV generation)

Installation

Quick Setup (All Dependencies)

npm run install:all

Backend Setup

cd backend
npm install
npm run dev

The backend server will start on http://localhost:3000

Frontend Setup

cd frontend
npm install
npm start

The frontend will start on http://localhost:4200

Usage

Enter URL: Paste the website URL you want to scrape
Select Fields: Choose which data fields to extract (10 are pre-selected)
Start Scraping: Click the "Start Scraping" button
Review Results: View extracted data and discovered fields
Select Extra Fields: Choose any additional fields found
Export: Download the results as a CSV file

API Endpoints

POST /api/scrape/url

Scrape a website URL and extract data

Request:

{
  "url": "https://example.com",
  "selectedFields": ["name", "email", "phone"]
}

Response:

{
  "success": true,
  "url": "https://example.com",
  "pageInfo": {
    "title": "Example Page",
    "description": "...",
    "keywords": "..."
  },
  "scrapedData": {
    "name": ["John Doe"],
    "email": ["john@example.com"]
  },
  "extraFields": ["username", "bio"],
  "timestamp": "2025-11-11T..."
}
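
As an illustration of the request shape above, here is a minimal client sketch. It assumes Node.js 18 or later (for the built-in `fetch`) and that the backend is running at http://localhost:3000 as described in Backend Setup. The names `BASE_URL`, `buildScrapeRequest`, and `scrapeUrl` are hypothetical helpers, not part of the project.

```javascript
// Minimal client sketch for POST /api/scrape/url (assumes Node.js 18+ for global fetch).
// BASE_URL is an assumption taken from the Backend Setup section.
const BASE_URL = "http://localhost:3000";

// Build the fetch arguments separately so they can be inspected without a network call.
function buildScrapeRequest(url, selectedFields) {
  return {
    endpoint: `${BASE_URL}/api/scrape/url`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url, selectedFields }),
    },
  };
}

// Send the request and return the parsed JSON body, or throw on a non-2xx status.
async function scrapeUrl(url, selectedFields) {
  const { endpoint, init } = buildScrapeRequest(url, selectedFields);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Scrape request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

With the backend running, `scrapeUrl("https://example.com", ["name", "email", "phone"])` should resolve to an object shaped like the response shown above.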

POST /api/scrape/export

Export scraped data to CSV

Request:

{
  "data": {
    "name": ["John Doe"],
    "email": ["john@example.com"]
  },
  "fields": ["name", "email"]
}

Response: CSV file download

Project Structure

data-scraper/
├── backend/
│   ├── controllers/
│   │   └── scrapeController.js
│   ├── routes/
│   │   └── scrape.js
│   ├── server.js
│   ├── package.json
│   └── README.md
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── services/
│   │   │   │   └── scraper.service.ts
│   │   │   ├── app.component.ts
│   │   │   ├── app.component.html
│   │   │   ├── app.component.scss
│   │   │   └── app.config.ts
│   │   ├── styles.scss
│   │   └── index.html
│   └── package.json
└── README.md

Development

Backend Development

cd backend
npm run dev  # Runs with nodemon for auto-restart

Frontend Development

cd frontend
npm start  # Runs Angular dev server

Building for Production

Backend

cd backend
npm start

Frontend

cd frontend
npm run build

The built files are written to the frontend/dist/ directory.

Deployment

Hosting Options

This project can be deployed to various platforms:

Backend Deployment

Heroku: Deploy the backend/ folder as a Node.js app
Railway: Connect your GitHub repo and set the root to backend/
Render: Deploy as a Web Service with Node.js
Vercel: Use serverless functions or deploy as a Node.js app

Frontend Deployment

Vercel: Connect the repo and set build command to cd frontend && npm run build
Netlify: Set build command to cd frontend && npm run build and publish directory to frontend/dist
GitHub Pages: Build and deploy the frontend/dist/ folder
Firebase Hosting: Deploy the frontend/dist/ folder

Environment Variables

For production, make sure to set appropriate environment variables:

PORT: Backend server port (default: 3000)
NODE_ENV: Set to production for production builds

CORS Configuration

If deploying frontend and backend separately, update CORS settings in backend/server.js to allow requests from your frontend domain.
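
The allowlist idea behind such a CORS setting can be sketched without any framework. The following is only an illustration, not the contents of backend/server.js: `FRONTEND_ORIGIN` is a hypothetical environment variable, and `allowedOrigins` and `corsHeaders` are made-up helper names.

```javascript
// Sketch of an origin allowlist check using plain objects (no Express or cors package).
// FRONTEND_ORIGIN is a hypothetical environment variable, not one the project defines.
function allowedOrigins(env) {
  // Fall back to the local Angular dev server when nothing is configured.
  const configured = env.FRONTEND_ORIGIN || "http://localhost:4200";
  return configured
    .split(",")
    .map((origin) => origin.trim())
    .filter(Boolean);
}

// Return the CORS response headers for a request origin, or {} if it is not allowed.
function corsHeaders(requestOrigin, env) {
  if (!allowedOrigins(env).includes(requestOrigin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    Vary: "Origin",
  };
}
```

In an Express app, the same list of origins could instead be handed to whatever CORS middleware the server already uses; the point of the sketch is only that the frontend's deployed origin must be on the list.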

Features in Detail

Smart Field Extraction

The scraper uses multiple strategies to find data:

Searches for matching IDs, classes, and names
Looks for label-value pairs
Extracts meta tag content
Identifies semantic patterns
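
Two of the strategies above (meta tag content and label-value pairs) can be illustrated with a simplified sketch. This is not the project's code: the backend uses Cheerio, whereas this example uses regular expressions so that it runs with no dependencies, and it only copes with simple, well-formed markup. The function names are hypothetical.

```javascript
// Strategy: read <meta name="..." content="..."> where name matches the field.
function fromMeta(html, field) {
  const values = [];
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = /\bname=["']([^"']*)["']/i.exec(tag);
    const content = /\bcontent=["']([^"']*)["']/i.exec(tag);
    if (name && content && name[1].toLowerCase() === field.toLowerCase()) {
      values.push(content[1].trim());
    }
  }
  return values;
}

// Strategy: look for "Label: value" pairs in the visible text.
function fromLabelPairs(html, field) {
  // Replace every tag with a line break so each text node sits on its own line.
  const text = html.replace(/<[^>]+>/g, "\n");
  // Escape regex metacharacters in the field name before building the pattern.
  const escaped = field.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`^\\s*${escaped}\\s*:\\s*(.+?)\\s*$`, "gim");
  return [...text.matchAll(pattern)].map((match) => match[1]);
}

// Combine both strategies and drop duplicate values.
function extractField(html, field) {
  return [...new Set([...fromMeta(html, field), ...fromLabelPairs(html, field)])];
}
```

Each function returns an array of strings, which matches the array-per-field shape of `scrapedData` in the API response.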

Field Discovery

Automatically discovers potential fields by analyzing:

HTML attributes (id, class, name)
Meta tags
Form labels
Table headers
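
As a rough illustration of the first source in this list (HTML attributes), the sketch below collects candidate field names from id, class, and name attributes and excludes fields that are already known. It is regex-based and dependency-free for the sake of a runnable example; the project itself parses HTML with Cheerio, and `discoverFields` is a hypothetical name. Meta tags, form labels, and table headers are not covered here.

```javascript
// Sketch of field discovery from id, class, and name attributes (simplified).
function discoverFields(html, knownFields = []) {
  const known = new Set(knownFields.map((field) => field.toLowerCase()));
  const found = new Set();
  // Require leading whitespace so attributes such as data-id are not matched.
  const attrPattern = /\s(?:id|class|name)=["']([^"']+)["']/gi;
  for (const match of html.matchAll(attrPattern)) {
    // A class attribute may hold several space-separated tokens.
    for (const token of match[1].split(/\s+/)) {
      const candidate = token.toLowerCase();
      // Keep only identifier-like tokens that are not already known fields.
      if (/^[a-z][a-z0-9_-]*$/.test(candidate) && !known.has(candidate)) {
        found.add(candidate);
      }
    }
  }
  return [...found].sort();
}
```

The returned list plays the role of `extraFields` in the API response: names found on the page that were not among the selected fields.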

CSV Export

Transforms scraped data into CSV format with:

Proper column headers
Multiple values per field
Clean formatting
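
The transformation can be sketched as follows. This is a standalone illustration rather than the project's implementation, which relies on json2csv. It assumes one particular layout for multiple values: the first value of every field goes on the first data row, the second on the next row, and so on, with empty cells where a field has fewer values. `csvCell` and `toCsv` are hypothetical names.

```javascript
// Quote a cell when it contains a comma, double quote, or line break; double inner quotes.
function csvCell(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn { field: [values...] } into CSV text with one header row and one row per value index.
function toCsv(data, fields) {
  const rowCount = Math.max(0, ...fields.map((field) => (data[field] || []).length));
  const lines = [fields.map(csvCell).join(",")];
  for (let i = 0; i < rowCount; i++) {
    lines.push(fields.map((field) => csvCell((data[field] || [])[i])).join(","));
  }
  return lines.join("\r\n");
}
```

The `data` and `fields` arguments have the same shape as the body of POST /api/scrape/export, so `toCsv({ name: ["John Doe"], email: ["john@example.com"] }, ["name", "email"])` yields a header row followed by one data row.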

Browser Support

Chrome (recommended)
Firefox
Safari
Edge

License

ISC

Author

Akash Pandey

📧 Hire Me - akashdeep9226@gmail.com

Contributing

Feel free to submit issues and enhancement requests!
