Jahnavi314/web-scraper-project

🕸️ Web Scraper with Node.js, Puppeteer & Python Flask

This project demonstrates a multi-stage Docker setup where:

  • A Node.js script uses Puppeteer with headless Chromium to scrape the page title and first <h1> of a given URL.
  • A Python Flask server then serves the scraped data as a JSON API.

🚀 How It Works

  1. Scraper Stage (Node.js + Puppeteer):

    • Accepts a URL from an environment variable SCRAPE_URL
    • Launches headless Chromium
    • Extracts the page title and first heading
    • Saves the result as scraped_data.json
  2. Web Server Stage (Python Flask):

    • Reads the scraped_data.json
    • Serves it on port 5000 as a JSON response
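The web server stage described above could look roughly like the following. This is a minimal sketch, not the project's actual server.py: the helper names `load_scraped_data` and `create_app`, the `/` route, and the error payload are assumptions made for illustration. Only the file name scraped_data.json and port 5000 come from this README.

```python
import json
from pathlib import Path

# File name taken from the README; its location relative to the server is an assumption.
DATA_FILE = Path("scraped_data.json")


def load_scraped_data(path=DATA_FILE):
    """Return the scraper's output, or an error payload if the file is missing or invalid."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return {"error": f"could not read {path}: {exc}"}


def create_app():
    """Build a Flask app that serves the scraped data at the root URL (hypothetical layout)."""
    # Imported here so the loader above stays usable without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Re-read the file on each request so the response reflects the file on disk.
        return jsonify(load_scraped_data())

    return app

# To serve on all interfaces inside the container:
#   create_app().run(host="0.0.0.0", port=5000)
```

Binding to 0.0.0.0 rather than the default 127.0.0.1 is what lets the published port (-p 5000:5000) reach the server from outside the container.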

🐳 Build the Docker Image

In the root project directory (where your Dockerfile is), run:

docker build -t web-scraper .

▶️ Run the Docker Container

To run the container and scrape a URL, use:

docker run -p 5000:5000 -e SCRAPE_URL="https://www.wikipedia.org" web-scraper

You can replace the URL with any valid webpage.

This will start a Flask server that hosts the scraped output.

🌐 Access the Scraped Data

Once the container is running, open your browser and visit:

http://localhost:5000

You'll see output like:

{
  "title": "Wikipedia",
  "heading": "Wikipedia\nThe Free Encyclopedia"
}
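The same output can also be checked from a script instead of a browser. The sketch below assumes the container is running and published on http://localhost:5000, and that the response carries the "title" and "heading" keys shown in the sample. The function names `parse_scrape_response` and `fetch_scraped_data` are hypothetical.

```python
import json
from urllib.request import urlopen

# Keys shown in the README's sample output.
EXPECTED_KEYS = {"title", "heading"}


def parse_scrape_response(body):
    """Decode a JSON body (str or bytes) and check that both expected keys are present."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    return data


def fetch_scraped_data(url="http://localhost:5000", timeout=5):
    """Fetch and validate the JSON document served by the running container."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_scrape_response(resp.read())
```

With the container up, `fetch_scraped_data()["title"]` returns the scraped page title. The parsing step is kept separate from the network call so it can be exercised without a running server.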

🧼 Stopping the Container

If running interactively: press Ctrl + C

Or list and stop it manually:

docker ps
docker stop <container_id>

📁 Project Structure

.
├── Dockerfile
├── scrape.js
├── server.py
├── scraped_data.json (auto-generated)
├── package.json
├── requirements.txt
└── README.md

✅ Requirements Summary

  • Node.js 18-slim + Puppeteer + Chromium ✅
  • Python 3.10-slim + Flask ✅
  • Multi-stage Docker build ✅
  • Accepts dynamic input via env variable ✅
  • Serves data over HTTP as JSON ✅

✨ Done by:
Jahnavi Veliganti
DevOps Assignment | ExactSpace Technologies
