scraper-bank/Amazon.com-Scrapers
Amazon Scrapers

Production-ready web scrapers for extracting structured ecommerce data from Amazon.
Includes scrapers for ASIN product pages, search results (SERP), and category pages, available in Python and Node.js with multiple framework implementations.

All scrapers are generated using the ScrapeOps AI Scraper Builder and designed for reliability at scale with built-in proxy rotation, retries, and anti-bot handling via ScrapeOps.

🧭 Scraper Coverage

| Page Type / Data | Python | Node.js |
| --- | --- | --- |
| Product Page (ASIN /dp/) | ✅ Available | ✅ Available |
| Search Results (SERP) | ✅ Available | ✅ Available |
| Category Page | ✅ Available | ✅ Available |
| Reviews | ⏳ Coming soon | ⏳ Coming soon |
| Sellers | ⏳ Coming soon | ⏳ Coming soon |

✅ Choose Your Language & Framework

Python Implementations

| Framework | Best For | Complexity | Start Here |
| --- | --- | --- | --- |
| BeautifulSoup | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | python/BeautifulSoup/README.md |
| Scrapy | Large-scale crawling, pipelines, production workflows | ⭐⭐ Medium | python/scrapy/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | python/playwright/README.md |
| Selenium | Browser automation, legacy support, wide browser compatibility | ⭐⭐⭐ High | python/selenium/README.md |

Node.js Implementations

| Framework | Best For | Complexity | Start Here |
| --- | --- | --- | --- |
| Cheerio & Axios | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | node/cheerio & Axios/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | node/playwright/README.md |
| Puppeteer | Browser automation, headless Chrome, anti-bot evasion | ⭐⭐⭐ High | node/puppeteer/README.md |

📦 Available Implementations

Python Implementations

Python Amazon Scrapers — Production-ready Amazon scrapers in Python

  • BeautifulSoup — Simple HTML parsing with BeautifulSoup

  • Scrapy — Full-featured crawling framework

  • Playwright — Browser automation framework

  • Selenium — Browser automation with Selenium WebDriver

    • Product pages, search results (SERP), and category pages
    • Wide browser support and legacy system compatibility
    • python/selenium/README.md

Node.js Implementations

Node.js Amazon Scrapers — Production-ready Amazon scrapers in Node.js

  • Cheerio & Axios — Simple HTML parsing with Cheerio & Axios

  • Playwright — Browser automation framework

    • Product pages, search results (SERP), and category pages
    • JavaScript-heavy sites and dynamic content
    • node/playwright/README.md
  • Puppeteer — Browser automation with Puppeteer

    • Product pages, search results (SERP), and category pages
    • Headless Chrome and anti-bot evasion
    • node/puppeteer/README.md

Go Implementations

Go Amazon Scrapers — Production-ready Amazon scrapers in Go

  • HTTP Client — Go-based HTTP client scrapers

🚀 Quick Start

What You'll Need

  • Python 3.7+ (for Python implementations) or Node.js 14+ (for Node.js implementations)
  • ScrapeOps API key (Get one free)
  • Framework-specific dependencies (see language-specific documentation)

Setup Steps

  1. Choose your language and framework (see the comparison tables above).

  2. Install dependencies:

    Python (BeautifulSoup):

    pip install requests beautifulsoup4 lxml

    Python (Playwright):

    pip install playwright playwright-stealth
    playwright install chromium

    Python (Selenium):

    pip install undetected-chromedriver selenium-wire selenium

    Node.js (Cheerio & Axios):

    npm install axios cheerio he

    Node.js (Playwright):

    npm install playwright playwright-extra puppeteer-extra-plugin-stealth cheerio
    npx playwright install chromium

    Node.js (Puppeteer):

    npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth cheerio
  3. Get your ScrapeOps API key:

    • Sign up at ScrapeOps (free account)
    • Copy your API key from the dashboard
    • Add it to your scraper (see framework-specific documentation)
  4. Run a scraper:

    • Navigate to the framework directory (e.g., python/BeautifulSoup/product/product_data/ or node/cheerio & Axios/product/product_data/)
    • Follow the framework-specific README for detailed instructions

👉 Start with python/BeautifulSoup/README.md or node/cheerio & Axios/README.md for the most complete documentation and ready-to-use scrapers.
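The quick-start flow above can be sketched end to end in Python. This is an illustrative sketch, not code from the repo: it assumes ScrapeOps' documented proxy endpoint (`https://proxy.scrapeops.io/v1/` with `api_key` and `url` parameters) and Amazon's usual `#productTitle` element; the `fetch`/`parse_title` helper names are ours. See the framework READMEs for the actual, tested scrapers.

```python
# Minimal sketch of the BeautifulSoup flow: request an Amazon page through
# the ScrapeOps proxy, then parse fields out of the returned HTML.
# Endpoint/params follow ScrapeOps' documented proxy API; selector and
# helper names are illustrative, not the repo's actual code.
from typing import Optional

import requests
from bs4 import BeautifulSoup

API_KEY = "your-actual-api-key-here"
PROXY_URL = "https://proxy.scrapeops.io/v1/"


def fetch(url: str) -> str:
    """Fetch a page via the ScrapeOps proxy (rotation/retries handled server-side)."""
    response = requests.get(
        PROXY_URL, params={"api_key": API_KEY, "url": url}, timeout=120
    )
    response.raise_for_status()
    return response.text


def parse_title(html: str) -> Optional[str]:
    """Pull the product title out of a product-page HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#productTitle")
    return node.get_text(strip=True) if node else None


# Example usage (requires a valid API key and network access):
# print(parse_title(fetch("https://www.amazon.com/dp/ASIN_HERE")))
```

The same fetch-then-parse split shows up in every implementation; only the HTTP client and parser library change per framework.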


🛠️ Language & Framework Comparison

Python vs. Node.js

| Language | Best For | Frameworks Available |
| --- | --- | --- |
| Python | Data science, ML pipelines, large-scale crawling | BeautifulSoup, Scrapy, Playwright, Selenium |
| Node.js | Web development, async operations, JavaScript ecosystem | Cheerio & Axios, Playwright, Puppeteer |

Framework Comparison

Lightweight HTML Parsing

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| BeautifulSoup | Python | ⭐ Low | Simple HTML parsing, quick scripts |
| Cheerio & Axios | Node.js | ⭐ Low | Simple HTML parsing, quick scripts |

Browser Automation

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| Playwright | Python/Node.js | ⭐⭐⭐ High | JavaScript-heavy sites, modern async support |
| Puppeteer | Node.js | ⭐⭐⭐ High | Headless Chrome, anti-bot evasion |
| Selenium | Python | ⭐⭐⭐ High | Wide browser support, legacy systems |

Large-Scale Crawling

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| Scrapy | Python | ⭐⭐ Medium | Production pipelines, large-scale crawling |

📋 Common Use Cases

  • Amazon price monitoring — Track product prices over time
  • Product catalog ingestion — Build comprehensive product databases
  • Competitive pricing analysis — Compare prices across products
  • Review and rating aggregation — Collect customer feedback data
  • Search results analysis (SERP) — Analyze search result rankings
  • Category hierarchy mapping — Map Amazon's category structure
  • Market research — Gather product and pricing intelligence
  • Ecommerce data pipelines — Build automated data collection systems
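As a concrete illustration of the price-monitoring use case, a small helper can diff two scraped price snapshots taken on different runs. Everything here (the helper name, the `{asin: price}` shape) is a hypothetical example, not part of the repo:

```python
# Hypothetical helper for the price-monitoring use case: compare two
# {asin: price} snapshots from different scrape runs and report changes.
def detect_price_changes(previous: dict, current: dict) -> dict:
    """Return {asin: {"old", "new", "delta"}} for every ASIN whose price moved."""
    changes = {}
    for asin, price in current.items():
        old = previous.get(asin)
        if old is not None and old != price:
            changes[asin] = {"old": old, "new": price, "delta": round(price - old, 2)}
    return changes
```

In practice you would feed it the structured JSON the scrapers emit, persisted between runs (ASINs seen in only one snapshot are ignored here; a real pipeline would also track new and delisted products).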

🔑 Get ScrapeOps API Key

All Amazon scrapers require a ScrapeOps API key to access the proxy service that handles Amazon's anti-bot protection.

Register for Free Account

  1. Visit the ScrapeOps registration page
  2. Sign up for a free account
  3. Navigate to your dashboard to retrieve your API key

Add API Key to Your Code

Method 1: Direct Assignment (Quick Start)

  1. Open the scraper file you want to use
  2. Locate the API_KEY variable near the top of the file
  3. Replace the placeholder with your actual ScrapeOps API key:

    # Python
    API_KEY = "your-actual-api-key-here"

    // Node.js
    const API_KEY = "your-actual-api-key-here";

Method 2: Environment Variable (Recommended for Production)

  1. Set the environment variable:

    # macOS/Linux
    export SCRAPEOPS_API_KEY="your-actual-api-key-here"
    
    # Windows PowerShell
    $env:SCRAPEOPS_API_KEY="your-actual-api-key-here"
  2. Modify the code to read from environment:

    # Python
    import os
    API_KEY = os.getenv("SCRAPEOPS_API_KEY", "your-default-key")

    // Node.js
    const API_KEY = process.env.SCRAPEOPS_API_KEY || "your-default-key";

Note: Some v1 scripts read from a hardcoded API_KEY constant. If so, either edit API_KEY directly or update the script to use environment variables.
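For the environment-variable route, it is usually safer to fail fast when the key is missing than to fall back to a placeholder default that produces confusing 401 errors later. A small sketch of that pattern (the helper name is ours, not the repo's):

```python
import os


def load_api_key(var: str = "SCRAPEOPS_API_KEY") -> str:
    """Read the ScrapeOps API key from the environment; fail fast if unset."""
    key = os.getenv(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} before running the scraper")
    return key
```

Calling `load_api_key()` at startup surfaces a missing key immediately, before any requests are made.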


⚙️ Requirements

Python Requirements

  • Python Version: 3.7 or higher
  • Framework-Specific Dependencies: See python/README.md for detailed dependency information

Node.js Requirements

  • Node.js Version: 14 or higher
  • Framework-Specific Dependencies: See node/README.md for detailed dependency information

Go Requirements

  • Framework-Specific Dependencies: See go/README.md for detailed dependency information

📚 Documentation

Python Documentation

Python Amazon Scrapers — Complete guide for Python implementations

Node.js Documentation

Node.js Amazon Scrapers — Complete guide for Node.js implementations

Go Documentation

Go Amazon Scrapers — Complete guide for Go implementations


🔄 Why These Scrapers Exist

  • Generated automatically from real Amazon URLs using AI
  • Designed to survive common Amazon anti-bot defenses
  • Intended as reference-quality scrapers, not demos or proof of concepts
  • Available in multiple languages and frameworks to suit different use cases
  • Built with production-ready patterns and best practices
  • Integrated with ScrapeOps for reliable proxy rotation and anti-bot handling

How These Scrapers Were Generated

All scrapers in this repository were generated automatically using the ScrapeOps AI Scraper Builder, which:

  • Analyzed real Amazon pages
  • Identified stable data locations and structures
  • Mapped structured JSON schemas
  • Generated production-ready scraping code
  • Integrated proxy handling and retries automatically

This repo represents the output of the system, not hand-written scraping logic.

Get started: AI Scraper Builder


📊 Repository Structure

Amazon-Scrapers-master/
├── python/
│   ├── BeautifulSoup/          # Simple HTML parsing
│   ├── scrapy/                 # Full-featured crawling framework
│   ├── playwright/             # Browser automation (Python)
│   ├── selenium/               # Selenium WebDriver
│   └── README.md               # Python overview
├── node/
│   ├── cheerio & Axios/        # Simple HTML parsing
│   ├── playwright/             # Browser automation (Node.js)
│   ├── puppeteer/              # Puppeteer browser automation
│   └── README.md               # Node.js overview
├── go/
│   ├── http/                   # Go HTTP client
│   └── README.md               # Go overview
├── README.md                   # This file (main overview)
└── LICENSE                     # License information

🛡️ Anti-Bot Protection

All scrapers in this repository use ScrapeOps proxy service to automatically handle Amazon's anti-bot protection:

  • Automatic Proxy Rotation — IP rotation to avoid detection
  • Header Management — Browser fingerprinting and header rotation
  • Rate Limiting — Built-in rate limiting and request throttling
  • CAPTCHA Solving — Automatic CAPTCHA solving when needed
  • Request Timing Optimization — Intelligent request timing

No additional anti-bot configuration is required beyond setting your ScrapeOps API key.
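Even with the proxy handling rotation and retries server-side, a scraper can wrap its requests in a client-side retry loop as a safety net for transient network failures. A generic sketch of that pattern (the wrapper and its parameters are illustrative, not the repo's exact code):

```python
import random
import time


def with_retries(fetch_fn, url, max_attempts=3, base_delay=1.0):
    """Call fetch_fn(url), retrying on error with exponential backoff and jitter.

    The ScrapeOps proxy already retries and rotates IPs internally; this
    client-side wrapper only guards against transient network failures
    (timeouts, dropped connections) between you and the proxy.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_fn(url)
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff (1x, 2x, 4x, ...) plus random jitter.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

Any of the fetch functions from the framework implementations can be passed in as `fetch_fn`; the wrapper re-raises the last error once attempts are exhausted.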


📚 Additional Resources

Generate Your Own Scraper with AI

Use the ScrapeOps AI Scraper Builder to generate production-ready scrapers from example URLs.

  • Any ecommerce website
  • Multiple languages and frameworks
  • Proxy integration included
  • Free beta available

Get started: AI Scraper Builder

Browse Existing Scrapers

Check our Scraper Bank for pre-built scrapers across various websites and tech stacks.

Browse scrapers: Scraper Bank

ScrapeOps Resources


Legal Notice

These scrapers are provided for educational and research purposes.
You are responsible for ensuring your use complies with Amazon's terms of service and applicable laws in your jurisdiction.


License

See LICENSE file for license information.

