scraper-bank/Amazon.com-Scrapers
Amazon Scrapers

Production-ready web scrapers for extracting structured ecommerce data from Amazon.
Includes scrapers for ASIN product pages, search results (SERP), and category pages, available in Python and Node.js with multiple framework implementations.

All scrapers are generated using the ScrapeOps AI Scraper Builder and designed for reliability at scale with built-in proxy rotation, retries, and anti-bot handling via ScrapeOps.

🧭 Scraper Coverage

| Page Type / Data | Python | Node.js |
| --- | --- | --- |
| Product Page (ASIN /dp/) | ✅ Available | ✅ Available |
| Search Results (SERP) | ✅ Available | ✅ Available |
| Category Page | ✅ Available | ✅ Available |
| Reviews | ⏳ Coming soon | ⏳ Coming soon |
| Sellers | ⏳ Coming soon | ⏳ Coming soon |

✅ Choose Your Language & Framework

Python Implementations

| Framework | Best For | Complexity | Start Here |
| --- | --- | --- | --- |
| BeautifulSoup | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | python/BeautifulSoup/README.md |
| Scrapy | Large-scale crawling, pipelines, production workflows | ⭐⭐ Medium | python/scrapy/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | python/playwright/README.md |
| Selenium | Browser automation, legacy support, wide browser compatibility | ⭐⭐⭐ High | python/selenium/README.md |

Node.js Implementations

| Framework | Best For | Complexity | Start Here |
| --- | --- | --- | --- |
| Cheerio & Axios | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | node/cheerio & Axios/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | node/playwright/README.md |
| Puppeteer | Browser automation, headless Chrome, anti-bot evasion | ⭐⭐⭐ High | node/puppeteer/README.md |

📦 Available Implementations

Python Implementations

Python Amazon Scrapers — Production-ready Amazon scrapers in Python

  • BeautifulSoup — Simple HTML parsing with BeautifulSoup

  • Scrapy — Full-featured crawling framework

  • Playwright — Browser automation framework

  • Selenium — Browser automation with Selenium WebDriver

    • Product pages, search results (SERP), and category pages
    • Wide browser support and legacy system compatibility
    • python/selenium/README.md

Node.js Implementations

Node.js Amazon Scrapers — Production-ready Amazon scrapers in Node.js

  • Cheerio & Axios — Simple HTML parsing with Cheerio & Axios

  • Playwright — Browser automation framework

    • Product pages, search results (SERP), and category pages
    • JavaScript-heavy sites and dynamic content
    • node/playwright/README.md
  • Puppeteer — Browser automation with Puppeteer

    • Product pages, search results (SERP), and category pages
    • Headless Chrome and anti-bot evasion
    • node/puppeteer/README.md

Go Implementations

Go Amazon Scrapers — Production-ready Amazon scrapers in Go

  • HTTP Client — Go-based HTTP client scrapers

🚀 Quick Start

What You'll Need

  • Python 3.7+ (for Python implementations) or Node.js 14+ (for Node.js implementations)
  • ScrapeOps API key (Get one free)
  • Framework-specific dependencies (see language-specific documentation)

Setup Steps

  1. Choose your language and framework (see the comparison tables above).

  2. Install dependencies:

    Python (BeautifulSoup):

    pip install requests beautifulsoup4 lxml

    Python (Playwright):

    pip install playwright playwright-stealth
    playwright install chromium

    Python (Selenium):

    pip install undetected-chromedriver selenium-wire selenium

    Node.js (Cheerio & Axios):

    npm install axios cheerio he

    Node.js (Playwright):

    npm install playwright playwright-extra puppeteer-extra-plugin-stealth cheerio
    npx playwright install chromium

    Node.js (Puppeteer):

    npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth cheerio
  3. Get your ScrapeOps API key:

    • Sign up at ScrapeOps (free account)
    • Copy your API key from the dashboard
    • Add it to your scraper (see framework-specific documentation)
  4. Run a scraper:

    • Navigate to the framework directory (e.g., python/BeautifulSoup/product/product_data/ or node/cheerio & Axios/product/product_data/)
    • Follow the framework-specific README for detailed instructions

👉 Start with python/BeautifulSoup/README.md or node/cheerio & Axios/README.md for the most complete documentation and ready-to-use scrapers.
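The quick-start flow above can be sketched end to end in Python. This is an illustrative sketch, not code from the repo: it assumes ScrapeOps' documented proxy endpoint (`https://proxy.scrapeops.io/v1/` with `api_key` and `url` parameters) and Amazon's usual `#productTitle` element; the `fetch`/`parse_title` helper names are ours. See the framework READMEs for the actual, tested scrapers.

```python
# Minimal sketch of the BeautifulSoup flow: request an Amazon page through
# the ScrapeOps proxy, then parse fields out of the returned HTML.
# Endpoint/params follow ScrapeOps' documented proxy API; selector and
# helper names are illustrative, not the repo's actual code.
from typing import Optional

import requests
from bs4 import BeautifulSoup

API_KEY = "your-actual-api-key-here"
PROXY_URL = "https://proxy.scrapeops.io/v1/"


def fetch(url: str) -> str:
    """Fetch a page via the ScrapeOps proxy (rotation/retries handled server-side)."""
    response = requests.get(
        PROXY_URL, params={"api_key": API_KEY, "url": url}, timeout=120
    )
    response.raise_for_status()
    return response.text


def parse_title(html: str) -> Optional[str]:
    """Pull the product title out of a product-page HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#productTitle")
    return node.get_text(strip=True) if node else None


# Example usage (requires a valid API key and network access):
# print(parse_title(fetch("https://www.amazon.com/dp/ASIN_HERE")))
```

The same fetch-then-parse split shows up in every implementation; only the HTTP client and parser library change per framework.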


🛠️ Language & Framework Comparison

Python vs. Node.js

| Language | Best For | Frameworks Available |
| --- | --- | --- |
| Python | Data science, ML pipelines, large-scale crawling | BeautifulSoup, Scrapy, Playwright, Selenium |
| Node.js | Web development, async operations, JavaScript ecosystem | Cheerio & Axios, Playwright, Puppeteer |

Framework Comparison

Lightweight HTML Parsing

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| BeautifulSoup | Python | ⭐ Low | Simple HTML parsing, quick scripts |
| Cheerio & Axios | Node.js | ⭐ Low | Simple HTML parsing, quick scripts |

Browser Automation

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| Playwright | Python/Node.js | ⭐⭐⭐ High | JavaScript-heavy sites, modern async support |
| Puppeteer | Node.js | ⭐⭐⭐ High | Headless Chrome, anti-bot evasion |
| Selenium | Python | ⭐⭐⭐ High | Wide browser support, legacy systems |

Large-Scale Crawling

| Framework | Language | Complexity | Best For |
| --- | --- | --- | --- |
| Scrapy | Python | ⭐⭐ Medium | Production pipelines, large-scale crawling |

📋 Common Use Cases

  • Amazon price monitoring — Track product prices over time
  • Product catalog ingestion — Build comprehensive product databases
  • Competitive pricing analysis — Compare prices across products
  • Review and rating aggregation — Collect customer feedback data
  • Search results analysis (SERP) — Analyze search result rankings
  • Category hierarchy mapping — Map Amazon's category structure
  • Market research — Gather product and pricing intelligence
  • Ecommerce data pipelines — Build automated data collection systems
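As a concrete illustration of the price-monitoring use case, a small helper can diff two scraped price snapshots taken on different runs. Everything here (the helper name, the `{asin: price}` shape) is a hypothetical example, not part of the repo:

```python
# Hypothetical helper for the price-monitoring use case: compare two
# {asin: price} snapshots from different scrape runs and report changes.
def detect_price_changes(previous: dict, current: dict) -> dict:
    """Return {asin: {"old", "new", "delta"}} for every ASIN whose price moved."""
    changes = {}
    for asin, price in current.items():
        old = previous.get(asin)
        if old is not None and old != price:
            changes[asin] = {"old": old, "new": price, "delta": round(price - old, 2)}
    return changes
```

In practice you would feed it the structured JSON the scrapers emit, persisted between runs (ASINs seen in only one snapshot are ignored here; a real pipeline would also track new and delisted products).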

🔑 Get ScrapeOps API Key

All Amazon scrapers require a ScrapeOps API key to access the proxy service that handles Amazon's anti-bot protection.

Register for Free Account

  1. Visit the ScrapeOps registration page
  2. Sign up for a free account
  3. Navigate to your dashboard to retrieve your API key

Add API Key to Your Code

Method 1: Direct Assignment (Quick Start)

  1. Open the scraper file you want to use
  2. Locate the API_KEY variable near the top of the file
  3. Replace the placeholder with your actual ScrapeOps API key:

    # Python
    API_KEY = "your-actual-api-key-here"

    // Node.js
    const API_KEY = "your-actual-api-key-here";

Method 2: Environment Variable (Recommended for Production)

  1. Set the environment variable:

    # macOS/Linux
    export SCRAPEOPS_API_KEY="your-actual-api-key-here"
    
    # Windows PowerShell
    $env:SCRAPEOPS_API_KEY="your-actual-api-key-here"
  2. Modify the code to read from environment:

    # Python
    import os
    API_KEY = os.getenv("SCRAPEOPS_API_KEY", "your-default-key")

    // Node.js
    const API_KEY = process.env.SCRAPEOPS_API_KEY || "your-default-key";

Note: Some v1 scripts read from a hardcoded API_KEY constant. If so, either edit API_KEY directly or update the script to use environment variables.
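For the environment-variable route, it is usually safer to fail fast when the key is missing than to fall back to a placeholder default that produces confusing 401 errors later. A small sketch of that pattern (the helper name is ours, not the repo's):

```python
import os


def load_api_key(var: str = "SCRAPEOPS_API_KEY") -> str:
    """Read the ScrapeOps API key from the environment; fail fast if unset."""
    key = os.getenv(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} before running the scraper")
    return key
```

Calling `load_api_key()` at startup surfaces a missing key immediately, before any requests are made.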


⚙️ Requirements

Python Requirements

  • Python Version: 3.7 or higher
  • Framework-Specific Dependencies: See python/README.md for detailed dependency information

Node.js Requirements

  • Node.js Version: 14 or higher
  • Framework-Specific Dependencies: See node/README.md for detailed dependency information

Go Requirements

  • Framework-Specific Dependencies: See go/README.md for detailed dependency information

📚 Documentation

Python Documentation

Python Amazon Scrapers — Complete guide for Python implementations

Node.js Documentation

Node.js Amazon Scrapers — Complete guide for Node.js implementations

Go Documentation

Go Amazon Scrapers — Complete guide for Go implementations


🔄 Why These Scrapers Exist

  • Generated automatically from real Amazon URLs using AI
  • Designed to survive common Amazon anti-bot defenses
  • Intended as reference-quality scrapers, not demos or proof of concepts
  • Available in multiple languages and frameworks to suit different use cases
  • Built with production-ready patterns and best practices
  • Integrated with ScrapeOps for reliable proxy rotation and anti-bot handling

How These Scrapers Were Generated

All scrapers in this repository were generated automatically using the ScrapeOps AI Scraper Builder, which:

  • Analyzed real Amazon pages
  • Identified stable data locations and structures
  • Mapped structured JSON schemas
  • Generated production-ready scraping code
  • Integrated proxy handling and retries automatically

This repo represents the output of the system, not hand-written scraping logic.

Get started: AI Scraper Builder


📊 Repository Structure

Amazon-Scrapers-master/
├── python/
│   ├── BeautifulSoup/          # Simple HTML parsing
│   ├── scrapy/                 # Full-featured crawling framework
│   ├── playwright/             # Browser automation (Python)
│   ├── selenium/               # Selenium WebDriver
│   └── README.md               # Python overview
├── node/
│   ├── cheerio & Axios/        # Simple HTML parsing
│   ├── playwright/             # Browser automation (Node.js)
│   ├── puppeteer/              # Puppeteer browser automation
│   └── README.md               # Node.js overview
├── go/
│   ├── http/                   # Go HTTP client
│   └── README.md               # Go overview
├── README.md                   # This file (main overview)
└── LICENSE                     # License information

🛡️ Anti-Bot Protection

All scrapers in this repository use ScrapeOps proxy service to automatically handle Amazon's anti-bot protection:

  • Automatic Proxy Rotation — IP rotation to avoid detection
  • Header Management — Browser fingerprinting and header rotation
  • Rate Limiting — Built-in rate limiting and request throttling
  • CAPTCHA Solving — Automatic CAPTCHA solving when needed
  • Request Timing Optimization — Intelligent request timing

No additional anti-bot configuration is required beyond setting your ScrapeOps API key.
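Even with the proxy handling rotation and retries server-side, a scraper can wrap its requests in a client-side retry loop as a safety net for transient network failures. A generic sketch of that pattern (the wrapper and its parameters are illustrative, not the repo's exact code):

```python
import random
import time


def with_retries(fetch_fn, url, max_attempts=3, base_delay=1.0):
    """Call fetch_fn(url), retrying on error with exponential backoff and jitter.

    The ScrapeOps proxy already retries and rotates IPs internally; this
    client-side wrapper only guards against transient network failures
    (timeouts, dropped connections) between you and the proxy.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_fn(url)
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff (1x, 2x, 4x, ...) plus random jitter.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

Any of the fetch functions from the framework implementations can be passed in as `fetch_fn`; the wrapper re-raises the last error once attempts are exhausted.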


📚 Additional Resources

Generate Your Own Scraper with AI

Use the ScrapeOps AI Scraper Builder to generate production-ready scrapers from example URLs.

  • Any ecommerce website
  • Multiple languages and frameworks
  • Proxy integration included
  • Free beta available

Get started: AI Scraper Builder

Browse Existing Scrapers

Check our Scraper Bank for pre-built scrapers across various websites and tech stacks.

Browse scrapers: Scraper Bank

ScrapeOps Resources


Legal Notice

These scrapers are provided for educational and research purposes.
You are responsible for ensuring your use complies with Amazon's terms of service and applicable laws in your jurisdiction.


License

See LICENSE file for license information.

