Production-ready web scrapers for extracting structured ecommerce data from Amazon.
Includes scrapers for ASIN product pages, search results (SERP), and category pages, available in Python and Node.js with multiple framework implementations.
All scrapers are generated using the ScrapeOps AI Scraper Builder and designed for reliability at scale with built-in proxy rotation, retries, and anti-bot handling via ScrapeOps.
| Page Type / Data | Python | Node.js |
|---|---|---|
| Product Page (ASIN /dp/) | ✅ Available | ✅ Available |
| Search Results (SERP) | ✅ Available | ✅ Available |
| Category Page | ✅ Available | ✅ Available |
| Reviews | ⏳ Coming soon | ⏳ Coming soon |
| Sellers | ⏳ Coming soon | ⏳ Coming soon |
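Product pages are identified by an ASIN in the `/dp/` path segment. As a minimal sketch, assuming ASINs are 10 uppercase alphanumeric characters, the helpers below pull an ASIN out of a product URL and build a canonical product URL from one. The names `extract_asin` and `product_url` are hypothetical and are not taken from the scrapers in this repository.

```python
import re
from typing import Optional

# Assumption: an ASIN is exactly 10 uppercase letters or digits after "/dp/".
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?#]|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in a product URL, or None if there is none."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


def product_url(asin: str, domain: str = "www.amazon.com") -> str:
    """Build a short product URL for the given ASIN (hypothetical helper)."""
    return f"https://{domain}/dp/{asin}"
```

Normalizing inputs to the short `/dp/<ASIN>` form before scraping can help avoid fetching the same product twice under different tracking parameters.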
| Framework | Best For | Complexity | Start Here |
|---|---|---|---|
| BeautifulSoup | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | python/BeautifulSoup/README.md |
| Scrapy | Large-scale crawling, pipelines, production workflows | ⭐⭐ Medium | python/scrapy/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | python/playwright/README.md |
| Selenium | Browser automation, legacy support, wide browser compatibility | ⭐⭐⭐ High | python/selenium/README.md |
| Framework | Best For | Complexity | Start Here |
|---|---|---|---|
| Cheerio & Axios | Simple HTML parsing, quick scripts, lightweight scraping | ⭐ Low | node/cheerio & Axios/README.md |
| Playwright | JavaScript-heavy sites, dynamic content, browser automation | ⭐⭐⭐ High | node/playwright/README.md |
| Puppeteer | Browser automation, headless Chrome, anti-bot evasion | ⭐⭐⭐ High | node/puppeteer/README.md |
- 📦 Available Implementations
- 🚀 Quick Start
- 🛠️ Language & Framework Comparison
- 📋 Common Use Cases
- 🔑 Get ScrapeOps API Key
- ⚙️ Requirements
- 📚 Documentation
- 🔄 Why These Scrapers Exist
Python Amazon Scrapers — Production-ready Amazon scrapers in Python
- BeautifulSoup — Simple HTML parsing with BeautifulSoup
  - Product pages, search results (SERP), and category pages
  - Fast and lightweight, ideal for static HTML content
  - python/BeautifulSoup/README.md
- Scrapy — Full-featured crawling framework
  - Product page extraction
  - Large-scale crawling and production pipelines
  - python/scrapy/README.md
- Playwright — Browser automation framework
  - Product pages, search results (SERP), and category pages
  - JavaScript-heavy sites and dynamic content
  - python/playwright/README.md
- Selenium — Browser automation with Selenium WebDriver
  - Product pages, search results (SERP), and category pages
  - Wide browser support and legacy system compatibility
  - python/selenium/README.md
Node.js Amazon Scrapers — Production-ready Amazon scrapers in Node.js
- Cheerio & Axios — Simple HTML parsing with Cheerio & Axios
  - Product pages, search results (SERP), and category pages
  - Fast and lightweight, ideal for static HTML content
  - node/cheerio & Axios/README.md
- Playwright — Browser automation framework
  - Product pages, search results (SERP), and category pages
  - JavaScript-heavy sites and dynamic content
  - node/playwright/README.md
- Puppeteer — Browser automation with Puppeteer
  - Product pages, search results (SERP), and category pages
  - Headless Chrome and anti-bot evasion
  - node/puppeteer/README.md
Go Amazon Scrapers — Production-ready Amazon scrapers in Go
- HTTP Client — Go-based HTTP client scrapers
  - Product page extraction
  - go/http/README.md
- ✅ Python 3.7+ (for Python implementations) or Node.js 14+ (for Node.js implementations)
- ✅ ScrapeOps API key (Get one free)
- ✅ Framework-specific dependencies (see language-specific documentation)
- Choose your language and framework:
  - Python → python/README.md — Python implementations overview
  - Node.js → node/README.md — Node.js implementations overview
  - Go → go/README.md — Go implementations overview
- Install dependencies:
  Python (BeautifulSoup):
  pip install requests beautifulsoup4 lxml
  Python (Playwright):
  pip install playwright playwright-stealth
  playwright install chromium
  Python (Selenium):
  pip install undetected-chromedriver selenium-wire selenium
  Node.js (Cheerio & Axios):
  npm install axios cheerio he
  Node.js (Playwright):
  npm install playwright-extra puppeteer-extra-plugin-stealth cheerio
  npx playwright install chromium
  Node.js (Puppeteer):
  npm install puppeteer-extra puppeteer-extra-plugin-stealth cheerio
- Get your ScrapeOps API key:
  - Sign up at ScrapeOps (free account)
  - Copy your API key from the dashboard
  - Add it to your scraper (see framework-specific documentation)
- Run a scraper:
  - Navigate to the framework directory (e.g., python/BeautifulSoup/product/product_data/ or node/cheerio & Axios/product/product_data/)
  - Follow the framework-specific README for detailed instructions
👉 Start with python/BeautifulSoup/README.md or node/cheerio & Axios/README.md for the most complete documentation and ready-to-use scrapers.
| Language | Best For | Frameworks Available |
|---|---|---|
| Python | Data science, ML pipelines, large-scale crawling | BeautifulSoup, Scrapy, Playwright, Selenium |
| Node.js | Web development, async operations, JavaScript ecosystem | Cheerio & Axios, Playwright, Puppeteer |
| Framework | Language | Complexity | Best For |
|---|---|---|---|
| BeautifulSoup | Python | ⭐ Low | Simple HTML parsing, quick scripts |
| Cheerio & Axios | Node.js | ⭐ Low | Simple HTML parsing, quick scripts |
| Framework | Language | Complexity | Best For |
|---|---|---|---|
| Playwright | Python/Node.js | ⭐⭐⭐ High | JavaScript-heavy sites, modern async support |
| Puppeteer | Node.js | ⭐⭐⭐ High | Headless Chrome, anti-bot evasion |
| Selenium | Python | ⭐⭐⭐ High | Wide browser support, legacy systems |
| Framework | Language | Complexity | Best For |
|---|---|---|---|
| Scrapy | Python | ⭐⭐ Medium | Production pipelines, large-scale crawling |
- Amazon price monitoring — Track product prices over time
- Product catalog ingestion — Build comprehensive product databases
- Competitive pricing analysis — Compare prices across products
- Review and rating aggregation — Collect customer feedback data
- Search results analysis (SERP) — Analyze search result rankings
- Category hierarchy mapping — Map Amazon's category structure
- Market research — Gather product and pricing intelligence
- Ecommerce data pipelines — Build automated data collection systems
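As a rough illustration of the price monitoring use case, the sketch below compares two scrape snapshots and reports which products changed price. The snapshot shape (ASIN mapped to a price in integer cents) and the function name `price_changes` are assumptions made for this example, not the output format of the scrapers in this repository.

```python
from typing import Dict, List, Tuple


def price_changes(
    previous: Dict[str, int], current: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (asin, old_cents, new_cents) for ASINs whose price differs.

    Prices are integer cents so that equality checks are exact.
    ASINs that appear in only one snapshot are ignored.
    """
    changes = []
    for asin, new_price in current.items():
        old_price = previous.get(asin)
        if old_price is not None and old_price != new_price:
            changes.append((asin, old_price, new_price))
    return changes


# Hypothetical snapshots from two scraper runs.
yesterday = {"B000000001": 1999, "B000000002": 500}
today = {"B000000001": 1799, "B000000002": 500, "B000000003": 1000}
changed = price_changes(yesterday, today)
```

Storing prices as integer cents rather than floats keeps comparisons exact across runs.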
All Amazon scrapers require a ScrapeOps API key to access the proxy service that handles Amazon's anti-bot protection.
- Visit the ScrapeOps registration page
- Sign up for a free account
- Navigate to your dashboard to retrieve your API key
Method 1: Direct Assignment (Quick Start)
- Open the scraper file you want to use
- Locate the API_KEY variable near the top of the file
- Replace the placeholder with your actual ScrapeOps API key:
# Python
API_KEY = "your-actual-api-key-here"
// Node.js
const API_KEY = "your-actual-api-key-here";
Method 2: Environment Variable (Recommended for Production)
- Set the environment variable:
  # macOS/Linux
  export SCRAPEOPS_API_KEY="your-actual-api-key-here"
  # Windows PowerShell
  $env:SCRAPEOPS_API_KEY="your-actual-api-key-here"
- Modify the code to read from environment:
  # Python
  import os
  API_KEY = os.getenv("SCRAPEOPS_API_KEY", "your-default-key")
  // Node.js
  const API_KEY = process.env.SCRAPEOPS_API_KEY || "your-default-key";
Note: Some v1 scripts read from a hardcoded API_KEY constant. If so, either edit API_KEY directly or update the script to use environment variables.
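The environment-variable approach falls back to a placeholder string when `SCRAPEOPS_API_KEY` is unset, which would only surface as an authentication failure at request time. One possible alternative, sketched below, is to fail fast at startup instead. The helper name `load_api_key` is hypothetical and does not come from the scrapers themselves.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read SCRAPEOPS_API_KEY and raise immediately if it is missing or blank.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    key = env.get("SCRAPEOPS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCRAPEOPS_API_KEY is not set; export it before running the scraper"
        )
    return key
```

A scraper could then use `API_KEY = load_api_key()` near the top of the file in place of the `os.getenv` line.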
- Python Version: 3.7 or higher
- Framework-Specific Dependencies: See python/README.md for detailed dependency information
- Node.js Version: 14 or higher
- Framework-Specific Dependencies: See node/README.md for detailed dependency information
- Go Version: See go/README.md for Go-specific requirements
Python Amazon Scrapers — Complete guide for Python implementations
- BeautifulSoup — Simple HTML parsing guide
- Scrapy — Full-featured crawling framework guide
- Playwright — Browser automation guide
- Selenium — Selenium WebDriver guide
Node.js Amazon Scrapers — Complete guide for Node.js implementations
- Cheerio & Axios — Simple HTML parsing guide
- Playwright — Browser automation guide
- Puppeteer — Puppeteer browser automation guide
Go Amazon Scrapers — Complete guide for Go implementations
- HTTP Client — Go HTTP client guide
- Generated automatically from real Amazon URLs using AI
- Designed to survive common Amazon anti-bot defenses
- Intended as reference-quality scrapers, not demos or proofs of concept
- Available in multiple languages and frameworks to suit different use cases
- Built with production-ready patterns and best practices
- Integrated with ScrapeOps for reliable proxy rotation and anti-bot handling
All scrapers in this repository were generated automatically using the ScrapeOps AI Scraper Builder:
- The AI analyzed real Amazon pages
- Identified stable data locations and structures
- Mapped structured JSON schemas
- Generated production-ready scraping code
- Integrated proxy handling and retries automatically
This repo represents the output of the system, not hand-written scraping logic.
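To make "structured JSON schemas" more concrete, here is a minimal sketch of how a scraped product might be represented and serialized. Every field name in `ProductRecord` is an illustrative assumption; the actual schemas are whatever each generated scraper emits, so check the framework-specific README for the real fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProductRecord:
    """Hypothetical product schema; field names are assumptions."""

    asin: str
    title: str
    price: Optional[float] = None
    currency: Optional[str] = None
    rating: Optional[float] = None
    images: List[str] = field(default_factory=list)


def to_json(record: ProductRecord) -> str:
    """Serialize a record to JSON with a stable key order."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Fields that could not be extracted stay None instead of being omitted.
example = ProductRecord(asin="B000000001", title="Example product")
example_json = to_json(example)
```

Keeping missing fields as explicit nulls rather than dropping them tends to make downstream pipelines simpler, since every record has the same keys.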
Get started: AI Scraper Builder
Amazon-Scrapers-master/
├── python/
│ ├── BeautifulSoup/ # Simple HTML parsing
│ ├── scrapy/ # Full-featured crawling framework
│ ├── playwright/ # Browser automation (Python)
│ ├── selenium/ # Selenium WebDriver
│ └── README.md # Python overview
├── node/
│ ├── cheerio & Axios/ # Simple HTML parsing
│ ├── playwright/ # Browser automation (Node.js)
│ ├── puppeteer/ # Puppeteer browser automation
│ └── README.md # Node.js overview
├── go/
│ ├── http/ # Go HTTP client
│ └── README.md # Go overview
├── README.md # This file (main overview)
└── LICENSE # License information
All scrapers in this repository use the ScrapeOps proxy service to automatically handle Amazon's anti-bot protection:
- Automatic Proxy Rotation — IP rotation to avoid detection
- Header Management — Browser fingerprinting and header rotation
- Rate Limiting — Built-in rate limiting and request throttling
- CAPTCHA Solving — Automatic CAPTCHA solving when needed
- Request Timing Optimization — Intelligent request timing
No additional anti-bot configuration is required beyond setting your ScrapeOps API key.
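In practice, routing a request through the proxy can be as simple as wrapping the target URL. The sketch below assumes the ScrapeOps Proxy API accepts the key and target URL as `api_key` and `url` query parameters on `https://proxy.scrapeops.io/v1/`; confirm the endpoint and parameters against the ScrapeOps documentation. The helper name `build_proxy_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: ScrapeOps Proxy API endpoint; verify in the ScrapeOps docs.
SCRAPEOPS_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_proxy_url(api_key: str, target_url: str) -> str:
    """Wrap a target URL so the request is sent through the proxy endpoint.

    urlencode percent-encodes the target URL so its own query string
    does not collide with the proxy's parameters.
    """
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPEOPS_ENDPOINT}?{query}"


proxied = build_proxy_url("KEY", "https://www.amazon.com/dp/B000000001")
```

The resulting URL can be passed to any HTTP client, for example `requests.get(proxied, timeout=60)`. The generated scrapers may configure the proxy differently, so treat this as an illustration of the idea rather than the repository's exact integration.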
Use the ScrapeOps AI Scraper Builder to generate production-ready scrapers from example URLs.
- Any ecommerce website
- Multiple languages and frameworks
- Proxy integration included
- Free beta available
Get started: AI Scraper Builder
Check our Scraper Bank for pre-built scrapers across various websites and tech stacks.
Browse scrapers: Scraper Bank
- ScrapeOps Dashboard: Monitor Activity
- ScrapeOps Documentation: View Docs
- ScrapeOps Support: Contact Us
These scrapers are provided for educational and research purposes.
You are responsible for ensuring your use complies with Amazon's terms of service and applicable laws in your jurisdiction.
See LICENSE file for license information.