Oysho Scraper

Oysho Scraper collects structured product data from oysho.com across supported countries and languages. It’s built to turn messy catalog browsing into clean, export-ready product datasets for analytics, merchandising, and monitoring. If you need an Oysho product scraper that supports full-site runs or targeted URLs, this project is designed to scale without becoming fragile.

Created by Bitbash to showcase our approach to scraping and automation.
If you are looking for an Oysho scraper, you've just found your team.

Introduction

This project extracts product listings and detailed product-page information from Oysho’s online catalog. It solves the common problem of manually copying product data (or dealing with incomplete exports) by producing consistent JSON output with nested variants (colors, sizes, images). It’s ideal for developers, data teams, and ecommerce operators who need repeatable, auditable catalog data.

Flexible Crawl Modes

Scrape an entire country storefront starting from its homepage URL.
Scrape one or more category pages to focus on specific catalog sections.
Scrape individual product pages for maximum detail and accuracy.
Run multiple start URLs in a single job, including different regions.
Apply limits to control maximum products and category depth per run.
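
The three crawl modes above depend on telling storefront, category, and product URLs apart. The following is a minimal sketch of one way to do that, not the project's actual routing code. The function name `classify_start_url` and the URL patterns (`-l<digits>` for products, `-n<digits>` for categories) are assumptions inferred only from the URLs in the Example Output section of this README.

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes, inferred only from the example output in this README:
#   product page:  .../<slug>-l<digits>
#   category page: .../<slug>-n<digits>
PRODUCT_RE = re.compile(r"-l\d+$")
CATEGORY_RE = re.compile(r"-n\d+$")


def classify_start_url(url: str) -> str:
    """Return 'product', 'category', or 'storefront' for a start URL."""
    # The query string (e.g. ?pelement=...) is ignored; only the path matters.
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if PRODUCT_RE.search(last_segment):
        return "product"
    if CATEGORY_RE.search(last_segment):
        return "category"
    # Anything else (e.g. /gb/) is treated as a country storefront.
    return "storefront"
```

A job that mixes start URLs could call this once per URL and dispatch each one to the matching crawl mode.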

Features

Feature	Description
Full-site scraping	Crawls storefronts to discover categories and products at scale.
Category-first scraping	Targets category pages to extract focused catalog segments quickly.
Product detail scraping	Visits product pages to capture long descriptions, materials, care, and variant details.
Multi-URL input	Scrapes multiple start URLs in one run, even across different regions.
Deduplication	Returns unique products across overlapping categories and multiple start URLs.
Variant-aware output	Preserves nested structures for colors, sizes, SKUs, and media assets.
Export-friendly summaries	Provides flat summary fields (like `colors`, `sizes`, `mainImage`) alongside detailed nested JSON.
Resilient retry logic	Designed to recover from temporary blocks and intermittent failures.
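
As an illustration of the deduplication feature, the sketch below keeps the first record seen for each product `id` when the same product is discovered through overlapping categories. It is a hedged example rather than the contents of `utils/dedupe.py`: the function name `dedupe_products`, the choice of `id` as the key, and dropping records without an `id` are all assumptions.

```python
from typing import Iterable, Iterator


def dedupe_products(products: Iterable[dict]) -> Iterator[dict]:
    """Yield each product once, keyed by its numeric `id` (assumed unique)."""
    seen = set()
    for product in products:
        product_id = product.get("id")
        # Assumption: records without an id are placeholders and are dropped.
        if product_id is None or product_id in seen:
            continue
        seen.add(product_id)
        yield product
```

Because it is a generator, it can sit between extraction and output without holding the full result set in memory; only the set of seen ids grows.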

What Data This Scraper Extracts

Field Name	Field Description
id	Numeric product identifier from the catalog.
name	Product name/title as displayed on the website.
description	Short description snippet (when available).
longDescription	Full product description from the product page.
reference	Internal product reference code.
displayReference	Customer-facing reference code format.
productType	High-level product type (e.g., Clothing).
mainImage	Primary image URL for the main/default variant.
colors	Comma-separated summary of available color names.
sizes	Comma-separated summary of available size labels.
price	Current price (integer minor units, when provided by the site).
oldPrice	Previous price if discounted, otherwise null.
keyword	URL-friendly product keyword/slug.
category	Category identifier or path segment for the product.
availabilityDate	First available date/time if provided.
isBuyable	Whether the product is purchasable.
onSpecial	Whether the item is marked as on special.
website	Storefront base URL used for the run.
categoryPage	Category page URL associated with discovery.
productPage	Canonical product page URL for the item.
mainColorid	Color ID used as the primary/default selection.
colorsSizesImagesJSON	Nested variant structure containing colors, size SKUs, dimensions, and media assets.
composition	Materials composition as a structured list by part.
compositionDetail	Detailed composition breakdown (parts, areas, components).
care	Care instructions list (wash/iron/dry rules).
sustainability	Sustainability flags and derived percentages (when present).
certifiedMaterials	Certified materials block including certification references and percentages.
traceability	Traceability data structure (when present).
additionalInfo	Any extra product info text provided by the site.

Example Output

{
  "id": 183390929,
  "name": "Comfortlux overlay tank top",
  "description": "",
  "longDescription": "Comfortlux tank top with bra overlay with removable lightly padded cups. Breathable, quick-drying, high-strength fabric. Crossed strap detail at the back.",
  "reference": "30045904-V2025",
  "displayReference": "0045/904",
  "productType": "Clothing",
  "mainImage": "https://static.oysho.net/assets/public/d76a/d5bb/b2a44336bc5a/f4b5f541a69c/30045904791-a1/30045904791-a1.jpg?ts=1738250071894",
  "colors": "Russet Mocha, Dark Brown",
  "sizes": "XS, S, M, L, XL",
  "price": 2999,
  "oldPrice": null,
  "keyword": "comfortlux-overlay-tank-top",
  "category": "womens-sports-t-shirts-n4764",
  "availabilityDate": "2025-01-30 14:58:16.0",
  "isBuyable": true,
  "onSpecial": false,
  "website": "https://www.oysho.com/gb/",
  "categoryPage": "https://www.oysho.com/gb/womens-sports-t-shirts-n4764",
  "productPage": "https://www.oysho.com/gb/comfortlux-overlay-tank-top-l30045904?pelement=183390929",
  "mainColorid": "791",
  "colorsSizesImagesJSON": [
    {
      "id": "791",
      "name": "RUSSET MOCHA",
      "productPageSelectedColor": "https://www.oysho.com/gb/comfortlux-overlay-tank-top-l30045904?pelement=183390929&colorId=791",
      "xmedia": [
        "https://static.oysho.net/assets/public/d76a/d5bb/b2a44336bc5a/f4b5f541a69c/30045904791-a1/30045904791-a1.jpg?ts=1738250071894"
      ],
      "sizes": [
        {
          "sku": 174731466,
          "name": "XS",
          "partnumber": "3004590479101-V2025",
          "isBuyable": true,
          "price": "2999",
          "oldPrice": null,
          "skuDimensions": [
            { "dimensionId": "127", "value": 41.7, "dimensionName": "FRONT LENGTH" }
          ]
        }
      ]
    }
  ]
}
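
To show how the nested output above might be consumed, here is a minimal sketch that flattens `colorsSizesImagesJSON` into one row per SKU and converts prices from minor units. The helper names `minor_to_major` and `iter_sku_rows` are hypothetical, and dividing by 100 assumes a currency with two decimal places (as with the GB storefront in the example). The example shows the top-level `price` as an integer and the per-size `price` as a string, so the conversion accepts both.

```python
from decimal import Decimal


def minor_to_major(amount):
    """Convert minor units (2999 or "2999") to a Decimal; None passes through."""
    if amount is None:
        return None
    # Assumption: the currency has two decimal places.
    return Decimal(int(amount)) / 100


def iter_sku_rows(product: dict):
    """Yield one flat dict per color/size combination of a scraped product."""
    for color in product.get("colorsSizesImagesJSON", []):
        for size in color.get("sizes", []):
            yield {
                "productId": product["id"],
                "color": color["name"],
                "size": size["name"],
                "sku": size["sku"],
                "price": minor_to_major(size.get("price")),
                "oldPrice": minor_to_major(size.get("oldPrice")),
                "isBuyable": size.get("isBuyable"),
            }
```

Using `Decimal` instead of `float` avoids rounding drift when prices are later summed or compared.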

Directory Structure Tree

Oysho/
├── src/
│   ├── main.py
│   ├── cli.py
│   ├── runner/
│   │   ├── __init__.py
│   │   ├── job.py
│   │   ├── retry.py
│   │   └── limits.py
│   ├── crawler/
│   │   ├── __init__.py
│   │   ├── browser.py
│   │   ├── routes.py
│   │   └── session.py
│   ├── extractors/
│   │   ├── __init__.py
│   │   ├── discover.py
│   │   ├── category.py
│   │   ├── product.py
│   │   └── normalize.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── product.py
│   │   └── schema.py
│   ├── outputs/
│   │   ├── __init__.py
│   │   ├── json_writer.py
│   │   ├── csv_flatten.py
│   │   └── validators.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── url.py
│   │   ├── dedupe.py
│   │   ├── logging.py
│   │   └── time.py
│   └── config/
│       ├── settings.example.json
│       └── user_agents.txt
├── data/
│   ├── inputs.sample.json
│   └── sample_output.json
├── tests/
│   ├── test_dedupe.py
│   ├── test_normalize.py
│   └── test_extractors.py
├── .gitignore
├── LICENSE
├── requirements.txt
└── README.md

Use Cases

Ecommerce analysts use it to track price and availability changes, so they can spot promotions early and forecast demand.
Merchandising teams use it to extract product attributes and variants, so they can compare assortments across regions.
Data engineers use it to build product catalogs for BI pipelines, so they can standardize reporting across categories.
Competitor monitoring teams use it to collect structured product feeds, so they can benchmark materials, sizing, and pricing trends.
Marketplace operators use it to populate listings with images and variant data, so they can reduce manual entry and errors.

FAQs

How do I choose what to scrape (full site vs category vs product URLs)? Use storefront URLs when you want broad coverage, category URLs when you want a targeted segment, and product URLs when you only need specific items. You can also provide multiple URLs in a single run to mix and match strategies.

Why do I sometimes get fewer results than my configured maximum? Some catalog pages may include placeholder or incomplete items that are filtered out. Also, the website can show separate color tiles for the same product, while the scraper returns a single bundled product containing multiple colors—so “5 tiles” on the page might become “1 product” in the output.

What’s the difference between colors, sizes, and colorsSizesImagesJSON? colors and sizes are flat summaries for quick exports (CSV/Sheets-friendly). colorsSizesImagesJSON contains the full nested variant structure with per-color media, per-size SKUs, and size dimensions.
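
The relationship between the flat and nested fields can be sketched as follows. This is an illustrative reconstruction, not the scraper's own normalization code: the function name `summarize_variants` is hypothetical, and title-casing the color names is an assumption based on the example output, where the nested name `RUSSET MOCHA` appears as `Russet Mocha` in `colors`.

```python
def summarize_variants(variants: list) -> dict:
    """Build flat `colors` and `sizes` strings from the nested variant list."""
    colors, sizes = [], []
    for color in variants:
        # Assumption: summary color names are the title-cased nested names.
        name = color["name"].title()
        if name not in colors:
            colors.append(name)
        for size in color.get("sizes", []):
            # Size labels are kept as-is and listed once, in first-seen order.
            if size["name"] not in sizes:
                sizes.append(size["name"])
    return {"colors": ", ".join(colors), "sizes": ", ".join(sizes)}
```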

What should I do if requests get blocked or I see access errors? Temporary blocks can happen. The most effective fix is to rerun the job with retries enabled and reduce concurrency. If you’re running at high volume, prefer residential IP rotation and keep request rates steady rather than bursty.
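
The retry advice above can be sketched as exponential backoff with jitter. This is a generic pattern under stated assumptions, not the contents of `runner/retry.py`: the `TemporaryBlock` exception, the `retry_with_backoff` function, and its parameters are all hypothetical names introduced for illustration.

```python
import random
import time


class TemporaryBlock(Exception):
    """Hypothetical error raised by a fetcher when access is temporarily denied."""


def retry_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TemporaryBlock:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Wait base_delay, 2x, 4x, ... plus random jitter so that
            # parallel workers do not retry in synchronized bursts.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the delay can be replaced in tests; in normal use the default `time.sleep` applies.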

Performance Benchmarks and Results

Primary Metric: ~1,000 products scraped in ~5 minutes when running category-first discovery with product detail extraction enabled.

Reliability Metric: 92–97% successful completion rate across large runs when using retry + backoff, with most failures tied to temporary access blocks.

Efficiency Metric: Average throughput of 3–5 product detail pages per second on typical configurations, with output streamed to JSON to avoid memory spikes.
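
One way to stream output without holding every product in memory is to write one JSON object per line as each product is extracted. The sketch below assumes a JSON Lines layout and a hypothetical `stream_json_lines` helper; the README only states that output is streamed to JSON, so the actual writer in `outputs/json_writer.py` may differ.

```python
import json
from typing import IO, Iterable


def stream_json_lines(products: Iterable[dict], out: IO[str]) -> int:
    """Write each product as one JSON line; return the number written."""
    count = 0
    for product in products:
        # ensure_ascii=False keeps non-ASCII product names readable.
        out.write(json.dumps(product, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Passing a generator as `products` means only one record is serialized at a time, so memory use stays flat regardless of run size.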

Quality Metric: 95%+ completeness for core commercial fields (name, price, images, variants), with occasional gaps on products that load incomplete or placeholder data.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
