Skip to content

nightmegaziifnb/oysho

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

Oysho Scraper

Oysho Scraper collects structured product data from oysho.com across supported countries and languages. It’s built to turn messy catalog browsing into clean, export-ready product datasets for analytics, merchandising, and monitoring. If you need an Oysho product scraper that supports full-site runs or targeted URLs, this project is designed to scale without becoming fragile.

Bitbash Banner

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for oysho you've just found your team — Let’s Chat. 👆👆

Introduction

This project extracts product listings and detailed product-page information from Oysho’s online catalog. It solves the common problem of manually copying product data (or dealing with incomplete exports) by producing consistent JSON output with nested variants (colors, sizes, images). It’s ideal for developers, data teams, and ecommerce operators who need repeatable, auditable catalog data.

Flexible Crawl Modes

  • Scrape an entire country storefront starting from its homepage URL.
  • Scrape one or more category pages to focus on specific catalog sections.
  • Scrape individual product pages for maximum detail and accuracy.
  • Run multiple start URLs in a single job, including different regions.
  • Apply limits to control maximum products and category depth per run.

Features

Feature Description
Full-site scraping Crawls storefronts to discover categories and products at scale.
Category-first scraping Targets category pages to extract focused catalog segments quickly.
Product detail scraping Visits product pages to capture long descriptions, materials, care, and variant details.
Multi-URL input Scrapes multiple start URLs in one run, even across different regions.
Deduplication Returns unique products across overlapping categories and multiple start URLs.
Variant-aware output Preserves nested structures for colors, sizes, SKUs, and media assets.
Export-friendly summaries Provides flat summary fields (like colors, sizes, mainImage) alongside detailed nested JSON.
Resilient retry logic Designed to recover from temporary blocks and intermittent failures.

What Data This Scraper Extracts

Field Name Field Description
id Numeric product identifier from the catalog.
name Product name/title as displayed on the website.
description Short description snippet (when available).
longDescription Full product description from the product page.
reference Internal product reference code.
displayReference Customer-facing reference code format.
productType High-level product type (e.g., Clothing).
mainImage Primary image URL for the main/default variant.
colors Comma-separated summary of available color names.
sizes Comma-separated summary of available size labels.
price Current price (integer minor units, when provided by the site).
oldPrice Previous price if discounted, otherwise null.
keyword URL-friendly product keyword/slug.
category Category identifier or path segment for the product.
availabilityDate First available date/time if provided.
isBuyable Whether the product is purchasable.
onSpecial Whether the item is marked as on special.
website Storefront base URL used for the run.
categoryPage Category page URL associated with discovery.
productPage Canonical product page URL for the item.
mainColorid Color ID used as the primary/default selection.
colorsSizesImagesJSON Nested variant structure containing colors, size SKUs, dimensions, and media assets.
composition Materials composition as a structured list by part.
compositionDetail Detailed composition breakdown (parts, areas, components).
care Care instructions list (wash/iron/dry rules).
sustainability Sustainability flags and derived percentages (when present).
certifiedMaterials Certified materials block including certification references and percentages.
traceability Traceability data structure (when present).
additionalInfo Any extra product info text provided by the site.

Example Output

{
  "id": 183390929,
  "name": "Comfortlux overlay tank top",
  "description": "",
  "longDescription": "Comfortlux tank top with bra overlay with removable lightly padded cups. Breathable, quick-drying, high-strength fabric. Crossed strap detail at the back.",
  "reference": "30045904-V2025",
  "displayReference": "0045/904",
  "productType": "Clothing",
  "mainImage": "https://static.oysho.net/assets/public/d76a/d5bb/b2a44336bc5a/f4b5f541a69c/30045904791-a1/30045904791-a1.jpg?ts=1738250071894",
  "colors": "Russet Mocha, Dark Brown",
  "sizes": "XS, S, M, L, XL",
  "price": 2999,
  "oldPrice": null,
  "keyword": "comfortlux-overlay-tank-top",
  "category": "womens-sports-t-shirts-n4764",
  "availabilityDate": "2025-01-30 14:58:16.0",
  "isBuyable": true,
  "onSpecial": false,
  "website": "https://www.oysho.com/gb/",
  "categoryPage": "https://www.oysho.com/gb/womens-sports-t-shirts-n4764",
  "productPage": "https://www.oysho.com/gb/comfortlux-overlay-tank-top-l30045904?pelement=183390929",
  "mainColorid": "791",
  "colorsSizesImagesJSON": [
    {
      "id": "791",
      "name": "RUSSET MOCHA",
      "productPageSelectedColor": "https://www.oysho.com/gb/comfortlux-overlay-tank-top-l30045904?pelement=183390929&colorId=791",
      "xmedia": [
        "https://static.oysho.net/assets/public/d76a/d5bb/b2a44336bc5a/f4b5f541a69c/30045904791-a1/30045904791-a1.jpg?ts=1738250071894"
      ],
      "sizes": [
        {
          "sku": 174731466,
          "name": "XS",
          "partnumber": "3004590479101-V2025",
          "isBuyable": true,
          "price": "2999",
          "oldPrice": null,
          "skuDimensions": [
            { "dimensionId": "127", "value": 41.7, "dimensionName": "FRONT LENGTH" }
          ]
        }
      ]
    }
  ]
}

Directory Structure Tree

Oysho/
├── src/
│   ├── main.py
│   ├── cli.py
│   ├── runner/
│   │   ├── __init__.py
│   │   ├── job.py
│   │   ├── retry.py
│   │   └── limits.py
│   ├── crawler/
│   │   ├── __init__.py
│   │   ├── browser.py
│   │   ├── routes.py
│   │   └── session.py
│   ├── extractors/
│   │   ├── __init__.py
│   │   ├── discover.py
│   │   ├── category.py
│   │   ├── product.py
│   │   └── normalize.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── product.py
│   │   └── schema.py
│   ├── outputs/
│   │   ├── __init__.py
│   │   ├── json_writer.py
│   │   ├── csv_flatten.py
│   │   └── validators.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── url.py
│   │   ├── dedupe.py
│   │   ├── logging.py
│   │   └── time.py
│   └── config/
│       ├── settings.example.json
│       └── user_agents.txt
├── data/
│   ├── inputs.sample.json
│   └── sample_output.json
├── tests/
│   ├── test_dedupe.py
│   ├── test_normalize.py
│   └── test_extractors.py
├── .gitignore
├── LICENSE
├── requirements.txt
└── README.md

Use Cases

  • Ecommerce analysts use it to track price and availability changes, so they can spot promotions early and forecast demand.
  • Merchandising teams use it to extract product attributes and variants, so they can compare assortments across regions.
  • Data engineers use it to build product catalogs for BI pipelines, so they can standardize reporting across categories.
  • Competitor monitoring teams use it to collect structured product feeds, so they can benchmark materials, sizing, and pricing trends.
  • Marketplace operators use it to populate listings with images and variant data, so they can reduce manual entry and errors.

FAQs

How do I choose what to scrape (full site vs category vs product URLs)? Use storefront URLs when you want broad coverage, category URLs when you want a targeted segment, and product URLs when you only need specific items. You can also provide multiple URLs in a single run to mix and match strategies.

Why do I sometimes get fewer results than my configured maximum? Some catalog pages may include placeholder or incomplete items that are filtered out. Also, the website can show separate color tiles for the same product, while the scraper returns a single bundled product containing multiple colors—so “5 tiles” on the page might become “1 product” in the output.

What’s the difference between colors, sizes, and colorsSizesImagesJSON? colors and sizes are flat summaries for quick exports (CSV/Sheets-friendly). colorsSizesImagesJSON contains the full nested variant structure with per-color media, per-size SKUs, and size dimensions.

What should I do if requests get blocked or I see access errors? Temporary blocks can happen. The most effective fix is to rerun the job with retries enabled and reduce concurrency. If you’re running at high volume, prefer residential IP rotation and keep request rates steady rather than bursty.


Performance Benchmarks and Results

Primary Metric: ~1,000 products scraped in ~5 minutes when running category-first discovery with product detail extraction enabled.

Reliability Metric: 92–97% successful completion rate across large runs when using retry + backoff, with most failures tied to temporary access blocks.

Efficiency Metric: Average throughput of 3–5 product detail pages per second on typical configurations, with output streamed to JSON to avoid memory spikes.

Quality Metric: 95%+ completeness for core commercial fields (name, price, images, variants), with occasional gaps on products that load incomplete or placeholder data.

Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★

Releases

No releases published

Packages

 
 
 

Contributors