This repository was archived by the owner on Nov 30, 2022. It is now read-only.

Commit 89d9056

Merge pull request #277 from anut123/master
E-Commerce Website
2 parents 8fb8684 + 4bcbaf6 commit 89d9056

3 files changed: 64 additions and 0 deletions

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# WebScrap

Web scraping (also called web harvesting or web data extraction) is the automated extraction of data from websites. Web scraping software can access the World Wide Web directly over the Hypertext Transfer Protocol (HTTP) or through a web browser.

This repo scrapes product listings from an e-commerce website using Beautiful Soup in [Python](https://www.python.org/).

[Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
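As a rough illustration of the kind of parse-tree searching described above, the sketch below runs Beautiful Soup over a small inline HTML string instead of a live page. The markup in `SAMPLE_HTML` and the helper name `parse_items` are hypothetical; only the class names `item-container`, `item-title`, and `price-ship` come from this repo's script, and the real site's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names the scraper looks for.
SAMPLE_HTML = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a href="#"><img title="ExampleBrand" alt="ExampleBrand"></a>
    </div>
    <a class="item-title" href="#">Example Card, 8GB</a>
    <ul><li class="price-ship"> Free Shipping </li></ul>
  </div>
</div>
"""


def parse_items(html):
    """Return one dict per product container found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for container in soup.find_all("div", class_="item-container"):
        # find() returns None when nothing matches, so each field is guarded.
        img = container.find("img", title=True)
        title = container.find("a", class_="item-title")
        ship = container.find("li", class_="price-ship")
        items.append({
            "brand": img["title"] if img else "",
            "product_name": title.get_text(strip=True) if title else "",
            "shipping": ship.get_text(strip=True) if ship else "",
        })
    return items


items = parse_items(SAMPLE_HTML)
print(items)
```

Guarding each lookup means a listing without a brand image or shipping line yields an empty field rather than an exception, which may or may not be the behavior you want for a given scrape.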
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
from urllib.request import urlopen

from bs4 import BeautifulSoup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

# Open the URL and download the page; the with-block closes the connection.
with urlopen(my_url) as client:
    page_html = client.read()

# Parse the HTML.
page_soup = BeautifulSoup(page_html, 'html.parser')

# Grab every product container (div elements with class "item-container").
containers = page_soup.find_all('div', {'class': 'item-container'})

filename = 'products.csv'
headers = 'brands, product_name, shipping\n'

# The with-block closes the file even if a lookup below raises.
with open(filename, 'w', encoding='utf-8') as f:
    f.write(headers)

    for container in containers:
        # The brand name is the title attribute of the image nested in the first link.
        brand = container.div.div.a.img['title']

        title_container = container.find_all('a', {'class': 'item-title'})
        product_name = title_container[0].text

        ship_container = container.find_all('li', {'class': 'price-ship'})
        # strip() removes the whitespace before and after the text.
        shipping = ship_container[0].text.strip()

        print("brand:" + brand)
        print("product_name:" + product_name)
        print("shipping:" + shipping)

        # Replace commas in the name with '|' so they are not read as field separators.
        f.write(brand + ',' + product_name.replace(',', '|') + ',' + shipping + '\n')
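The script keeps the CSV well-formed by replacing commas in product names with `|`. One possible alternative, sketched here under assumptions rather than taken from this commit, is to let Python's standard `csv` module quote such fields so the original commas survive. The function name `write_products` and the sample row are hypothetical, and the output goes to an in-memory buffer instead of `products.csv`.

```python
import csv
import io


def write_products(rows, out):
    """Write a header plus (brand, product_name, shipping) rows to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["brand", "product_name", "shipping"])
    for brand, product_name, shipping in rows:
        # csv.writer quotes any field that contains a comma.
        writer.writerow([brand, product_name, shipping])


# Made-up sample row whose product name contains a comma.
rows = [("ExampleBrand", "Example Card, 8GB", "Free Shipping")]

buffer = io.StringIO(newline="")
write_products(rows, buffer)
csv_text = buffer.getvalue()

# Reading the text back recovers the product name with its comma intact.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
print(csv_text)
```

When writing to a real file with this approach, the `csv` documentation recommends opening it with `newline=''` so line endings are not translated twice.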
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
brands, product_name, shipping
GIGABYTE,GIGABYTE GeForce GTX 1070 DirectX 12 GV-N1070WF2OC-8GD Video Cards,Free Shipping
EVGA,EVGA GeForce GTX 1080 SC GAMING ACX 3.0| 08G-P4-6183-KR| 8GB GDDR5X| LED| DX12 OSD Support (PXOC),Free Shipping
XFX,XFX Radeon RX 470 RS Triple X DirectX 12 RX-470P436BM 4GB 256-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Card,$3.99 Shipping
ASUS,ASUS Radeon RX 480 DirectX 12 DUAL-RX480-O8G Video Card,$4.99 Shipping
ZOTAC,ZOTAC GeForce GTX 1080 Ti AMP Edition 11GB GDDR5X 352-bit Gaming Graphics Card VR Ready 16+2 Power Phase Freeze Fan Stop IceStorm Cooling Spectra Lighting ZT-P10810D-10P,$6.99 Shipping
ASUS,ASUS ROG GeForce GTX 1080 STRIX-GTX1080-A8G-GAMING Video Card,Free Shipping
EVGA,EVGA GeForce GTX 1070 SC GAMING ACX 3.0 Black Edition| 08G-P4-5173-KR| 8GB GDDR5| LED| DX12 OSD Support (PXOC),Free Shipping
GIGABYTE,GIGABYTE Radeon RX 480 G1 Gaming 4GB GV-RX480G1GAMING-4GD Video Card,$4.99 Shipping
XFX,XFX Radeon RS RX 480 DirectX 12 RX-480P836BM 8GB 256-Bit GDDR5 PCI Express 3.0 CrossFireX Support Video Card,$4.99 Shipping
ZOTAC,ZOTAC GeForce GTX 1070 Mini| ZT-P10700G-10M| 8GB GDDR5,Free Shipping
MSI,MSI Radeon RX 480 DirectX 12 Radeon RX 480 4G Video Card,$4.99 Shipping
XFX,XFX Radeon GTR RX 480 DirectX 12 RX-480P8DBA6 Black Edition Video Card,$4.99 Shipping
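
For anyone consuming this output, here is a small sketch of reading it back with the standard `csv` module. The header row has a space after each comma, so the sketch passes `skipinitialspace=True` to keep the column names clean. The helper name `shipping_counts` is hypothetical, and `SAMPLE` holds the header and three rows copied from the file above rather than reading `products.csv` from disk.

```python
import csv
import io
from collections import Counter

# Header and three rows copied from the CSV output above.
SAMPLE = (
    "brands, product_name, shipping\n"
    "GIGABYTE,GIGABYTE GeForce GTX 1070 DirectX 12 GV-N1070WF2OC-8GD Video Cards,Free Shipping\n"
    "ASUS,ASUS Radeon RX 480 DirectX 12 DUAL-RX480-O8G Video Card,$4.99 Shipping\n"
    "MSI,MSI Radeon RX 480 DirectX 12 Radeon RX 480 4G Video Card,$4.99 Shipping\n"
)


def shipping_counts(text):
    """Count how many rows fall under each shipping label."""
    # skipinitialspace drops the blank after each comma in the header,
    # so the keys are "product_name" and "shipping", not " shipping".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return Counter(row["shipping"] for row in reader)


counts = shipping_counts(SAMPLE)
print(counts)
```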
