
Commit fb1ba89

Merge pull request #158 from Namyalg/Questions-from-Project-Euler
Questions from project euler
2 parents: c9fdb88 + 3238396

File tree

11 files changed: +179 -1 lines changed
@@ -0,0 +1,29 @@
# Project Euler #

![Image](./images/euler_home.PNG)

Project Euler is a series of challenging mathematical/computer programming problems that require more than just mathematical insight to solve.

This script, written in Python, fetches all 700+ questions across 15 pages and writes them into a CSV file named Project_Euler.csv.

Beautiful Soup is used for scraping the URL: https://projecteuler.net/archives

Regular expressions are also used to extract the description of each question.

## Implementation ##

Using **inspect element**, the contents of the page can be examined.

The structure of each page is as shown:

![Image](./images/euler_questions.PNG)

Each `<tr>` element contains the listing for one question.

Each question has the following components:

![Image](./images/question1.PNG)

The contents are parsed and stored using Beautiful Soup, a library built for web scraping.
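As a quick illustration of the two techniques the README names, here is a minimal, self-contained sketch: Beautiful Soup pulls the problem links off one archives page, and a regular expression strips the tags from one problem description. The URL pattern, the `problems_table` id, and the `problem_content` class are taken from the full script below; everything else (the `[:3]` slice, printing) is illustrative.

```python
# Minimal sketch of the scraping approach; details taken from the full script.
import re

import requests
from bs4 import BeautifulSoup

# Fetch one archives page and list a few of the problem links it contains.
page_url = "https://projecteuler.net/archives;page=1"
soup = BeautifulSoup(requests.get(page_url).text, "html.parser")
links = soup.find("table", attrs={"id": "problems_table"}).find_all("a")
for link in links[:3]:
    # Each href looks like "problem=N"; the link text is the problem title.
    print(link["href"], "->", link.string)

# Fetch the first problem and strip its HTML tags with a regular expression,
# the same substitution the script below applies to every question.
problem = BeautifulSoup(
    requests.get("https://projecteuler.net/" + links[0]["href"]).text,
    "html.parser",
)
content = problem.find("div", attrs={"class": "problem_content"})
description = re.sub(r"<.*?>", " ", str(content))
print(description.strip())
```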
@@ -0,0 +1,60 @@
```python
#!/usr/bin/env python3

# Imports and dependencies
import csv
import re

import requests
from bs4 import BeautifulSoup


def euler():
    # The contents are written into a CSV file.
    # Each question has a serial number, the name of the problem,
    # and the description of the problem.
    with open('Project_Euler.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Problem Number", "Name", "Description"])

        # There are 15 pages in all; the page number is appended to the URL.
        start = 1
        pages = 15

        for page in range(start, pages + start):
            # Fetch each archives page, then search it for questions.
            page_url = "https://projecteuler.net/archives;page=" + str(page)
            response = requests.get(page_url)
            soup = BeautifulSoup(response.text, "html.parser")

            # All the questions are located within the <table> tag.
            # This can be confirmed with inspect element (Ctrl+Shift+I).
            for link in soup.find('table', attrs={"id": "problems_table"}).find_all('a'):
                # The link to each question is located in an <a> tag.
                question_url = "https://projecteuler.net/" + link['href']

                # The name and question number are obtained from the link.
                question_number = link['href'].split('=')[-1]
                question_name = link.string

                ques_response = requests.get(question_url)
                ques_contents = BeautifulSoup(ques_response.text, "html.parser")
                description = ''

                # On each question page, the description sits in the
                # <div class="problem_content"> element.
                for content in ques_contents.find("div", attrs={"class": "problem_content"}).children:
                    # Keep the text between the tags, dropping the tags themselves.
                    content = re.sub(r'<.*?>', ' ', str(content))
                    description += content

                # Each entry is written into the file.
                writer.writerow([question_number, question_name, description])


if __name__ == "__main__":
    euler()
```
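Running the script produces Project_Euler.csv in the working directory. As a quick sanity check (a sketch, assuming the scrape completed), the header and first data row can be read back with the csv module; the column names are the ones the script writes.

```python
# Sketch: verify the generated CSV by printing its header and first entry.
import csv

with open("Project_Euler.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)   # ["Problem Number", "Name", "Description"]
    first = next(reader)
    print(header)
    print(first[:2])        # problem number and name of the first question
```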

Web-Scraping/README.md (+1 -1)
```diff
@@ -1,2 +1,2 @@
 # Web-Scraping
-Web scraping is about downloading structured data from the web, selecting some of that data, and passing along what you selected to another process. This folder contains scripts related to web scraping with the help BeautifulSoup , Scrapy , Requests library.
+Web scraping is about downloading structured data from the web, selecting some of that data, and passing along what you selected to another process. This folder contains scripts related to web scraping with the help of the BeautifulSoup, Scrapy, and Requests libraries.
```
