This repository was archived by the owner on Nov 30, 2022. It is now read-only.

Times of India news scraper #261

Merged 2 commits on Sep 18, 2020
8 changes: 8 additions & 0 deletions Web-Scraping/Times_of_india/README.md
@@ -0,0 +1,8 @@
## Scraping Times of India

Scrapes the top headlines from The Times of India in four sections: Flash news, News in Bulletin, Entertainment, and Latest News,
using the Requests and Beautiful Soup modules.

Link for Website - "http://timesofindia.indiatimes.com/"

![output](TOI.png)
Binary file added Web-Scraping/Times_of_india/TOI.png
37 changes: 37 additions & 0 deletions Web-Scraping/Times_of_india/Times_of_india.py
@@ -0,0 +1,37 @@
import requests
import datetime
from bs4 import BeautifulSoup

url = "http://timesofindia.indiatimes.com/"

# Use requests library to get html from TOI's page
response = requests.get(url)
# Make the html soup object
soup = BeautifulSoup(response.content, 'html.parser')

print("\t!!!** The Times of India **!!!")
today = datetime.date.today()
print(today.strftime('\tThe date %d, %b %Y'))

# Scrape The Times of India headlines in four sections:
print("\n\t\t**** Flash news ****")
for div in soup.find_all('div', attrs={'id': 'featuredstory'}):
    for a in div.find_all('a'):
        print(a.text)

print("\n\t\t**** News in Bulletin ****")
for div in soup.find_all('div', attrs={'class': 'top-story'}):
    for li in div.find_all('li'):
        print(li.text)


print("\n\t\t**** Entertainment ****\t")
for div in soup.find_all('div', attrs={'class': 'entrmnt-wdgt-outer'}):
    for li in div.find_all('li'):
        print(li.text)


print("\n\t\t**** Latest News ****\t\n")
for div in soup.find_all('div', attrs={'id': 'lateststories'}):
    for li in div.find_all('li'):
        print(li.text)
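
Review note: the four sections in Times_of_india.py repeat the same find-and-print loop with different selectors. The following is a minimal sketch of one way to factor that out, not the merged implementation. It assumes the element ids and classes used in the script still match the live page, which is not verified here. `SECTIONS`, `extract_headlines`, and `scrape` are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# (section title, container div attrs, item tag), copied from the selectors
# in Times_of_india.py. Whether they still match the live site is an assumption.
SECTIONS = [
    ("Flash news", {"id": "featuredstory"}, "a"),
    ("News in Bulletin", {"class": "top-story"}, "li"),
    ("Entertainment", {"class": "entrmnt-wdgt-outer"}, "li"),
    ("Latest News", {"id": "lateststories"}, "li"),
]


def extract_headlines(soup, attrs, item_tag):
    """Return the stripped, non-empty text of every item_tag inside matching divs."""
    headlines = []
    for div in soup.find_all("div", attrs=attrs):
        for item in div.find_all(item_tag):
            text = item.get_text().strip()
            if text:  # skip empty anchors and list items
                headlines.append(text)
    return headlines


def scrape(html):
    """Parse html and map each section title to its list of headlines."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        title: extract_headlines(soup, attrs, item_tag)
        for title, attrs, item_tag in SECTIONS
    }
```

With this shape, the script would pass `response.content` from its existing `requests.get(url)` call to `scrape` and print each section's list under its banner. Keeping the parsing separate from the network call also lets the selectors be checked against a saved HTML snippet without hitting the site.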