This repository was archived by the owner on Nov 30, 2022. It is now read-only.

Times of India news scraper #261

Merged
merged 2 commits into from
Sep 18, 2020
8 changes: 8 additions & 0 deletions Web-Scraping/Times_of_india/README.md
@@ -0,0 +1,8 @@
## Scraping Times of India

Scrapes the top headlines from the Times of India in four sections: Flash news, News in Bulletin, Entertainment, and Latest news,
using the Requests and Beautiful Soup modules.

Link for Website - "http://timesofindia.indiatimes.com/"

![output](TOI.png)
Binary file added Web-Scraping/Times_of_india/TOI.png
37 changes: 37 additions & 0 deletions Web-Scraping/Times_of_india/Times_of_india.py
@@ -0,0 +1,37 @@
import requests
import datetime
from bs4 import BeautifulSoup

url = "http://timesofindia.indiatimes.com/"

# Use the requests library to fetch the HTML of the home page
response = requests.get(url)
# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(response.content, 'html.parser')

print("\t!!!**The Times of India**!!!")
today = datetime.date.today()
print(today.strftime('\tThe date %d, %b %Y'))

# Scrape the Times of India in four sections:
print("\n\t\t****Flash news****")
for div in soup.find_all('div', attrs={'id': 'featuredstory'}):
    for a in div.find_all('a'):
        print(a.text)

print("\n\t\t**** News in Bulletin ****")
for div in soup.find_all('div', attrs={'class': 'top-story'}):
    for li in div.find_all('li'):
        print(li.text)


print("\n\t\t**** Entertainment ****\t")
for div in soup.find_all('div', attrs={'class': 'entrmnt-wdgt-outer'}):
    for li in div.find_all('li'):
        print(li.text)


print("\n\t\t**** Latest News ****\t\n")
for div in soup.find_all('div', attrs={'id': 'lateststories'}):
    for li in div.find_all('li'):
        print(li.text)
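
The four loops in the script follow the same pattern: find a container `div` by `id` or `class`, then print the text of every `a` or `li` inside it. As a possible follow-up, that pattern could be pulled into one helper. The sketch below is only an illustration and is not part of this PR: the function name `extract_headlines` and the `SAMPLE_HTML` snippet are hypothetical, and the sample markup is a stand-in that reuses two of the selectors from the script rather than the real Times of India page.

```python
from bs4 import BeautifulSoup


def extract_headlines(soup, tag, attrs, item_tag):
    """Return the stripped, non-empty text of every item_tag inside matching containers.

    Hypothetical helper: it mirrors the nested find_all loops in the script,
    but collects the text in a list instead of printing it.
    """
    headlines = []
    for container in soup.find_all(tag, attrs=attrs):
        for item in container.find_all(item_tag):
            text = item.get_text(strip=True)
            if text:  # skip empty list items or links
                headlines.append(text)
    return headlines


# Hypothetical sample markup (an assumption, not the real page), so the
# helper can be tried without a network request.
SAMPLE_HTML = """
<div id="featuredstory"><a href="#">Headline one</a><a href="#"> Headline two </a></div>
<div class="top-story"><ul><li>Bulletin item</li><li>  </li></ul></div>
"""

if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(extract_headlines(soup, "div", {"id": "featuredstory"}, "a"))
    print(extract_headlines(soup, "div", {"class": "top-story"}, "li"))
```

Returning a list rather than printing directly keeps the parsing step testable against saved HTML, which may help when the site's markup changes and a selector stops matching.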