📖 Overview

This Jupyter Notebook uses the Python programming language to scrape Wikipedia pages, collecting the information in each page's infobox. The starting point is the main table of Disney films on Wikipedia, “https://en.wikipedia.org/wiki/List_of_Walt_Disney_Pictures_films”: the link to each film's Wikipedia page is extracted from that table, then each link is visited, its infobox is extracted, and the information is added to a DataFrame. In a later step, the IMDb rating of each movie in the DataFrame is retrieved through the OMDb (Open Movie Database) API.
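
The two scraping steps can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names `extract_film_links` and `parse_infobox` are hypothetical, and the CSS classes `wikitable` and `infobox` are assumptions about Wikipedia's current markup that may change.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"


def extract_film_links(list_page_html):
    """Return absolute URLs for article links found inside wikitable tables.

    Hypothetical helper: assumes the film list is rendered as tables
    carrying the "wikitable" class.
    """
    soup = BeautifulSoup(list_page_html, "html.parser")
    links = []
    for table in soup.find_all("table", class_="wikitable"):
        for anchor in table.find_all("a", href=True):
            href = anchor["href"]
            # Keep internal article links only; skip footnote anchors
            # and namespaced pages such as /wiki/File:...
            if href.startswith("/wiki/") and ":" not in href:
                url = BASE_URL + href
                if url not in links:
                    links.append(url)
    return links


def parse_infobox(film_page_html):
    """Return the infobox rows of a page as a {label: value} dictionary.

    Hypothetical helper: assumes the infobox is a table with the
    "infobox" class whose data rows pair one <th> with one <td>.
    """
    soup = BeautifulSoup(film_page_html, "html.parser")
    infobox = soup.find("table", class_="infobox")
    if infobox is None:
        return {}
    info = {}
    for row in infobox.find_all("tr"):
        header = row.find("th")
        value = row.find("td")
        # Rows with only a header (for example the title row) are skipped.
        if header is not None and value is not None:
            label = header.get_text(" ", strip=True)
            info[label] = value.get_text(" ", strip=True)
    return info
```

In practice the HTML strings would come from something like `requests.get(url).text`, and the list of dictionaries returned by `parse_infobox` could be passed to `pandas.DataFrame` to build the table.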

📄 Files

main.ipynb: Main Jupyter Notebook used to perform the web scraping;
get_imdb_note.ipynb: Jupyter Notebook used to get the IMDb rating of each film;
help_functions.py: Python script containing the helper functions used by main.ipynb and get_imdb_note.ipynb.

📦 Dependencies

bs4
datetime
json
pandas
pickle
requests
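
Of the modules listed above, datetime, json, and pickle ship with Python, while bs4 (distributed as the beautifulsoup4 package), pandas, and requests must be installed separately. A small check such as the following can report which ones are missing before the notebooks are run; the function name `find_missing_modules` is hypothetical and not part of this project.

```python
import importlib.util

# Module names taken from the dependency list above.
REQUIRED_MODULES = ["bs4", "datetime", "json", "pandas", "pickle", "requests"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```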

💻 Usage

Create a free OMDb API account, which is required to use this project;
Obtain the authentication keys for connecting to the OMDb API account;
Store the authentication keys in the Python script help_functions.py;
Install the dependencies;
Run Jupyter Notebook from a terminal to open the code in your browser.
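
Once a key is available, a rating lookup against the OMDb API might look like the sketch below. It is an illustration under stated assumptions and not the code in help_functions.py: the function names are hypothetical, and it relies on the OMDb query parameters `apikey`, `t` (title), and `y` (year) and on the `Response` and `imdbRating` fields of the JSON reply.

```python
import requests

OMDB_URL = "https://www.omdbapi.com/"


def build_omdb_params(title, api_key, year=None):
    """Build the query parameters for a title lookup (hypothetical helper)."""
    params = {"apikey": api_key, "t": title}
    if year is not None:
        params["y"] = str(year)
    return params


def extract_imdb_rating(payload):
    """Return the IMDb rating from an OMDb JSON reply as a float, or None."""
    # OMDb reports failures such as an unknown title with Response == "False".
    if payload.get("Response") != "True":
        return None
    rating = payload.get("imdbRating")
    # Titles without a rating carry the placeholder string "N/A".
    if rating in (None, "N/A"):
        return None
    return float(rating)


def get_imdb_rating(title, api_key, year=None):
    """Query OMDb for one title and return its IMDb rating, or None."""
    response = requests.get(
        OMDB_URL,
        params=build_omdb_params(title, api_key, year),
        timeout=10,
    )
    response.raise_for_status()
    return extract_imdb_rating(response.json())
```

Separating the request from the parsing keeps `extract_imdb_rating` usable on saved responses, which helps when the free tier's request limit makes repeated calls impractical.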

iamgonzalez/Web-Scraping-on-Wikipedia-Pages
