
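This project uses Spark and Python to analyze and visualize global movie data across multiple dimensions, including genre, region, directors and actors, and user ratings.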


StarryCode-Lang/Movie-Analytics-with-Spark-Python

Movie Data Analysis & Visualization

This project provides a comprehensive analysis of global movie data, focusing on production trends, genre statistics, regional differences, director and actor influence, and user rating behaviors. The analysis is performed using Apache Spark (Scala) for large-scale data processing and Python (Jupyter Notebook) for visualization.

Features

  • Production Trend Analysis: Explore the yearly trend of movie releases from 1873 to 2019.
  • Genre Analysis: Statistical analysis of movie genres, including rating distribution, popularity, and market share.
  • Regional Analysis: Compare movie ratings and production across different regions and analyze regional collaboration effects.
  • Director & Actor Analysis: Evaluate the influence of directors and actors based on average ratings, productivity, and collaboration.
  • User Rating Behavior: Analyze user activity, rating preferences, and consistency.
  • Data Visualization: Interactive charts and visual reports using Python and pyecharts.
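As a small-scale illustration of the production trend and genre statistics listed above, the sketch below counts releases per year and averages ratings per genre using only the Python standard library. It is not the project's Spark implementation: the column names (`title`, `year`, `genres`, `rating`), the `|` genre separator, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample data; the real schema and delimiter in data/ may differ.
SAMPLE_CSV = """title,year,genres,rating
Movie A,1994,Drama|Crime,9.0
Movie B,1994,Comedy,7.0
Movie C,2010,Drama,8.0
"""


def yearly_counts(rows):
    """Count how many movies were released in each year."""
    return Counter(int(r["year"]) for r in rows)


def genre_average_ratings(rows):
    """Average rating per genre; a multi-genre movie counts toward each genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [rating sum, movie count]
    for r in rows:
        for genre in r["genres"].split("|"):
            totals[genre][0] += float(r["rating"])
            totals[genre][1] += 1
    return {genre: total / n for genre, (total, n) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = yearly_counts(rows)
averages = genre_average_ratings(rows)
```

Counting a multi-genre movie once per genre is one possible convention; the Scala scripts may weight such movies differently.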

Directory Structure

  • data/ : Raw data, Scala analysis scripts, and output CSVs.
  • visiual.ipynb : Jupyter Notebook for data visualization and further analysis.
  • 图表/ : Saved charts and visualizations.
  • 16region_rating_analysis.html : Example HTML visualization output.

How to Use

  1. Data Preparation: Place raw movie and rating data in the data/ directory.
  2. Run Analysis: Use the Scala scripts in data/ with Apache Spark to generate analysis results (CSV files).
  3. Visualization: Open visiual.ipynb in Jupyter Notebook to visualize and further analyze the results.
  4. View Charts: Check the 图表/ directory and HTML files for exported visualizations.
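Steps 2 to 4 might look roughly like the following on the Python side. This is a sketch under assumptions: it supposes the Scala scripts write results with Spark's CSV writer, which produces a directory of `part-*.csv` files, each with a header row. The field names and paths in the usage note are hypothetical. pyecharts is imported lazily so the loading function works without it.

```python
import csv
from pathlib import Path


def load_spark_csv(output_dir):
    """Read every part-*.csv file in a Spark output directory into one list of dicts.

    Assumes each part file was written with a header row.
    """
    rows = []
    for part in sorted(Path(output_dir).glob("part-*.csv")):
        with part.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def render_bar_chart(rows, x_field, y_field, html_path):
    """Render a bar chart to an HTML file with pyecharts and return its path."""
    from pyecharts.charts import Bar  # imported lazily; only needed for rendering

    chart = Bar()
    chart.add_xaxis([r[x_field] for r in rows])
    chart.add_yaxis(y_field, [float(r[y_field]) for r in rows])
    return chart.render(html_path)
```

For example, `render_bar_chart(load_spark_csv("data/region_rating"), "region", "avg_rating", "region_rating.html")` would export one chart, where `data/region_rating`, `region`, and `avg_rating` are placeholder names rather than the project's actual outputs.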

Requirements

  • Apache Spark
  • Scala
  • Python 3.x
  • Jupyter Notebook
  • pyecharts, pandas

License

This project is for academic and research purposes.
