This project provides a comprehensive analysis of global movie data, focusing on production trends, genre statistics, regional differences, director and actor influence, and user rating behaviors. The analysis is performed using Apache Spark (Scala) for large-scale data processing and Python (Jupyter Notebook) for visualization.
- Production Trend Analysis: Explore the yearly trend of movie releases from 1873 to 2019.
- Genre Analysis: Statistical analysis of movie genres, including rating distribution, popularity, and market share.
- Regional Analysis: Compare movie ratings and production across different regions and analyze regional collaboration effects.
- Director & Actor Analysis: Evaluate the influence of directors and actors based on average ratings, productivity, and collaboration.
- User Rating Behavior: Analyze user activity, rating preferences, and consistency.
- Data Visualization: Interactive charts and visual reports using Python and pyecharts.
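As a rough illustration of the aggregation behind the production-trend and genre items above, the following plain-Python sketch counts releases per year and averages ratings per genre on a few made-up records. The project itself performs these steps in Spark (Scala); the record fields (`year`, `genres`, `rating`) and the sample values are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter, defaultdict

# Toy records; field names and values are hypothetical, not the project's schema.
MOVIES = [
    {"title": "A", "year": 1999, "genres": ["Drama"], "rating": 8.0},
    {"title": "B", "year": 1999, "genres": ["Drama", "Comedy"], "rating": 6.0},
    {"title": "C", "year": 2005, "genres": ["Comedy"], "rating": 7.0},
]


def releases_per_year(movies):
    """Count how many movies were released in each year, sorted by year."""
    return dict(sorted(Counter(m["year"] for m in movies).items()))


def average_rating_per_genre(movies):
    """Average rating per genre; a multi-genre movie counts once per genre."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [rating sum, movie count]
    for m in movies:
        for g in m["genres"]:
            totals[g][0] += m["rating"]
            totals[g][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}


if __name__ == "__main__":
    print(releases_per_year(MOVIES))
    print(average_rating_per_genre(MOVIES))
```

Counting a multi-genre movie once per genre is only one possible convention; weighting each movie by the inverse of its genre count would be an equally reasonable choice for market-share figures.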
- data/: Raw data, Scala analysis scripts, and output CSVs.
- visiual.ipynb: Jupyter Notebook for data visualization and further analysis.
- 图表/: Saved charts and visualizations.
- 16region_rating_analysis.html: Example HTML visualization output.
- Data Preparation: Place raw movie and rating data in the data/ directory.
- Run Analysis: Use the Scala scripts in data/ with Apache Spark to generate analysis results (CSV files).
- Visualization: Open visiual.ipynb in Jupyter Notebook to visualize and further analyze the results.
- View Charts: Check the 图表/ directory and HTML files for exported visualizations.
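The visualization step above might look roughly like the sketch below, which reads one result CSV with pandas and renders an interactive bar chart with pyecharts. The file names `data/region_rating.csv` and `region_rating.html` and the column names `region` and `avg_rating` are placeholders; substitute whatever the Scala scripts actually write.

```python
def top_rows(rows, value_key, n=10):
    """Return the n rows with the largest value_key, highest first."""
    return sorted(rows, key=lambda r: r[value_key], reverse=True)[:n]


def render_bar_chart(csv_path, html_path, label_col, value_col, n=10):
    """Read a result CSV and write an interactive bar chart as an HTML file."""
    # Imported here so top_rows stays usable without these packages installed.
    import pandas as pd
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    df = pd.read_csv(csv_path)
    rows = top_rows(df.to_dict("records"), value_col, n)
    chart = (
        Bar()
        .add_xaxis([str(r[label_col]) for r in rows])
        .add_yaxis(value_col, [float(r[value_col]) for r in rows])
        .set_global_opts(title_opts=opts.TitleOpts(title=f"Top {n} by {value_col}"))
    )
    return chart.render(html_path)  # returns the path of the written file


if __name__ == "__main__":
    # Hypothetical input file, output file, and column names.
    render_bar_chart("data/region_rating.csv", "region_rating.html",
                     "region", "avg_rating")
```

Inside the notebook, calling `chart.render_notebook()` instead of `chart.render(...)` should display the chart inline rather than writing a file.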
- Apache Spark
- Scala
- Python 3.x
- Jupyter Notebook
- pyecharts, pandas
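Before opening the notebook, it can help to confirm that the Python-side requirements are importable. The check below is a minimal sketch: it covers only the Python packages listed above (not Spark or Scala), and the helper name `missing_packages` is made up for this example.

```python
import sys
from importlib.util import find_spec

# Python packages named in the requirements list above.
REQUIRED_PACKAGES = ["pandas", "pyecharts"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3,):
        raise SystemExit("Python 3.x is required")
    missing = missing_packages(REQUIRED_PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```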
This project is for academic and research purposes.