A content-based movie recommender system built with Flask, combining machine learning and interactive visualizations.
Inspired by Netflix, this web app offers personalized recommendations, trending titles, and direct trailer links, all in a simple, clean interface.
- User Authentication: Sign up, log in, manage sessions securely.
- Personalized Recommendations: Suggests similar movies using NLP-based content filtering.
- Trending Section: Highlights top-rated movies (rating > 9).
- Visualizations: Language distribution & genre-wise average rating charts.
- Watch Trailers: One-click trailer links from YouTube.
- Subscription Plans: Simulated subscription page with confirmation screen.
This project uses a content-based filtering approach to recommend movies. Here's an overview of the data flow and logic:
- Raw movie data is stored in `dataset/movies_dataset.csv`.
- `clean_data.py`:
  - Removes duplicates and incomplete records.
  - Standardizes text fields (lowercase, trims spaces).
  - Ensures all movies have valid ratings and image references.
- Clean data is saved to `dataset/cleaned_movies.csv`.
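
The cleaning steps above could be sketched with pandas roughly as follows. This is an illustration, not the actual contents of `clean_data.py`: the column names (`title`, `genre`, `language`, `rating`, `image_file`), the inline sample data, and the helper name `clean_movies` are all assumptions.

```python
import io

import pandas as pd

# Hypothetical sample standing in for dataset/movies_dataset.csv.
RAW_CSV = """title,genre,language,rating,image_file
 Inception ,Sci-Fi,English,8.8,inception.jpg
Inception,Sci-Fi,English,8.8,inception.jpg
Dangal,Drama,Hindi,,dangal.jpg
Parasite,Thriller, Korean ,8.5,parasite.jpg
Unknown,Drama,English,7.0,
"""


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize text, drop incomplete rows, then drop duplicates."""
    df = df.copy()
    # Standardize text fields first so " Inception " and "Inception" match.
    for col in ["title", "genre", "language"]:
        df[col] = df[col].str.strip().str.lower()
    # Non-numeric ratings become NaN and are dropped with other gaps.
    df["rating"] = pd.to_numeric(df["rating"], errors="coerce")
    df = df.dropna(subset=["title", "rating", "image_file"])
    df = df.drop_duplicates(subset="title")
    return df.reset_index(drop=True)


cleaned = clean_movies(pd.read_csv(io.StringIO(RAW_CSV)))
print(cleaned)
```

In the real script the result would presumably be written out with `cleaned.to_csv("dataset/cleaned_movies.csv", index=False)`.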
- `train.py`:
  - Merges key metadata fields (tags, genre, actor, and language) into a single text field.
  - Uses CountVectorizer (from scikit-learn) to turn each movie's text into a bag-of-words count vector.
  - Computes cosine similarity between all movie vectors.
- Saves:
  - `pkl/movies.pkl`: DataFrame of cleaned movies with metadata.
  - `pkl/similarity.pkl`: Pre-computed similarity matrix.
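
A minimal sketch of this training step, assuming a DataFrame with `tags`, `genre`, `actor`, and `language` columns. The sample rows, the `combined` column name, and the default `CountVectorizer` settings are assumptions; the real `train.py` may configure things differently.

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical rows standing in for dataset/cleaned_movies.csv.
movies = pd.DataFrame({
    "title": ["movie a", "movie b", "movie c"],
    "tags": ["space heist dream", "space crew survival", "family wrestling"],
    "genre": ["sci-fi", "sci-fi", "drama"],
    "actor": ["actor one", "actor two", "actor three"],
    "language": ["english", "english", "hindi"],
})

# Merge the metadata fields into one text field per movie.
fields = ["tags", "genre", "actor", "language"]
movies["combined"] = movies[fields].agg(" ".join, axis=1)

# Bag-of-words counts, then pairwise cosine similarity (an n x n matrix).
vectors = CountVectorizer().fit_transform(movies["combined"])
similarity = cosine_similarity(vectors)

# The real script would write pkl/movies.pkl and pkl/similarity.pkl;
# an in-memory round trip shows the same serialization without touching disk.
restored = pickle.loads(pickle.dumps(similarity))
print(similarity.round(2))
```

Each row of `similarity` holds one movie's score against every other movie, which is why lookups at request time need no further computation.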
- `app.py`:
  - Handles routing, user sessions, and rendering templates.
  - Supports login, signup, logout, recommendations, admin view, and data visualizations.
- Recommendations are served instantly using the pre-computed similarity matrix.
- User accounts are stored in SQLite (`database/users.db`).
- Created using the `netflixdb.py` script with fields:
  - `id` (primary key)
  - `email` (unique)
  - `password` (plain text)
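
A table with these three fields could be created with the standard-library `sqlite3` module along these lines. The table name `users`, the helper name `create_users_table`, and the exact column types are assumptions, and an in-memory database stands in for `database/users.db`.

```python
import sqlite3


def create_users_table(conn: sqlite3.Connection) -> None:
    """Create the users table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            email TEXT UNIQUE NOT NULL,
            password TEXT NOT NULL
        )
        """
    )
    conn.commit()


# ":memory:" keeps the example self-contained; the project uses a file path.
conn = sqlite3.connect(":memory:")
create_users_table(conn)
conn.execute(
    "INSERT INTO users (email, password) VALUES (?, ?)",
    ("user@example.com", "secret"),
)

# The UNIQUE constraint rejects a second signup with the same email.
try:
    conn.execute(
        "INSERT INTO users (email, password) VALUES (?, ?)",
        ("user@example.com", "other"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, user_count)
```

Parameterized queries (`?` placeholders) are used so that user input is never interpolated into the SQL string.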
Netflix-App/
│
├── scripts/
│   ├── clean_data.py
│   ├── netflixdb.py
│   └── train.py
│
├── static/
│   ├── css/
│   │   ├── login.css
│   │   ├── signup.css
│   │   ├── style.css
│   │   ├── subscription.css
│   │   └── watch.css
│   │
│   └── img/
│
├── templates/
│   ├── admin.html
│   ├── home.html
│   ├── login.html
│   ├── payment_success.html
│   ├── signup.html
│   ├── subscription.html
│   ├── visualize.html
│   └── watch.html
│
├── dataset/
│   ├── movies_dataset.csv
│   ├── movie_links.csv
│   └── cleaned_movies.csv
│
├── pkl/
│   ├── movies.pkl
│   └── similarity.pkl
│
├── database/
│   └── users.db
│
└── app.py
| Script | Purpose |
|---|---|
| `clean_data.py` | Cleans raw data, standardizes text, removes duplicates, handles missing values |
| `train.py` | Generates vectors & similarity matrix using CountVectorizer & Cosine Similarity |
| `netflixdb.py` | Creates SQLite database for user login |
| `app.py` | Flask app: handles routing, sessions, recommendations, visualizations |
- Visualizations (`/visualize` route)
  - Pie chart: shows the distribution of movies by language
  - Bar chart: displays genres where the average rating exceeds 8
  - Dynamically generated using `matplotlib` and embedded directly in the web interface
- Watch trailers (`/watch/<movie>` route)
  - Opens the trailer/watch link using pre-collected YouTube URLs stored in `movie_links.csv`
- Subscription & Payment
  - `/subscriptions`: Displays sample subscription plans.
  - `/payment_success`: Confirms selected plan after form submission.
- Admin View (`/admin`)
  - Opens the login page. If logged in as admin, redirects to the homepage like a regular user.
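
One common way to embed a `matplotlib` chart directly in a page is to render it to an in-memory PNG and pass it to the template as a base64 string. The sketch below shows that approach for the two charts described above; whether `app.py` does it this way is an assumption, as are the helper names, the column names, and the sample data.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed on a server
import matplotlib.pyplot as plt
import pandas as pd


def language_pie_chart(movies: pd.DataFrame) -> str:
    """Render the language distribution as a base64-encoded PNG."""
    counts = movies["language"].value_counts()
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.pie(counts.values, labels=counts.index.tolist(), autopct="%1.0f%%")
    ax.set_title("Movies by language")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure so repeated requests do not leak memory
    return base64.b64encode(buf.getvalue()).decode("ascii")


def high_rated_genres(movies: pd.DataFrame, threshold: float = 8.0) -> pd.Series:
    """Average rating per genre, keeping only genres above the threshold."""
    averages = movies.groupby("genre")["rating"].mean()
    return averages[averages > threshold]


# Hypothetical sample data.
movies = pd.DataFrame({
    "language": ["english", "english", "hindi", "korean", "english"],
    "genre": ["drama", "comedy", "drama", "thriller", "comedy"],
    "rating": [8.5, 6.0, 9.0, 8.2, 7.0],
})

chart_b64 = language_pie_chart(movies)
top_genres = high_rated_genres(movies)
print(top_genres)
```

A template could then display the chart with something like `<img src="data:image/png;base64,{{ chart }}">`, where `chart` is a hypothetical template variable holding the returned string.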
When a user selects a movie:
- The app retrieves its index from the movies DataFrame.
- Then extracts its similarity scores from the matrix.
- Sorts other movies by descending similarity score.
- Recommends the top 5 most similar movies.
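
The four steps above can be sketched as a single lookup function. The function name `recommend`, the `title` column, and the toy similarity matrix are assumptions made for illustration; the sketch also assumes the DataFrame has a default 0..n-1 index so that row labels match matrix positions.

```python
import numpy as np
import pandas as pd


def recommend(title, movies, similarity, top_n=5):
    """Return the titles of the top_n movies most similar to `title`."""
    # Step 1: find the movie's index in the DataFrame.
    matches = movies.index[movies["title"] == title.strip().lower()]
    if len(matches) == 0:
        return []
    idx = int(matches[0])
    # Step 2: take its row of similarity scores.
    scores = list(enumerate(similarity[idx]))
    # Step 3: sort by descending score, skipping the movie itself.
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    picks = [i for i, _ in ranked if i != idx][:top_n]
    # Step 4: map positions back to titles.
    return movies.iloc[picks]["title"].tolist()


# Toy data: seven movies and a symmetric matrix where only row 0 is interesting.
movies = pd.DataFrame(
    {"title": ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf"]}
)
similarity = np.eye(len(movies))
similarity[0] = [1.0, 0.9, 0.1, 0.8, 0.3, 0.7, 0.5]
similarity[:, 0] = similarity[0]

print(recommend("Alpha ", movies, similarity))
# ['bravo', 'delta', 'foxtrot', 'golf', 'echo']
```

Returning `movies.iloc[picks]` instead of just the titles would carry along the other columns needed to render each recommendation.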
Recommendations include:
- Title
- Genre
- Language
- Rating
- Poster image (via the `image_file` column)
- Python & Flask: Web framework & backend
- SQLite: Lightweight relational database
- Pandas: Data cleaning & processing
- scikit-learn: NLP vectorization & similarity calculation
- Matplotlib: Data visualization
- HTML: Templates
- pickle: Used to save and load preprocessed data (movie metadata & similarity matrix)
- Navigate to your local Netflix-App folder and open Command Prompt in that folder. Create a virtual environment: `python -m venv venv`
- Activate the virtual environment: `venv\Scripts\activate` (on macOS/Linux: `source venv/bin/activate`)
- Install the required dependencies (flask, pandas, scikit-learn, matplotlib): `pip install flask pandas scikit-learn matplotlib`
- Prepare the database: `python scripts/netflixdb.py`. This creates `users.db` to store user emails and passwords.
- Clean the data: `python scripts/clean_data.py`. This generates the `cleaned_movies.csv` dataset.
- Train the recommendation model: `python scripts/train.py`. This generates `movies.pkl` & `similarity.pkl`.
- Run the app: `python app.py`
- Open your browser and go to: `http://127.0.0.1:5000`
- Homepage :
- Recommendations Page :
- Visualizations Page :
- Sign-up Page :
- Sign-in Page :
- "Watch Now" Page :
- Subscriptions Page :
- "Payment Success" Page :
- Replace plain-text password storage with secure hashing
- Allow users to rate movies and track favorites or watchlists
- Add search auto-suggestions with fuzzy matching for better UX
- Integrate interactive visualizations using Plotly or Chart.js
- Enable basic subscription logic (e.g., restrict features based on plan)
- Improve UI with animations and responsive layouts
- Add movie filtering by genre, language, or rating
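
As a starting point for the first item, here is a minimal sketch of salted password hashing using only the standard library (PBKDF2 via `hashlib`). The function names, the `iterations$salt$digest` storage format, and the iteration count are illustrative assumptions and should be tuned for a real deployment; Werkzeug, which is installed alongside Flask, also offers `generate_password_hash` and `check_password_hash` in `werkzeug.security` as a ready-made alternative.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value, not a recommendation


def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$digest_hex' for storage in the password column."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


stored = hash_password("secret")
print(verify_password("secret", stored), verify_password("wrong", stored))
```

Only the returned string would be saved at signup, and login would call `verify_password` instead of comparing plain text.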
Built by Avik for learning, experimenting, and exploring Machine Learning with real-world web apps.
This is an open-source portfolio project. Feel free to use, modify, or extend it!







