📚 Book Recommendation Engine using K-Nearest Neighbors

A machine learning-based book recommendation system that uses collaborative filtering and K-Nearest Neighbors (KNN) algorithm to suggest similar books based on user ratings.

Quick Overview

Problem: Recommend relevant books using large-scale, sparse user rating data, where traditional rule-based methods fail to capture user preference patterns.

Solution: Built a collaborative filtering recommendation engine using K-Nearest Neighbors with cosine distance, leveraging a user–book rating matrix and sparse representations to identify similar books based on shared rating behavior.

Impact: Generated book recommendations, each with a cosine-distance score, from the 1.1-million-rating Book-Crossings dataset, demonstrating applied knowledge of recommender systems, distance-based learning, and data preprocessing for real-world scale datasets.

Completed Project:

https://colab.research.google.com/drive/1t8mqNEZ9czLAun3leolBdjZPJhWmgTfl?usp=drive_link

🔧 Technologies Used

Python 3.x
NumPy - Numerical computations
Pandas - Data manipulation and analysis
Scikit-learn - Machine learning (KNN algorithm)
SciPy - Sparse matrix operations
Matplotlib - Data visualization (optional)

📊 Dataset

Book-Crossings Dataset:

1.1 million ratings (scale 1-10)
270,000 books
90,000 users

Source: The dataset is automatically downloaded in the notebook from FreeCodeCamp.

🚀 How It Works

1. Data Preprocessing

Load book and rating data from CSV files
Filter out sparse data:
- Remove users with fewer than 200 ratings
- Remove books with fewer than 100 ratings
This keeps only users and books with enough ratings to give more reliable similarity estimates
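A minimal pandas sketch of this filtering step, assuming the ratings sit in a DataFrame with columns `user`, `isbn` and `rating`. The column names and the `filter_ratings` helper are illustrative assumptions, not code taken from the notebook.

```python
import pandas as pd


def filter_ratings(ratings: pd.DataFrame,
                   min_user_ratings: int = 200,
                   min_book_ratings: int = 100) -> pd.DataFrame:
    """Drop users and books with too few ratings.

    Assumes hypothetical columns 'user', 'isbn' and 'rating'. Both counts
    are taken from the unfiltered data, so the two filters are independent.
    """
    user_counts = ratings["user"].value_counts()
    book_counts = ratings["isbn"].value_counts()
    active_users = user_counts[user_counts >= min_user_ratings].index
    popular_books = book_counts[book_counts >= min_book_ratings].index
    keep = ratings["user"].isin(active_users) & ratings["isbn"].isin(popular_books)
    return ratings[keep]


# Made-up toy data; tiny thresholds so the effect is visible
# (the project itself uses 200 and 100).
ratings = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 3],
    "isbn":   ["a", "b", "c", "a", "b", "a"],
    "rating": [5, 3, 4, 2, 5, 1],
})
print(filter_ratings(ratings, min_user_ratings=2, min_book_ratings=2))
```

Counting both users and books on the original data is one design choice; filtering users first and then recounting books would give a slightly different subset.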

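2. Create Book-User Matrix

Pivot the filtered ratings into a matrix with one row per book and one column per user; a 0 means the user did not rate that book: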

                User1  User2  User3  User4  ...
Book A            5      0      4      5    ...
Book B            0      3      0      4    ...
Book C            4      5      3      0    ...
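One way to build this matrix, sketched under the same assumed column names (`user`, `isbn`, `rating`); `build_book_user_matrix` is a hypothetical helper. It pivots with pandas and then stores the result as a SciPy compressed sparse row matrix so that the zeros are not kept in memory.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def build_book_user_matrix(ratings: pd.DataFrame):
    """Return a books x users DataFrame and a sparse CSR copy of its values.

    Assumes hypothetical columns 'user', 'isbn' and 'rating'. Unrated cells
    are filled with 0, as in the table above.
    """
    table = ratings.pivot_table(index="isbn", columns="user",
                                values="rating", fill_value=0)
    return table, csr_matrix(table.values)


# Made-up toy ratings: three books, three users.
ratings = pd.DataFrame({
    "user":   [1, 1, 2, 3],
    "isbn":   ["A", "B", "A", "C"],
    "rating": [5, 4, 3, 4],
})
table, sparse = build_book_user_matrix(ratings)
print(table)
print(sparse.nnz)  # 4 stored (non-zero) ratings
```

This sketch builds the dense table first, which is fine for the filtered dataset; for the unfiltered data it would be more economical to build the sparse matrix directly from row and column indices.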

3. Train KNN Model

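Uses the cosine distance metric to measure similarity between books
Finds the 6 nearest neighbors of the query book; the first is the book itself, leaving 5 recommendations
Algorithm: Brute force (an exact search that compares the query against every book; scikit-learn's tree-based search structures do not support cosine distance)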
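A small sketch of the training step with scikit-learn's `NearestNeighbors`, using the three books from the illustrative table above (first four users only) plus a made-up Book D rated much like Book A. The ratings are invented for illustration. Note that `kneighbors` counts the query book itself among the neighbors.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy books x users matrix (rows are books); values are illustrative only.
ratings = np.array([
    [5, 0, 4, 5],   # Book A
    [0, 3, 0, 4],   # Book B
    [4, 5, 3, 0],   # Book C
    [5, 0, 4, 4],   # Book D (hypothetical, rated much like Book A)
])
matrix = csr_matrix(ratings)

# Exact brute-force search with cosine distance.
model = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=3)
model.fit(matrix)

# Neighbors of Book A: the query itself comes back first at distance ~0.
distances, indices = model.kneighbors(matrix[0])
print(indices[0])    # [0 3 2]: Book A itself, then Book D, then Book C
print(distances[0])
```

With the real data, the same call uses `n_neighbors=6` so that 5 neighbors remain after dropping the query book.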

4. Generate Recommendations

Each book is represented by its row of user ratings (its rating "fingerprint"); the system returns the books whose rows are closest to the query book's row by cosine distance.
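A self-contained sketch of a recommendation function in the shape used by this project, run on a toy matrix with hypothetical titles and users. It is an illustration under those assumptions, not the notebook's exact `get_recommends`; here `k` defaults to 2 because the toy data has only four books (the project returns 5).

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: titles as the row index, users as columns, 0 = unrated.
book_user = pd.DataFrame(
    [[5, 0, 4, 5], [0, 3, 0, 4], [4, 5, 3, 0], [5, 0, 4, 4]],
    index=["Book A", "Book B", "Book C", "Book D"],
    columns=["u1", "u2", "u3", "u4"],
)

model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(csr_matrix(book_user.values))


def get_recommends(book: str, k: int = 2):
    """Return [book, [[title, distance], ...]] for the k closest other books."""
    if book not in book_user.index:
        raise KeyError(f"Unknown title: {book!r}")
    row = book_user.index.get_loc(book)
    # Ask for k + 1 neighbors because the query book is returned as well.
    distances, indices = model.kneighbors(
        csr_matrix(book_user.iloc[[row]].values), n_neighbors=k + 1)
    pairs = [[book_user.index[i], float(d)]
             for d, i in zip(distances[0], indices[0]) if i != row][:k]
    # Furthest match first, closest last, as in the Expected Output below.
    return [book, pairs[::-1]]


print(get_recommends("Book A"))
```

Raising a `KeyError` for unknown titles is an added assumption; it gives a clearer message than the lookup failing further down.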

💻 Usage

Basic Function Call

get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")

Expected Output

[
  "Where the Heart Is (Oprah's Book Club (Paperback))",
  [
    ["I'll Be Seeing You", 0.8],
    ['The Weight of Water', 0.77],
    ['The Surgeon', 0.77],
    ['I Know This Much Is True', 0.77],
    ['The Lovely Bones: A Novel', 0.72]
  ]
]

Output Format:

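First element: Input book title
Second element: List of 5 recommended books with their cosine distances, ordered from the most distant to the closest match
- Lower distance = More similar books
- Because all ratings are non-negative, distance ranges from 0 (rating vectors point in the same direction) to 1 (no users in common)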

🧮 Algorithm Details

K-Nearest Neighbors (KNN)

Algorithm Type: Lazy learning (instance-based)
Distance Metric: Cosine distance
K Value: 6 (6 neighbors are returned; the first is skipped because it is the input book itself)
Search Method: Brute force

Why Cosine Distance?

Cosine distance depends only on the angle between two books' rating vectors, not on their lengths. Two books can therefore be close even if one has more or higher ratings overall, as long as the same users rated both in similar proportions.

Distance = 1 - (A · B) / (||A|| × ||B||)
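A worked check of the formula, using Book A and Book C from the matrix table above and treating only the four users shown. `cosine_distance` is a hypothetical helper written straight from the formula; `scipy.spatial.distance.cosine` is used only to confirm the result.

```python
import numpy as np
from scipy.spatial.distance import cosine


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Distance = 1 - (A . B) / (||A|| * ||B||)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


book_a = np.array([5, 0, 4, 5])   # ratings from User1..User4
book_c = np.array([4, 5, 3, 0])

# A . B = 32, ||A|| = sqrt(66), ||C|| = sqrt(50)
print(round(cosine_distance(book_a, book_c), 3))  # 0.443
print(round(cosine(book_a, book_c), 3))           # 0.443 (SciPy agrees)

# Scaling every rating of a book does not change its direction, so the
# distance to the scaled copy is (numerically) zero.
print(abs(cosine_distance(book_a, 2 * book_a)) < 1e-12)  # True

# Books with no raters in common: dot product 0, maximum distance 1.
print(cosine_distance(np.array([5, 0, 0]), np.array([0, 3, 4])))  # 1.0
```

Because ratings are never negative, the dot product is never negative either, which is why distances stay between 0 and 1 here rather than the general 0 to 2 range.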

📈 Key Features

✅ Collaborative filtering based on user ratings
✅ Memory-efficient storage of the rating matrix with SciPy sparse matrices
✅ Less noise by filtering out users and books with few ratings
✅ Exact nearest-neighbor search with scikit-learn's brute-force KNN
✅ Returns recommended books with their cosine distances

🔍 Understanding the Results

Distance Interpretation (rough guide):

0.0 - 0.3: Very similar books
0.3 - 0.6: Moderately similar books
0.6 - 0.8: Somewhat similar books
0.8 - 1.0: Different books

Lower distances indicate stronger recommendations!
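A small helper that applies these bands, as a sketch. `describe_distance` and its labels are hypothetical, and the cut-offs are this README's rule of thumb, not a standard scale. Since the listed bands share their endpoints, the sketch assigns a value that falls exactly on a boundary (such as 0.3) to the higher band.

```python
import bisect

# Upper edges of the bands listed above (rule-of-thumb cut-offs).
BAND_EDGES = [0.3, 0.6, 0.8]
BAND_LABELS = ["Very similar", "Moderately similar", "Somewhat similar", "Different"]


def describe_distance(distance: float) -> str:
    """Map a cosine distance in [0, 1] to one of the bands above."""
    if not 0.0 <= distance <= 1.0:
        raise ValueError(f"expected a distance in [0, 1], got {distance}")
    # bisect_right puts values equal to an edge into the next (higher) band.
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, distance)]


for title, dist in [["I'll Be Seeing You", 0.8], ["The Lovely Bones: A Novel", 0.72]]:
    print(f"{title}: {dist} -> {describe_distance(dist)}")
```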

📝 Code Structure

├── Data Loading
│   ├── Download dataset
│   └── Load CSV files into DataFrames
│
├── Data Cleaning
│   ├── Filter users (>= 200 ratings)
│   └── Filter books (>= 100 ratings)
│
├── Matrix Creation
│   ├── Pivot table (books × users)
│   └── Convert to sparse matrix
│
├── Model Training
│   └── Fit KNN model
│
└── Recommendation Function
    ├── Find book in matrix
    ├── Get k-nearest neighbors
    └── Return formatted results

🧪 Testing

The notebook includes a test function that validates:

Correct book title returned
5 recommendations provided
Recommended books match expected titles
Distance values within ±0.05 of the expected values

test_book_recommendation()
# Output: "You passed the challenge! 🎉🎉🎉🎉🎉"
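A sketch of the checks listed above, written as a standalone function. `check_recommendation` is hypothetical and is not the notebook's `test_book_recommendation()`; it compares titles position by position, which may be stricter than the notebook's test. The expected values are the rounded distances from the Expected Output section.

```python
def check_recommendation(result, expected_title, expected, tol=0.05):
    """Return True if `result` matches the title, count, titles and distances.

    `result` has the shape [title, [[book, distance], ...]] and `expected`
    is a list of [book, distance] pairs.
    """
    title, recs = result
    if title != expected_title or len(recs) != len(expected):
        return False
    return all(book == exp_book and abs(dist - exp_dist) < tol
               for (book, dist), (exp_book, exp_dist) in zip(recs, expected))


title = "Where the Heart Is (Oprah's Book Club (Paperback))"
expected = [["I'll Be Seeing You", 0.8], ["The Weight of Water", 0.77],
            ["The Surgeon", 0.77], ["I Know This Much Is True", 0.77],
            ["The Lovely Bones: A Novel", 0.72]]

result = [title, [pair[:] for pair in expected]]
print(check_recommendation(result, title, expected))   # True

# Deliberately altered copy: one distance is off by 0.1, beyond the tolerance.
altered = [title, [pair[:] for pair in expected]]
altered[1][-1][1] = 0.62
print(check_recommendation(altered, title, expected))  # False
```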

📚 Example Recommendations

Input: "The Queen of the Damned (Vampire Chronicles (Paperback))"

Output:

Catch 22 (0.79)
The Witching Hour (0.74)
Interview with the Vampire (0.73)
The Tale of the Body Thief (0.54)
The Vampire Lestat (0.52)

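Results are listed from most distant to closest. The three closest matches (The Vampire Lestat, The Tale of the Body Thief, Interview with the Vampire) are other novels in Anne Rice's Vampire Chronicles, and The Witching Hour is also by Anne Rice; Catch 22 is the weakest match of the five.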

🎓 Learning Outcomes

This project demonstrates:

Collaborative Filtering: Recommending items based on similar user preferences
Data Filtering: Removing sparsely rated users and books to shrink the matrix and reduce noise
Distance Metrics: Using cosine distance for recommendation systems
Data Preprocessing: Handling real-world messy data
Matrix Operations: Working with sparse matrices efficiently

👤 Author

Created as part of the FreeCodeCamp Machine Learning with Python certification.

📄 License

This project is open source and available for educational purposes.

Note: This is a learning project demonstrating collaborative filtering and KNN algorithms for recommendation systems. For production use, consider additional optimizations and error handling.
