Fraud-Detection

Description

This project aims to detect fraud in credit card transactions. The data and problem statement are taken from Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud

Algorithm Used

The dataset has 30 independent variables and 1 dependent variable. It is highly imbalanced: a model that labels every transaction as non-fraudulent would still achieve more than 99% accuracy.
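To make the imbalance concrete, here is a small worked example. The class counts are those listed in the Kaggle dataset description (284,807 transactions, 492 of them fraudulent); the variable names are only illustrative.

```python
# Class counts as listed in the Kaggle dataset description.
n_total = 284_807
n_fraud = 492

# A "model" that labels every transaction as non-fraudulent.
true_positives = 0
false_negatives = n_fraud
true_negatives = n_total - n_fraud

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9983
print(f"recall   = {recall:.4f}")    # recall   = 0.0000
```

The accuracy looks excellent, yet not a single fraud is caught, which is why accuracy alone is not a useful metric here.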

Hence, accuracy is a misleading metric here, and common classification algorithms such as logistic regression tend to perform poorly on the fraud class without special handling. We use a multivariate Gaussian model to detect anomalies in the data. The univariate Gaussian density is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The norm function in Anomaly Detection.py computes the univariate Gaussian density for any feature.
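A minimal sketch of what such a function might look like is shown below. The signature (x, mu, sigma2) and the sample values are assumptions made for illustration; the actual norm function in Anomaly Detection.py may differ.

```python
import numpy as np

def norm(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2.

    Hypothetical signature; not necessarily the one used in the repository.
    """
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Made-up values for a single feature; the last one is an obvious outlier.
feature = np.array([0.1, -0.4, 0.3, 0.0, 5.0])
mu = feature.mean()
sigma2 = feature.var()

densities = norm(feature, mu, sigma2)
print(densities)
```

The value farthest from the mean receives the lowest density, which is the property the anomaly detector relies on.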

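For the multivariate Gaussian, the univariate Gaussian densities of all the features are calculated and multiplied together. Assuming the features are independent, this product is a multivariate Gaussian density with a diagonal covariance matrix and can be expressed as:

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$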
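The product of per-feature densities can be sketched as follows. The function names fit_gaussian and joint_density and the synthetic data are hypothetical and only serve the illustration; this is not the repository's code.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the per-feature mean and variance from the rows of X."""
    return X.mean(axis=0), X.var(axis=0)

def joint_density(X, mu, sigma2):
    """Multiply the univariate Gaussian densities across features for each row."""
    per_feature = np.exp(-((X - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return per_feature.prod(axis=1)

# Synthetic data standing in for the real transactions (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
mu, sigma2 = fit_gaussian(X)

# One typical point and one point far away from the bulk of the data.
points = np.vstack([np.zeros(3), np.full(3, 6.0)])
p = joint_density(points, mu, sigma2)
print(p)
```

With many features the raw product can underflow to zero, so summing log-densities instead is usually the more numerically stable choice.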

We use a confusion matrix to measure the performance of the model. A fraud detection system should capture as many True Positive cases as possible and avoid False Negatives. We measure the performance of the model by calculating its Recall and Precision for various threshold values of p. Transactions whose p value is less than the threshold are considered anomalous, i.e. fraudulent.
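As an illustration of this evaluation, the sketch below computes Precision and Recall for a few thresholds. The p values, labels, and the helper name precision_recall are made up for the example and do not come from the dataset or the repository.

```python
import numpy as np

def precision_recall(p, y_true, threshold):
    """Flag rows with p < threshold as fraud and compare against the labels."""
    flagged = p < threshold
    tp = np.sum(flagged & (y_true == 1))   # frauds correctly flagged
    fp = np.sum(flagged & (y_true == 0))   # normal transactions flagged
    fn = np.sum(~flagged & (y_true == 1))  # frauds that were missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Toy densities and labels (1 = fraud), chosen only for illustration.
p = np.array([0.20, 0.15, 0.001, 0.30, 0.0001, 0.05])
y_true = np.array([0, 0, 1, 0, 1, 0])

for threshold in (0.0005, 0.01, 0.1):
    precision, recall = precision_recall(p, y_true, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising the threshold catches more frauds (higher Recall) but eventually starts flagging normal transactions as well (lower Precision).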

Usage

The code can be used for anomaly detection on any dataset. You need to adjust the dimensions to match your dataset and choose an appropriate threshold probability, below which a data point is treated as an anomaly.
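A rough sketch of how the approach might be applied to another dataset is shown below. The function flag_anomalies, the synthetic data, and the threshold of 1e-6 are all assumptions for illustration. The comparison is done in log space, which is a design choice of this sketch rather than something taken from the repository.

```python
import numpy as np

def flag_anomalies(X_train, X_new, threshold):
    """Return a boolean array marking rows of X_new whose density is below threshold.

    Works for any number of columns, as long as X_train and X_new agree.
    """
    mu = X_train.mean(axis=0)
    sigma2 = X_train.var(axis=0)
    # Sum of per-feature log-densities, i.e. the log of the product of densities.
    log_p = -0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + (X_new - mu) ** 2 / sigma2, axis=1)
    return log_p < np.log(threshold)

# Synthetic 4-feature data standing in for your own dataset (assumption).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 4))
X_new = np.array([[0.0, 0.0, 0.0, 0.0],
                  [8.0, 8.0, 8.0, 8.0]])

flags = flag_anomalies(X_train, X_new, threshold=1e-6)
print(flags)
```

A suitable threshold depends on the data; in practice it would be tuned on labelled examples using Precision and Recall.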

References

https://www.coursera.org/lecture/machine-learning/multivariate-gaussian-distribution-Cf8DF

https://www.kaggle.com/mlg-ulb/creditcardfraud

http://cs229.stanford.edu/section/gaussians.pdf