Machine Learning: Credit Card Fraud Detection Project

- July 08, 2019

Before we start: this walkthrough assumes you are reasonably comfortable with Python. That is recommended but not strictly necessary. We are going to use the Kaggle credit card fraud dataset for this project.
Go to https://www.kaggle.com/mlg-ulb/creditcardfraud and download it.
Before fitting a model to the dataset we do some data processing. Exploring the data first is a must, because it tells us what we are actually working with.

1. Importing Necessary Libraries

To start, let's print out the version numbers of all the libraries we will be using in this project. This serves two purposes - it ensures we have installed the libraries correctly and ensures that this tutorial will be reproducible.

In [1]:

import sys
import numpy
import pandas
import matplotlib
import seaborn
import scipy

print('Python: {}'.format(sys.version))
print('Numpy: {}'.format(numpy.__version__))
print('Pandas: {}'.format(pandas.__version__))
print('Matplotlib: {}'.format(matplotlib.__version__))
print('Seaborn: {}'.format(seaborn.__version__))
print('Scipy: {}'.format(scipy.__version__))

Python: 2.7.13 |Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)]
Numpy: 1.14.0
Pandas: 0.21.0
Matplotlib: 2.1.0
Seaborn: 0.8.1

# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Now we import the dataset from its location on disk.

# Load the dataset from the csv file using pandas
data = pd.read_csv('creditcard.csv')
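Here pandas reads the CSV file into a DataFrame.
Next, explore the data to learn more about it: print the column names, take a 10% random sample (frac=0.1), and look at the shape and summary statistics of that sample.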


print(data.columns)
data = data.sample(frac=0.1, random_state = 1)
print(data.shape)
print(data.describe())

# Plot histograms of each parameter 
data.hist(figsize = (20, 20))
plt.show()

Now set the target. The Class column tells us whether a transaction is fraudulent (1) or valid (0), so we separate it from the feature columns:
# Get all the columns from the dataFrame
columns = data.columns.tolist()

# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]

# Store the variable we'll be predicting on
target = "Class"

X = data[columns]
Y = data[target]

# Print shapes
print(X.shape)

print(Y.shape)
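Before fitting anything, it helps to check how rare the fraud class is, because the outlier detectors used next take the expected share of outliers as their contamination parameter. A minimal sketch, assuming a toy DataFrame in place of creditcard.csv (the values and the helper name fraud_fraction are made up for illustration):

```python
import pandas as pd


def fraud_fraction(frame, target="Class"):
    """Return the share of rows whose target column equals 1."""
    n_fraud = int((frame[target] == 1).sum())
    return n_fraud / float(len(frame))


# Toy stand-in for the real dataset: 8 transactions, 2 marked as fraud.
toy = pd.DataFrame({
    "Amount": [12.5, 3.0, 250.0, 8.75, 19.99, 1200.0, 4.2, 60.0],
    "Class": [0, 0, 1, 0, 0, 1, 0, 0],
})

# Number of rows per class, then the fraud share of the whole frame.
counts = toy["Class"].value_counts()
fraction = fraud_fraction(toy)

print(counts)
print("Fraud fraction: {}".format(fraction))
```

On the real sample the same call would be fraud_fraction(data), and the result is expected to be far smaller than in this toy frame.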
Now fit the models. We compare two unsupervised outlier detection algorithms from scikit-learn, Isolation Forest and Local Outlier Factor:
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

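# define random states
state = 1

# Determine the share of fraud cases in the sample.
# Assumption: the post never defines Fraud or outlier_fraction, so they
# are reconstructed here from the Class column.
Fraud = data[data['Class'] == 1]
outlier_fraction = len(Fraud) / float(len(data))

# define outlier detection tools to be compared
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20,
        contamination=outlier_fraction)}

# Fit the model
plt.figure(figsize=(9, 7))
n_outliers = len(Fraud)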
for i, (clf_name, clf) in enumerate(classifiers.items()):
    
    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)
    
    # Reshape the prediction values to 0 for valid, 1 for fraud. 
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    
    n_errors = (y_pred != Y).sum()
    
    # Run classification metrics
    print('{}: {}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
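Both scikit-learn detectors label inliers as 1 and outliers as -1, which is why the loop remaps the predictions before comparing them with Class. Here is a small sketch of that remapping on synthetic data (the data, the 5-outlier setup and the variable names are assumptions, not taken from the dataset). It uses np.where to build a new 0/1 array instead of overwriting the predictions in place:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)

# 200 "valid" points around the origin plus 5 far-away points.
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.uniform(6.0, 8.0, size=(5, 2))])
toy_fraction = 5 / float(len(X_toy))

detectors = {
    "Isolation Forest": IsolationForest(contamination=toy_fraction,
                                        random_state=1),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20,
                                               contamination=toy_fraction),
}

labels = {}
for name, detector in detectors.items():
    raw = detector.fit_predict(X_toy)          # 1 = inlier, -1 = outlier
    labels[name] = np.where(raw == -1, 1, 0)   # 1 = flagged, 0 = valid
    print('{}: {} points flagged'.format(name, labels[name].sum()))
```

Writing the mapping as a single np.where call avoids depending on the order of two in-place assignments.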
Results:
Local Outlier Factor: 97
0.9965942207085425
             precision    recall  f1-score   support

          0       1.00      1.00      1.00     28432
          1       0.02      0.02      0.02        49

avg / total       1.00      1.00      1.00     28481

Isolation Forest: 71
0.99750711000316
             precision    recall  f1-score   support

          0       1.00      1.00      1.00     28432
          1       0.28      0.29      0.28        49

avg / total       1.00      1.00      1.00     28481




If we analyze the results, we see that Isolation Forest has a slightly higher accuracy score (0.9975) than Local Outlier Factor (0.9966). With only 49 fraud cases among 28481 transactions, though, accuracy says little on its own. Looking at the precision of both algorithms on fraudulent transactions, Isolation Forest reaches 0.28, which is not good but is much better than Local Outlier Factor at 0.02. So the next step is to train other models and try to increase the precision of detecting a fraudulent transaction.
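The reported numbers can be cross-checked with a little arithmetic. The sketch below assumes confusion-matrix counts reconstructed from the printed output (49 fraud cases, 28481 rows, 71 and 97 errors, and the rounded precision and recall). The post does not print these counts itself, so treat them as an inference. For comparison it also computes the accuracy of a hypothetical baseline that labels every transaction as valid:

```python
def summarize(tp, fp, fn, total):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    accuracy = 1.0 - (fp + fn) / float(total)
    return precision, recall, accuracy


TOTAL = 28481   # rows in the 10% sample
FRAUD = 49      # support of class 1

# Counts inferred from the printed reports (an assumption, see above).
iso = summarize(tp=14, fp=36, fn=35, total=TOTAL)   # 36 + 35 = 71 errors
lof = summarize(tp=1, fp=49, fn=48, total=TOTAL)    # 49 + 48 = 97 errors

# Hypothetical baseline: predict "valid" for everything, missing all fraud.
baseline_accuracy = 1.0 - FRAUD / float(TOTAL)

for name, (p, r, a) in [("Isolation Forest", iso),
                        ("Local Outlier Factor", lof)]:
    print('{}: precision={:.2f} recall={:.2f} accuracy={:.4f}'.format(
        name, p, r, a))
print('All-valid baseline accuracy: {:.4f}'.format(baseline_accuracy))
```

Under these counts the all-valid baseline scores a higher accuracy than either detector while catching no fraud at all, which is why precision and recall on class 1 are the numbers worth watching.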



😊thank you

@ mazerunner
