Machine Learning: Credit Card Fraud Detection Project

Before we start, I assume that you know Python reasonably well; it is recommended but not strictly necessary. We are going to use the following dataset for credit card fraud detection:
https://www.kaggle.com/mlg-ulb/creditcardfraud (go to the link and download it).
Before we start fitting a model to the dataset, we will do some data processing. Exploring the data first is a must: it tells us much more about the data and gives us ideas for modelling.

1. Importing Necessary Libraries

To start, let's print out the version numbers of all the libraries we will be using in this project. This serves two purposes: it confirms that we have installed the libraries correctly, and it helps make this tutorial reproducible.
import sys
import numpy
import pandas
import matplotlib
import seaborn
import scipy

print('Python: {}'.format(sys.version))
print('Numpy: {}'.format(numpy.__version__))
print('Pandas: {}'.format(pandas.__version__))
print('Matplotlib: {}'.format(matplotlib.__version__))
print('Seaborn: {}'.format(seaborn.__version__))
print('Scipy: {}'.format(scipy.__version__))
Python: 2.7.13 |Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)]
Numpy: 1.14.0
Pandas: 0.21.0
Matplotlib: 2.1.0
Seaborn: 0.8.1

# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



Now we are going to load the dataset from the CSV file.

# Load the dataset from the csv file using pandas
data = pd.read_csv('creditcard.csv')
We use pandas to read the CSV file into a DataFrame. Now let's process the data to learn more about it.

# Explore the columns in the dataset
print(data.columns)

# Take a 10% sample of the data to speed up computation
data = data.sample(frac=0.1, random_state=1)
print(data.shape)
print(data.describe())

# Plot histograms of each parameter 
data.hist(figsize = (20, 20))
plt.show()
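Seaborn is imported above but never used in any of the snippets. One common exploratory step at this point is a correlation heatmap of the features; here is a minimal sketch (not part of the original code):

# Plot a correlation heatmap of the features
corrmat = data.corr()
plt.figure(figsize=(12, 9))
sns.heatmap(corrmat, vmax=0.8, square=True)
plt.show()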


Now let's set up our target, the Class column, which tells us whether a transaction is fraudulent or not:
# Get all the columns from the dataFrame
columns = data.columns.tolist()

# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]

# Store the variable we'll be predicting on
target = "Class"

X = data[columns]
Y = data[target]

# Print shapes
print(X.shape)
print(Y.shape)
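The modelling code below references two names that are never defined in the snippets above: Fraud and outlier_fraction. Without them the code raises a NameError. Here is a minimal sketch that derives both from the Class column (1 = fraud, 0 = valid):

# Determine the number of fraud cases in the sample
Fraud = data[data['Class'] == 1]
Valid = data[data['Class'] == 0]

# Fraction of outliers, used below as the contamination estimate
outlier_fraction = len(Fraud) / float(len(Valid))
print('Fraud cases: {}'.format(len(Fraud)))
print('Valid transactions: {}'.format(len(Valid)))
print('Outlier fraction: {}'.format(outlier_fraction))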
Now let's fit the models. We will compare two outlier detection algorithms, Isolation Forest and Local Outlier Factor:
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# define a random state
state = 1

# define outlier detection tools to be compared
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20,
        contamination=outlier_fraction)}
# Fit each model and evaluate its predictions
for i, (clf_name, clf) in enumerate(classifiers.items()):
    
    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)
    
    # Map the prediction values to 0 for valid, 1 for fraud.
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    
    n_errors = (y_pred != Y).sum()
    
    # Run classification metrics
    print('{}: {}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
Results:
Local Outlier Factor: 97
0.9965942207085425
             precision    recall  f1-score   support

          0       1.00      1.00      1.00     28432
          1       0.02      0.02      0.02        49

avg / total       1.00      1.00      1.00     28481

Isolation Forest: 71
0.99750711000316
             precision    recall  f1-score   support

          0       1.00      1.00      1.00     28432
          1       0.28      0.29      0.28        49

avg / total       1.00      1.00      1.00     28481
If we analyze the results, we see that Isolation Forest has a higher accuracy score than Local Outlier Factor. But note that accuracy is misleading here: the classes are heavily imbalanced (only 49 of the 28,481 sampled transactions are fraud), so a model that labels everything as valid would already score about 99.8% accuracy. If we look instead at the precision of both algorithms in detecting fraudulent transactions, we find that Isolation Forest reaches 0.28. That is not a good result, but it is better than Local Outlier Factor's 0.02, so try other models, train them, and increase the precision of detecting fraudulent transactions; one possible next step is sketched below.
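As a concrete starting point for that suggestion, here is a minimal sketch of a supervised baseline using scikit-learn's RandomForestClassifier with a stratified train/test split. The model choice and parameters are illustrative assumptions, not part of the original post:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hold out 30% of the sample for evaluation, preserving the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=1)

# Fit a supervised classifier on the labelled transactions
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

# Report precision and recall on the held-out transactions
print(classification_report(y_test, rf.predict(X_test)))

Because the class labels are available in this dataset, a supervised model like this may well improve on the fraud-class precision that the unsupervised outlier detectors achieved above.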
😊 Thank you!
@mazerunner
