Python: 2.7.13 |Continuum Analytics, Inc.| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)]
Numpy: 1.14.0
Pandas: 0.21.0
Matplotlib: 2.1.0
Seaborn: 0.8.1
# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Now we import our data set from its location on disk.
# Load the dataset from the csv file using pandas
data = pd.read_csv('creditcard.csv')
Here pandas reads the CSV file into a DataFrame.
Now explore the data to learn more about it. We also keep only a random 10% sample of the rows (frac=0.1).
print(data.columns)
data = data.sample(frac=0.1, random_state = 1)
print(data.shape)
print(data.describe())
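The sample(frac=0.1, random_state=1) call keeps a random 10% of the rows, and fixing random_state makes that subset repeatable. Here is a minimal sketch on a made-up frame (the `toy` DataFrame and its values are illustrative, not the real data set):

```python
import pandas as pd

# Hypothetical stand-in for the credit card data: 100 rows, 2 columns.
toy = pd.DataFrame({"Amount": list(range(100)), "Class": [0] * 98 + [1] * 2})

# Same frac and random_state -> the same 10 rows on every run.
first = toy.sample(frac=0.1, random_state=1)
second = toy.sample(frac=0.1, random_state=1)

print(first.shape)           # (10, 2)
print(first.equals(second))  # True
```

Because the fraud class is rare, a 10% sample can end up with very few positive rows, so it is worth checking the class counts after sampling.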
Now set our target, which is the Class column: we have to find out whether a
transaction is fraudulent or not, so
# Get all the columns from the dataFrame
columns = data.columns.tolist()
# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]
# Store the variable we'll be predicting on
target = "Class"
X = data[columns]
Y = data[target]
# Print shapes
print(X.shape)
print(Y.shape)
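The outlier detectors used in the next step take a contamination argument, which scikit-learn documents as the proportion of outliers in the data set. Here is a small sketch of how that proportion could be computed from the label column. The helper name `fraud_share` is hypothetical, and the labels are synthetic, sized like the sample reported in the results (28432 valid rows, 49 fraud rows):

```python
import numpy as np

def fraud_share(labels):
    """Return the fraction of rows labelled 1 (fraud) in a 0/1 label array."""
    labels = np.asarray(labels)
    return float((labels == 1).sum()) / len(labels)

# Synthetic labels: 28432 valid rows (0) and 49 fraud rows (1).
y = np.concatenate([np.zeros(28432, dtype=int), np.ones(49, dtype=int)])
share = fraud_share(y)
print(round(share, 5))  # 0.00172
```

With the real data, the same helper could be called on data["Class"].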
Now fit the models using two outlier-detection algorithms, Isolation Forest and Local Outlier Factor:
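from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
# Assumption: Fraud and outlier_fraction are not defined elsewhere in this post.
# contamination expects the proportion of outliers, so we use the share of
# fraud rows in the sample (float() avoids integer division on Python 2).
Fraud = data[data['Class'] == 1]
outlier_fraction = len(Fraud) / float(len(data))
# define random states
state = 1
# define outlier detection tools to be compared
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20,
        contamination=outlier_fraction)}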
# Fit the model
plt.figure(figsize=(9, 7))
n_outliers = len(Fraud)
for i, (clf_name, clf) in enumerate(classifiers.items()):
    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)
    # Reshape the prediction values to 0 for valid, 1 for fraud.
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()
    # Run classification metrics
    print('{}: {}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
Results:
Local Outlier Factor: 97
0.9965942207085425
             precision    recall  f1-score   support
          0       1.00      1.00      1.00     28432
          1       0.02      0.02      0.02        49
avg / total       1.00      1.00      1.00     28481
Isolation Forest: 71
0.99750711000316
             precision    recall  f1-score   support
          0       1.00      1.00      1.00     28432
          1       0.28      0.29      0.28        49
avg / total       1.00      1.00      1.00     28481
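Both detectors return 1 for inliers and -1 for outliers, while the Class column uses 0 for valid and 1 for fraud. The two in-place assignments in the loop handle this, but only in that order: swapping them would first turn every -1 into 1 and then every 1 into 0. Here is a sketch of an order-independent alternative using np.where (the helper name `to_class_labels` is made up for illustration):

```python
import numpy as np

def to_class_labels(raw_pred):
    """Map detector output (1 = inlier, -1 = outlier) to 0 = valid, 1 = fraud."""
    raw_pred = np.asarray(raw_pred)
    # np.where builds a new array, so the raw predictions stay untouched.
    return np.where(raw_pred == -1, 1, 0)

raw = np.array([1, -1, 1, 1, -1])
print(to_class_labels(raw))  # [0 1 0 0 1]
```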
If we analyze the results, we see that Isolation Forest has a higher accuracy
score (about 0.9975) than Local Outlier Factor (about 0.9966). If we look at the
precision of both algorithms in detecting fraudulent transactions, Isolation
Forest has a precision of 0.28, which is not good but is much better than Local
Outlier Factor's 0.02. So try other models, train them, and see whether they
increase the precision of detecting a fraudulent transaction.
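Accuracy is a weak yardstick here because only 49 of the 28481 sampled transactions are fraud. As a rough check, the sketch below computes the accuracy of a trivial rule that labels every transaction valid, plus precision and recall from confusion counts. The Isolation Forest counts used (14 true positives, 36 false positives, 35 false negatives) are not printed in the report. They are inferred from its 71 errors, 0.28 precision and 0.29 recall, so treat them as an assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the positive (fraud) class."""
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return precision, recall

n_valid, n_fraud = 28432, 49  # support values from the report

# Accuracy of predicting "valid" for every transaction.
baseline_accuracy = n_valid / float(n_valid + n_fraud)
print(round(baseline_accuracy, 4))  # 0.9983

# Inferred Isolation Forest counts (assumption, see the text above).
p, r = precision_recall(tp=14, fp=36, fn=35)
print(round(p, 2), round(r, 2))
```

The all-valid rule scores about 0.9983, higher than either detector, while catching no fraud at all. That is why precision and recall on class 1 are the numbers to watch.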
😊 Thank you
@ mazerunner