Introduction

This report describes the implementation of credit card fraud detection. It outlines a multi-step checking process for identifying the factors associated with credit card fraud and explains how each stage of the detection pipeline is executed. The work is carried out in Python, which is used to build and apply predictive models. These models form the core of the fraud evaluation and are trained on the collected dataset; the dataset's attributes are the parameters against which model performance is assessed, including accuracy and other relevant metrics. Model loss is measured with MSE (Mean Squared Error).

Phases of the project

Several Python libraries support the overall analysis. The 'warnings' library is used to suppress system-generated warning messages. 'numpy' provides the numerical calculations used in the investigation (Alarfaj et al., 2022), while 'pandas' is the primary library for reading the data and making it available to the main analysis. Graphs are produced with 'plotly', 'matplotlib', and 'seaborn', each of which offers its own plotting functionality.
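As a minimal sketch, the library setup described above might look like the following (the exact imports and aliases are assumed, since the report only names the libraries):

```python
# Suppress system-generated warning messages, as the report describes.
import warnings
warnings.filterwarnings("ignore")

import numpy as np               # numerical calculations
import pandas as pd              # reading the data and building DataFrames
import matplotlib.pyplot as plt  # static plots
import seaborn as sns            # statistical visualisations
# import plotly.express as px    # interactive charts (also used in the report)
```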

The construction of a "data frame" (DF) is demonstrated in the above section, using the key pandas method "pd.read_csv". This method loads the collected data so that its attributes can be inspected and the relevant fields identified. The data describes credit card fraud (Alghofaili et al., 2020), and its columns include 'time of transaction', 'type of card', 'type of transaction', 'shipping address', 'age', 'bank', 'gender', and 'Fraud'. Each attribute plays a different role in the analysis.
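A sketch of the loading step might look like this; the real CSV file name is not given in the report, so a small in-memory stand-in with the column names listed above is used instead:

```python
import io
import pandas as pd

# Hypothetical stand-in for the report's CSV; the real file name is unknown.
csv_text = """Time,Type of Card,Type of Transaction,Amount,Gender,Age,Bank,Fraud
10,Visa,Online,120,M,34,Barclays,1
14,MasterCard,POS,55,F,41,HSBC,0
"""
# In the report this would be: df = pd.read_csv("<file>.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())
```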


The DF summary lists the name of each column in the dataset, the number of non-null values in each, and the column data types, of which there are three: "int64", "float64", and "object". It also reports memory usage and the count of each type. According to this summary, the dataset contains 2 integer columns, 1 float column, and 13 object columns.

The cleaning process begins by checking the dataset for "NULL" (blank) values, which identifies the affected columns so the blanks can be handled. The check finds 6 null values in the 'Amount' column, 10 in 'Merchant Group', 5 in 'Shipping Address', and 4 in 'Gender'.

The 'fillna' method in pandas is then used to replace each null entry with 0. This step illustrates the practical importance of cleaning in both the code and the data analysis (Dzakiyullah et al., 2021): missing values reduce the usefulness of the data, and filling them keeps subsequent calculations consistent.
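The null check and fill steps described above can be sketched as follows, using a toy frame with a few of the columns the report names (the actual null counts come from the real dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with missing values, mirroring columns from the report.
df = pd.DataFrame({
    "Amount": [100.0, np.nan, 250.0],
    "Merchant Group": ["Retail", None, "Travel"],
    "Gender": ["M", "F", None],
})
print(df.isnull().sum())  # per-column null counts, as in the cleaning check
df = df.fillna(0)         # the report's fill step: replace nulls with 0
```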

The "drop" method removes unnecessary parts of the DF. In this section of the code, the '£' symbol in the 'Amount' column is eliminated and the column is converted to integer type (Faraji, 2022). More generally, this step converts string-valued columns into their integer equivalents so they can be used in later stages of the analysis.
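A sketch of the '£'-stripping and integer conversion, assuming the 'Amount' values are strings such as "£120" (the exact formatting in the real data is not shown):

```python
import pandas as pd

df = pd.DataFrame({"Amount": ["£120", "£55", "£3,200"]})  # hypothetical values
# Remove the '£' symbol (and any thousands separators), then cast to integer,
# as the report describes for the 'Amount' column.
df["Amount"] = (df["Amount"]
                .str.replace("£", "", regex=False)
                .str.replace(",", "", regex=False)
                .astype(int))
print(df["Amount"].tolist())
```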

Re-inspecting the DF shows its new structure after cleaning: the null-value counts have changed, and the multi-step cleaning procedure has filtered the data down to the required values. The resulting DF contains no null values.

The new data structure reflects both the cleaning and the conversion of string (object) columns into equivalent integer columns; this part of the work covers the loading and transformation of the collected data. It also surfaces ambiguities in fields such as 'Amount' and 'Merchant Group' (Chang et al., 2022). No assumptions are required for this investigation. The key anomaly field in the dataset is 'Fraud', a column containing 0 for genuine transactions and 1 for fraudulent ones.

Exploratory Data Analysis (EDA)


Statistical investigation of the dataset computes parameters such as count, mean, std, min, percentiles, and max; it applies only to the numeric columns. These values feed into the later stages of the analysis (Gupta et al., 2023). The computation uses "describe()", a built-in pandas method that returns the dataset's summary statistics.
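The summary-statistics step can be sketched with a toy frame (the real numbers come from the full dataset):

```python
import pandas as pd

# Toy numeric columns standing in for the dataset's 'Amount' and 'Age'.
df = pd.DataFrame({"Amount": [100, 250, 75, 3200], "Age": [34, 41, 29, 55]})
stats = df.describe()  # count, mean, std, min, 25%/50%/75%, max per column
print(stats)
```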

Histograms of the integer columns "Time", "Amount", "Age", and "Fraud" are shown in the above section. Each histogram counts the occurrences of values for one numeric attribute, which makes a useful contribution to the EDA.
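A per-column histogram grid of this kind can be produced with pandas' built-in plotting; this is a sketch on toy values, not the report's actual plot:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Time": [10, 14, 9, 22], "Amount": [100, 250, 75, 3200],
                   "Age": [34, 41, 29, 55], "Fraud": [1, 0, 0, 1]})
axes = df.hist(figsize=(8, 6))  # one histogram per numeric column
plt.tight_layout()
```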

The histogram of amount by merchant group shows how much fraud each merchant group accounts for; the groups are listed in the graph's legend. It is a useful visualization of the relationship between these two fields.

The above donut chart shows the total amount of fraud by gender, using percentages to identify which gender experiences the most fraud. According to the investigation, male customers face the larger share, 63.3% of all fraud cases (Wu and Wang, 2021), while female customers face 36.4%. The chart thus visualizes the relationship between fraud cases and gender.
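A donut chart of this kind is a pie chart with a hollow centre; the sketch below uses illustrative counts chosen so the male share is roughly the 63.3% the report states:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-gender fraud counts (not the report's real data).
fraud_by_gender = pd.Series({"Male": 633, "Female": 367})
fig, ax = plt.subplots()
ax.pie(fraud_by_gender, labels=list(fraud_by_gender.index), autopct="%1.1f%%",
       wedgeprops={"width": 0.4})  # width < 1 hollows the pie into a donut
ax.set_title("Share of fraud cases by gender")
```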

The above bar plot shows the amount of fraud per credit card type, with 'Type of Card' on the x-axis and fraud 'Amount' on the y-axis (Sadgali et al., 2019). According to the investigation, 'Visa' cards are associated with the largest amount of fraud, which suggests that users should avoid the 'Visa' card type.

The above pie chart shows the total amount of fraud by entry mode. There are three entry modes: "CVC", "PIN", and "Tap", each with a different fraud share. 'CVC' accounts for the largest share at 47.9%, 'PIN' is second at 45.2%, and 'Tap' has the smallest share at 6.92% of all fraud cases.

The above donut chart shows the total amount of fraud by transaction type, again using percentages. There are three transaction types in the system: "Online", "POS", and "ATM" (Singh and Jain, 2019). 'Online' transactions account for the largest share of fraud at 47.9%, 'POS' is second at 30.5%, and 'ATM' has the smallest share at 21.6%.

The above line chart shows the amount of fraud per bank. According to the investigation, 'Barclays' faces the most fraud cases, while 'HSBC', 'RBS', and 'Lloyds' face the fewest. This helps identify the banks where users are at greater risk of monetary fraud.

The above code computes fraudulent transactions by country, totalling the amount of fraud for each country of transaction. Of the five countries in the data, the USA has the most fraudulent transactions (Plakandaras et al., 2022). This visualization shows at a glance where fraud is concentrated, which supports targeted measures to reduce it.

The above donut chart shows the total amount of fraud by country. Five countries appear in the data: the UK, India, Russia, the USA, and China (Al Rubaie, 2021). The USA has the highest total fraud amount, at 113,710.

 

The transaction count step tallies the fraud and genuine transactions in the system, using the 'value_counts' method to count the occurrences of each 'Fraud' label.
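The counting step can be sketched as follows, on toy labels (0 genuine, 1 fraud, matching the encoding described earlier):

```python
import pandas as pd

# Toy 'Fraud' labels: 0 = genuine transaction, 1 = fraud.
df = pd.DataFrame({"Fraud": [0, 0, 0, 1, 0, 1, 0]})
counts = df["Fraud"].value_counts()  # occurrences of each label
print(counts)
```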

The above graph shows the split between genuine and fraud cases, where '0' denotes genuine transactions and '1' denotes fraud (Ji, 2021). Fraud cases make up only a small share of the system, 7.2%, while genuine cases account for the remaining 92.8%.

Analytical models

 

The model analysis begins with pre-processing, in which the remaining string data is converted to integers. The gender column holds two string values, 'M' and 'F', which are mapped to their integer equivalents: 'M' -> 1 and 'F' -> 2. The DF after this conversion is shown below the code.
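The gender encoding described above can be sketched with a dictionary map (the mapping values are exactly those the report gives):

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["M", "F", "M"]})
# Encode the string labels as integers, per the report: 'M' -> 1, 'F' -> 2.
df["Gender"] = df["Gender"].map({"M": 1, "F": 2})
print(df["Gender"].tolist())
```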

A further cleaning step in the model-creation process drops the unnecessary columns and their data, producing a new DF with five columns: "Time", "Amount", "Gender", "Age", and "Fraud". This DF, which contains only numeric data, is used in the next stage to build the models.

 

Two key variables are then set: 'x' and 'y'. "x" holds the feature columns of the data, "Time", "Amount", "Gender", and "Age", while "y" holds the target column, "Fraud". Both are taken from the new DF and are used in the next stage of the investigation.

 

The splitting procedure breaks the new DF into four pieces: "x_train, x_test, y_train, and y_test", using the 'train_test_split' module (Vivek et al., 2023). All four pieces feed into the next stage of the investigation; the "train_test_split()" call is a basic building block of model creation.
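The feature/target selection and the split can be sketched together; the test_size of 0.2 is an assumption, since the report does not state the split ratio:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with the five numeric columns the report keeps.
df = pd.DataFrame({
    "Time":   range(20),
    "Amount": [100 + 10 * i for i in range(20)],
    "Gender": [1, 2] * 10,
    "Age":    [30 + i for i in range(20)],
    "Fraud":  [0, 0, 0, 1] * 5,
})
x = df[["Time", "Amount", "Gender", "Age"]]  # feature columns
y = df["Fraud"]                              # target column
# Break into the four pieces the report names; test_size=0.2 is assumed.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=40)
print(len(x_train), len(x_test))
```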

 

The modeling of the "Decision Tree Classifier" (DT) is defined in the above section, using the "DecisionTreeClassifier" class; the other imported modules supply the evaluation parameters of the investigation. The DT model for this examination is named "decision_model", and the random state used is 40 (Ojugo and Nwankwo, 2021). The "fit()" method fits the DT model on "x_train" and "y_train", and the "predict()" method then produces the model's fraud predictions from the 'x_test' data.
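A sketch of the DT setup, using synthetic stand-in data (the real features are Time, Amount, Gender, and Age); the random_state of 40 follows the report:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-feature, binary-label data standing in for the real dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=40)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=40)

decision_model = DecisionTreeClassifier(random_state=40)  # 40 per the report
decision_model.fit(x_train, y_train)      # fit on the training split
dt_pred = decision_model.predict(x_test)  # predicted fraud labels for x_test
```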

 

The configuration of the "AdaBoost Classifier" is shown in the above section, using the "AdaBoostClassifier" class; the other libraries supply the evaluation parameters of the examination. The AdaBoost model for this investigation is named "ada_boost_model". Its construction uses a base model, in this case a Decision Tree, with a random state of 40 and an estimator value of 45 (Mijwil and Salem, 2020). The "fit()" method fits the AdaBoost model on the "x_train" and "y_train" data, and the "predict()" method then predicts the fraud values from the 'x_test' data.
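A sketch of the AdaBoost setup on the same synthetic stand-in data; by default scikit-learn's AdaBoost boosts shallow decision trees, which matches the report's choice of a Decision Tree base model, and n_estimators=45 and random_state=40 follow the report:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=40)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=40)

# Default base estimator is a depth-1 decision tree (a Decision Tree base
# model, as in the report); 45 estimators and random_state=40 per the report.
ada_boost_model = AdaBoostClassifier(n_estimators=45, random_state=40)
ada_boost_model.fit(x_train, y_train)
ada_pred = ada_boost_model.predict(x_test)
```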

Critical evaluation of each model

 

Model evaluation begins with the accuracy of the DT model, computed with the "accuracy_score()" method from the metrics library and applied to 'y_test' and the DT predictions. The resulting accuracy of the DT model is 0.93.

The DT classification report gives the accuracy along with the per-class "precision", "recall", and "f1-score". The values for class '0' (genuine) are higher than those for class '1' (fraud), meaning fraud cases are predicted less reliably than genuine ones. For class '0', precision, recall, and f1-score are 0.96, 0.97, and 0.96 respectively.

 

The loss of the DT model is calculated as "MSE (Mean Squared Error)"; for this investigation, the MSE of DT is 0.065.
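The three evaluation steps above (accuracy, classification report, MSE) can be sketched together; the labels and predictions below are hypothetical stand-ins for 'y_test' and the DT predictions, so the printed numbers differ from the report's:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             mean_squared_error)

# Hypothetical test labels and predictions (0 = genuine, 1 = fraud).
y_test  = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
dt_pred = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

acc = accuracy_score(y_test, dt_pred)      # fraction of correct predictions
mse = mean_squared_error(y_test, dt_pred)  # for 0/1 labels, equals 1 - accuracy
print(classification_report(y_test, dt_pred))  # precision/recall/f1 per class
```

Note that for binary 0/1 labels the MSE is simply the misclassification rate, which is why the report's MSE values (0.065 and 0.038) sit close to one minus the corresponding accuracies.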

The same evaluation is applied to the AdaBoost model: its accuracy is computed with the "accuracy_score()" method from the metrics library (Karthik et al., 2022), applied to 'y_test' and the AdaBoost predictions. The resulting accuracy of AdaBoost is 0.96.

The AdaBoost classification report likewise gives the accuracy along with the per-class "precision", "recall", and "f1-score". Again, the values for class '0' (genuine) are higher than those for class '1' (fraud), so fraud cases are predicted less reliably than genuine ones. For class '0', precision, recall, and f1-score are 0.96, 1.00, and 0.98.

The loss of the AdaBoost model is also calculated as "MSE (Mean Squared Error)"; for this investigation, the MSE of AdaBoost is 0.038.

Results and findings

 

This portion of the examination presents the output of the overall analysis. It shows the "correlation matrix" between three components of the investigated data, 'Fraud', 'Amount', and 'Time', rendered as a "heatmap".
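A sketch of the correlation heatmap on toy values (the real matrix comes from the full dataset):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy values for the three columns the report correlates.
df = pd.DataFrame({"Fraud": [0, 1, 0, 1, 0],
                   "Amount": [50, 900, 75, 1200, 60],
                   "Time": [9, 23, 11, 2, 14]})
corr = df[["Fraud", "Amount", "Time"]].corr()   # pairwise correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm")  # heatmap of the matrix
plt.title("Correlation between Fraud, Amount and Time")
```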

The above figure shows the "Confusion Matrix" of the "Decision Tree Classifier", computed from 'y_test' and the DT model's predictions; it summarizes how the predicted labels compare with the true labels.
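The matrix itself can be sketched as follows; the labels and predictions are hypothetical stand-ins for 'y_test' and the DT predictions:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels standing in for y_test and the DT predictions.
y_test  = [0, 0, 1, 1, 0, 1, 0, 0]
dt_pred = [0, 0, 1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_test, dt_pred)
print(cm)  # rows: true class 0/1; columns: predicted class 0/1
```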

The above figure shows the "Confusion Matrix" of the "AdaBoost Classifier", computed from 'y_test' and the AdaBoost model's predictions.

Conclusion

The overall examination covers data collection, loading, processing, and the identification of suitable parameters, all implemented in Python to reveal the relationships among the factors in the credit card data. Processing includes cleaning, sorting, and removing unused data. Predictive models then estimate the risk of credit card fraud from factors such as gender, card type, and transaction type. Two models are used to predict credit card fraud, which helps to identify the fraudulent transaction areas of the system. Since the models are more accurate on genuine cases than on fraud cases, the investigation concludes that the overall risk level of credit card fraud is low.

References

Alarfaj, F.K., Malik, I., Khan, H.U., Almusallam, N., Ramzan, M. and Ahmed, M., 2022. Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access, 10, pp.39700-39715.

Alghofaili, Y., Albattah, A. and Rassam, M.A., 2020. A financial fraud detection model based on LSTM deep learning technique. Journal of Applied Security Research, 15(4), pp.498-516.

Dzakiyullah, N.R., Pramuntadi, A. and Fauziyyah, A.K., 2021. Semi-supervised classification on credit card fraud detection using autoencoders. Journal of Applied Data Sciences, 2(1), pp.01-07.

Faraji, Z., 2022. A review of machine learning applications for credit card fraud detection with a case study. SEISENSE Journal of Management, 5(1), pp.49-59.

Chang, V., Di Stefano, A., Sun, Z. and Fortino, G., 2022. Digital payment fraud detection methods in digital ages and Industry 4.0. Computers and Electrical Engineering, 100, p.107734.

Gupta, P., Varshney, A., Khan, M.R., Ahmed, R., Shuaib, M. and Alam, S., 2023. Unbalanced Credit Card Fraud Detection Data: A Machine Learning-Oriented Comparative Study of Balancing Techniques. Procedia Computer Science, 218, pp.2575-2584.

Wu, T.Y. and Wang, Y.T., 2021, November. Locally interpretable one-class anomaly detection for credit card fraud detection. In 2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI) (pp. 25-30). IEEE.

Sadgali, I., Sael, N. and Benabbou, F., 2019, October. Fraud detection in credit card transaction using neural networks. In Proceedings of the 4th international conference on smart city applications (pp. 1-4).

Singh, A. and Jain, A., 2019. An Empirical Study of AML Approach for Credit Card Fraud Detection—Financial Transactions. International Journal of Computers Communications & Control, 14(6), pp.670-690.

Plakandaras, V., Gogas, P., Papadimitriou, T. and Tsamardinos, I., 2022. Credit card fraud detection with automated machine learning systems. Applied Artificial Intelligence, 36(1), p.2086354.

Al Rubaie, E.M.H., 2021. Improvement in credit card fraud detection using ensemble classification technique and user data. International Journal of Nonlinear Analysis and Applications, 12(2), pp.1255-1265.

Ji, Y., 2021. Explainable AI methods for credit card fraud detection: Evaluation of LIME and SHAP through a User Study.

Vivek, B., Nandhan, S.H., Zean, J.R., Lakshmi, D. and Dhanwanth, B., 2023. Applying Machine Learning to the Detection of Credit Card Fraud. International Journal of Intelligent Systems and Applications in Engineering, 11(3), pp.643-652.

Ojugo, A.A. and Nwankwo, O., 2021. Spectral-cluster solution for credit-card fraud detection using a genetic algorithm trained modular deep learning neural network. JINAV: Journal of Information and Visualization, 2(1), pp.15-24.

Mijwil, M.M. and Salem, I.E., 2020. Credit card fraud detection in payment using machine learning classifiers. Asian Journal of Computer and Information Systems (ISSN: 2321–5658), 8(4).

Karthik, V.S.S., Mishra, A. and Reddy, U.S., 2022. Credit card fraud detection by modelling behaviour pattern using hybrid ensemble model. Arabian Journal for Science and Engineering, pp.1-11.
