1.0 Introduction 

1.1 Current Business Environment

Technological advances, international rivalry, and changing customer behaviour have all contributed to the high level of uncertainty and rapid change in the present business environment. To survive and succeed in this changing environment, businesses must innovate and adapt swiftly (Son et al. 2019).

1.2 Problems and Challenges

The loan data case presents several issues and challenges: data quality problems such as missing and duplicate values; small sample sizes that make some subgroups difficult to analyse; low approval rates for certain subgroups, such as female applicants and graduates; and the need to identify the factors that contribute to loan approval so that strategies to increase approval rates can be developed (Konstantopoulos et al. 2020).

1.3 Impact and Benefits of a Programming Solution

The effectiveness and precision of the loan application process can be greatly impacted by the implementation of a programming solution. It can eliminate errors and enhance decision-making by automating data entry and analysis. As a result, the lending institution may see quicker loan processing times, higher client satisfaction levels, and improved risk management. Data analytics can also be used to spot patterns and trends, which improves the targeting of loan products and marketing campaigns.

1.4 Implications of Inaction


There could be a number of negative effects if the lending firm decides not to act. First, it may continue to lose potential revenue by turning away competent applicants because of the defective system in place. Second, if claims of an unfair or discriminatory loan approval procedure emerge, the company’s reputation may suffer, leading to a decline in clientele. Finally, if anti-discrimination legislation is found to have been breached, the business may also face legal action (Allan et al. 2019).

1.5 Human Resources Requirements

Additional personnel with skills in software development, data analysis, and machine learning may be needed in order to implement a programming solution for loan approval. The company may need to hire or train personnel to oversee the new system, protect data security and privacy, and resolve any technical concerns. To guarantee that personnel can use the new system properly, the organisation may also need to offer continuing training and support.

2.0 Approach 

2.1 Choice of Language, Software, and Tools

Given its adaptability, clarity, and wealth of modules that facilitate data analysis and visualisation, Python was selected as the programming language for this project. The three main libraries that would be used are Pandas, NumPy, and Matplotlib; Pandas would be used for manipulating data, NumPy for performing mathematical operations, and Matplotlib for visualising results (Adekola et al. 2023).

The development environment would be Jupyter Notebook, since it combines code, text, and visualisation in a single document and enables simple sharing and collaboration. Data preparation would start with a check for missing values and duplicates, which would then be removed or imputed as appropriate. Any other required cleaning, such as data type conversions and categorical variable encoding, would also be carried out (Aguiar et al. 2020).

Matplotlib would be used for visualisation to produce charts and graphs that offer insights into the data and help identify patterns, trends, and outliers. Code libraries are a necessity for this project because they offer pre-built functions and techniques that save time and lower the possibility of mistakes. It is crucial to follow the proper procedures when using these libraries, and any code written should be properly planned and tested to avoid introducing problems. Proper design and testing are essential if the solution is to be accurate, effective, and maintainable.

2.2 Data Preparation Steps

The first step in preparing the data for this project would be to look for duplicates and missing values in the dataset. Python’s Pandas package can be used to accomplish this. It would be necessary to delete or impute any missing values or duplicates (Frazzetto et al. 2019).
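
As a minimal sketch of this step (assuming the data sits in a file named loan_data.csv, which is an illustrative name rather than one given in the brief), the following Pandas code counts missing values and duplicates and then applies a simple mode/median imputation:

import pandas as pd

df = pd.read_csv("loan_data.csv")

# Report missing values per column and the number of fully duplicated rows.
print(df.isnull().sum())
print("Duplicate rows:", df.duplicated().sum())

# Drop exact duplicates, then impute missing values: mode for text columns,
# median for numeric columns (one simple, common strategy).
df = df.drop_duplicates()
for col in df.columns:
    if df[col].isnull().any():
        if df[col].dtype == "object":
            df[col] = df[col].fillna(df[col].mode()[0])
        else:
            df[col] = df[col].fillna(df[col].median())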


The next step would be to carry out any required data cleaning, including data type conversions and categorical variable encoding. For instance, categorical variables may need to be one-hot encoded or label encoded if they are included in the dataset. After cleaning, the dataset would be split into training and testing sets; this is crucial for assessing the machine learning model’s performance and avoiding overfitting.
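
A hedged sketch of this step is shown below; the target column "Loan_Status" coded "Y"/"N" and the presence of a "Loan_ID" column are assumptions about the dataset rather than details given in the brief.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("loan_data.csv")

# Label-encode the binary target and one-hot encode the categorical predictors.
y = df["Loan_Status"].map({"Y": 1, "N": 0})
X = pd.get_dummies(df.drop(columns=["Loan_ID", "Loan_Status"]), drop_first=True)

# Hold back 20% of the rows for testing to help guard against overfitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)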

 

Making sure the features are suitably scaled is crucial before applying machine learning techniques. The Scikit-learn library provides transformers such as StandardScaler and MinMaxScaler for this purpose (Kim et al. 2019). The final step is to train and evaluate machine learning models using the prepared data. The choice of model depends on the problem to be solved and the data available. The model’s performance would then be assessed and the results interpreted in order to gain insights and support sound decisions.
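
Continuing from the train/test split sketched above, the snippet below scales the features with StandardScaler and fits a simple logistic regression; the choice of model is illustrative rather than prescribed by the brief.

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)        # apply the same scaling to the test set

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test_scaled)))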

2.3 Role of Code Libraries

Because they offer pre-built functions and techniques that reduce development time and the possibility of errors, code libraries are essential to data analysis and programming solutions. In the context of this project, libraries like Pandas, NumPy, and Matplotlib are crucial tools that offer a wide range of functionality that can greatly simplify data processing, numerical computation, and visualisation (Wan et al. 2020).

While NumPy offers quick and effective numerical operations on arrays, Pandas is a strong package that offers data structures for effectively storing and processing massive datasets. Users of the data visualisation library Matplotlib have access to a variety of chart and graph choices. By using code libraries, developers can spend more time on higher-level activities and problem-solving and less time on developing low-level code. However, it’s crucial to make sure libraries are utilised appropriately and in a way that is in line with the general project needs. To make sure the solution is efficient and successful, it’s also critical to keep up-to-date on the libraries being utilised and any potential constraints (Han et al. 2022).

2.4 Design and Testing Considerations

As noted above, code libraries free developers to spend more time on higher-level activities and problem-solving, but they must be used appropriately and in line with the overall project requirements, and it is important to stay up to date on the libraries in use and their potential constraints. To minimise the likelihood of errors and to keep the code manageable and scalable, it is crucial that the code is well designed and thoroughly tested (Petrucci et al. 2020).

2.5 Use of Data Visualization

Data visualisation is essential in the provided code for understanding the data. Different sorts of charts and graphs are made using the Seaborn and Matplotlib libraries to aid in spotting trends, patterns, and outliers in the data. For instance, a bar chart can be used to visualise the frequency distribution of a categorical variable, whereas a scatter plot can be used to show the relationship between two numerical variables. A pair plot and heatmap make it easier to visualise the correlation between variables. By employing data visualisation, the analyst can give stakeholders clearer and more intuitive explanations of the insights obtained from the data. This can help both in identifying potential problems or topics for further inquiry and in informing decisions based on facts (Lu et al. 2021).
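
The sketch below produces a few plots of this kind; column names such as "Gender", "ApplicantIncome" and "LoanAmount" are assumptions about the file, and Seaborn is used only for the heatmap.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("loan_data.csv")

# Bar chart: frequency distribution of a categorical variable.
df["Gender"].value_counts().plot(kind="bar", title="Applicants by gender")
plt.show()

# Scatter plot: relationship between two numerical variables.
df.plot.scatter(x="ApplicantIncome", y="LoanAmount", alpha=0.4)
plt.show()

# Heatmap: correlations between the numeric columns.
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()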

3.0 Descriptive Analysis 

3.1 Loan Data Validation and Indexing

The validation of the dataset and accurate indexing are the initial steps in the descriptive analysis of loan data. In order to retrieve and analyze data effectively, it is crucial to index the loan data on the Loan_ID column. The efficiency of searches and operations on the dataset is enhanced by indexing, which makes it possible to quickly retrieve certain rows based on their Loan_ID values.

There are numerous checks that may be made to verify the loan data. Checking for missing values, looking for duplicate entries, confirming the data types of each column, and assuring the consistency and integrity of the data are a few of these (Gómez et al. 2019).
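
A minimal sketch of this indexing and validation step, again assuming the illustrative file name loan_data.csv:

import pandas as pd

df = pd.read_csv("loan_data.csv")

# Index the data on Loan_ID so rows can be retrieved directly by identifier.
df = df.set_index("Loan_ID")

print(df.index.is_unique)        # every Loan_ID should appear only once
print(df.isnull().sum())         # missing values per column
print(df.duplicated().sum())     # fully duplicated rows
print(df.dtypes)                 # confirm each column's data type
first_row = df.loc[df.index[0]]  # fast retrieval of a single row by its Loan_ID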

3.2 Data Cleaning Steps

Data cleaning is a crucial step in data preparation for analysis. It involves handling missing values, removing duplicates, dealing with outliers, and transforming variables. Here are the steps involved in data cleaning:

 

Missing Values: Identify columns with missing values and decide how to handle them. Options include removing rows with missing values, imputing values based on statistical measures, or using advanced techniques like machine learning algorithms for imputation.

Duplicates: Check for duplicate records based on unique identifiers. Remove or merge duplicate rows to ensure data integrity and avoid skewing analysis results.

Outliers: Recognise outliers, extreme values that differ dramatically from the rest of the data, and deal with them. Outliers can be handled in several ways, including removal, data transformation, and the use of robust statistical methods that are less sensitive to them (a short sketch using the IQR rule follows this list).

Variable Transformation: If required, transform variables by scaling numerical variables, encoding categorical variables, or producing derived variables based on particular computations or business rules.
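
As a sketch of the outlier and transformation steps above, the snippet below applies the common 1.5 × IQR rule to "ApplicantIncome" (a column assumed to be numeric) and also derives a log-transformed version of it:

import numpy as np
import pandas as pd

df = pd.read_csv("loan_data.csv")

# Inter-quartile range fences for ApplicantIncome.
q1, q3 = df["ApplicantIncome"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop rows whose income falls outside the fences.
df_no_outliers = df[df["ApplicantIncome"].between(lower, upper)]

# Option 2: keep every row but analyse a less skewed, log-transformed variable.
df["LogIncome"] = np.log1p(df["ApplicantIncome"])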

 

3.3.1 Percentage of Approved Female Applicants

The proportion of female applicants whose loans were approved can be calculated as the ratio of approved female applicants to the total number of female applicants. This sheds light on how often loans are approved for female applicants and helps reveal any gender-based differences in loan acceptance rates (Ferrone et al. 2022).
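
A minimal sketch of this calculation, assuming a "Gender" column and a "Loan_Status" column in which approved loans are coded "Y":

import pandas as pd

df = pd.read_csv("loan_data.csv")

female = df[df["Gender"] == "Female"]
pct_female_approved = (female["Loan_Status"] == "Y").mean() * 100
print(f"Approved female applicants: {pct_female_approved:.1f}%")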

3.3.2 Average Income of All Applicants

We can compute the mean of the “ApplicantIncome” column in the loan dataset to obtain the average income of all applicants. This gives a broad indication of the average income level across all loan applicants, helps gauge their overall financial capacity, and supports analysis of how income is distributed.
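
For example, a one-line Pandas calculation of this figure might look as follows (file name as before):

import pandas as pd

df = pd.read_csv("loan_data.csv")
print("Average applicant income:", df["ApplicantIncome"].mean())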

3.3.3 Average Income of Self-Employed Applicants

To compute the average income of self-employed applicants, we filter the loan dataset on the “Self_Employed” column so that only records where the applicant is self-employed remain, and then calculate the mean of the “ApplicantIncome” column for those records. This reveals the typical income among self-employed applicants in particular, helping to uncover any variation in income levels when compared with other applicant groups.

3.3.4 Average Income of Non-Self-Employed Applicants

To calculate the average income of non-self-employed applicants, we filter the loan dataset on the “Self_Employed” column, choosing only records where the applicant is not self-employed, and then compute the mean of the “ApplicantIncome” column for those records. A single grouped calculation, sketched below, covers both this section and the previous one.
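
The "Yes"/"No" coding of "Self_Employed" in this sketch is an assumption about how the column is stored:

import pandas as pd

df = pd.read_csv("loan_data.csv")

# Average income for self-employed and non-self-employed applicants in one step.
print(df.groupby("Self_Employed")["ApplicantIncome"].mean())

# Or each group explicitly:
self_emp_mean = df.loc[df["Self_Employed"] == "Yes", "ApplicantIncome"].mean()
non_self_emp_mean = df.loc[df["Self_Employed"] == "No", "ApplicantIncome"].mean()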

 

3.3.5 Average Income of Graduate Applicants

To calculate the average income of graduate applicants, we filter the loan dataset based on the “Graduate” column, selecting only records where the applicant’s education status is “Graduate.” Then, we compute the mean of the “ApplicantIncome” column for these filtered records.

3.3.6 Percentage of Approved Graduate Applicants

To arrive at the percentage of approved graduate applicants, we compute the ratio of approved loan applications to the total number of applications from graduates. This reveals the loan approval rate among applicants with graduate-level education; a sketch covering this and the previous section follows.
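
The sketch below covers sections 3.3.5 and 3.3.6 together; it assumes education status is stored in an "Education" column with the value "Graduate" (the text refers to a "Graduate" column, so the name may need adjusting to match the actual file).

import pandas as pd

df = pd.read_csv("loan_data.csv")

graduates = df[df["Education"] == "Graduate"]
print("Average graduate income:", graduates["ApplicantIncome"].mean())
pct_grad_approved = (graduates["Loan_Status"] == "Y").mean() * 100
print(f"Approved graduate applicants: {pct_grad_approved:.1f}%")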

4.0 Recommendations for Future Work

4.1 Proposed Route Forward

 

4.3 Techniques, Libraries, Tools, and Objective Functions for Model Evaluation

A critical stage in determining the efficiency and efficacy of predictive models is model assessment. A variety of techniques, libraries, tools, and objective functions are available for evaluating models; they are summarised briefly below:

Techniques:

Confusion Matrix: A tabular comparison between model predictions and actual values that displays true positives, true negatives, false positives, and false negatives (Vijesh Joe et al. 2021).

Cross-validation: The dataset is split into several subsets; the model is trained on some of those subsets and tested on the others to estimate how well it generalises.

ROC Curve: The Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at various classification thresholds to assess a model’s effectiveness (Musmade et al. 2022). All three techniques are sketched below.
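
The snippet below sketches all three techniques with scikit-learn; it uses the library's bundled breast-cancer dataset purely so that it runs on its own, and the prepared loan features and target would be substituted in practice.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

print(confusion_matrix(y_te, model.predict(X_te)))   # TP, TN, FP, FN counts
print(cross_val_score(model, X, y, cv=5).mean())     # mean 5-fold cross-validation score
probs = model.predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)        # points on the ROC curve
print("AUC:", roc_auc_score(y_te, probs))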

Libraries:

Scikit-learn: Scikit-learn is a well-known Python toolkit for machine learning that offers a variety of metrics, cross-validation methods, and tools for model assessment.

TensorFlow/Keras: These libraries include metrics like accuracy, precision, recall, F1 score, and others for assessing neural network models.

PyTorch: It offers tools for evaluating models, such as calculating accuracy, creating confusion matrices, and analyzing model predictions.

Tools:

Jupyter Notebook: A great tool for documenting, visualizing, and assessing models, since it supports interactive coding.

Model assessment libraries: Dedicated libraries, such as Yellowbrick and MLflow, provide functions and visualizations specifically for model assessment.

 

Objective functions:

Accuracy: The proportion of all predictions that the model gets right.

Precision: The proportion of predicted positives that are actually positive.

Recall: The proportion of actual positives that the model correctly identifies.

F1 Score: The harmonic mean of precision and recall, a balanced metric that is useful for imbalanced datasets.

Mean Squared Error (MSE): The average squared difference between predicted and actual values, frequently used for regression tasks. The sketch below computes each of these metrics with scikit-learn.
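
The values used here are a small, hand-made set of labels purely for illustration.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))

# MSE suits regression, so it is shown here on continuous values instead.
print("MSE      :", mean_squared_error([2.5, 0.0, 2.1], [3.0, -0.1, 2.0]))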

The specific problem, dataset, and modelling strategy all influence the choice of techniques, libraries, tools, and objective functions for model assessment. To evaluate the model’s performance properly, it is crucial to take into account the characteristics of the data and to use assessment criteria that align with the project’s goals.

5.0 Conclusions 

The analysis and visualisation of the loan dataset made it possible to understand the variables influencing loan approval. Python was used for the project’s data manipulation and visualisation, together with libraries such as Pandas, NumPy, and Matplotlib, and Jupyter Notebook served as the development environment.

Imputing missing data, locating duplicates, and encoding categorical variables were all steps in the data preparation process. The visualisations highlighted how approval outcomes relate to factors such as applicant income, education, self-employment status, and gender, and made it possible to spot trends and outliers clearly.

By offering pre-built functions and methods, code libraries played a vital part in the project, reducing errors and saving time. It was nevertheless crucial to make sure that these libraries were used correctly and that any created code was well designed and tested. Overall, this study illustrated the value of data analysis and visualisation in decision-making: the findings can help the lending institution better understand the factors that contribute to loan approval and act on them. The project also made clear the value of careful code design and testing to guarantee the accuracy, effectiveness, and maintainability of the final product.

 

References 

Son, S.U., Seo, S.B., Jang, S., Choi, J., Lim, J.W., Lee, D.K., Kim, H., Seo, S., Kang, T., Jung, J. and Lim, E.K., 2019. Naked-eye detection of pandemic influenza a (pH1N1) virus by polydiacetylene (PDA)-based paper sensor as a point-of-care diagnostic platform. Sensors and Actuators B: Chemical, 291, pp.257-265.

Adekola, O., Lamond, J., Adelekan, I., Bhattacharya‐Mis, N., Ekinya, M., Bassey Eze, E. and Ujoh, F., 2023. Towards adoption of mobile data collection for effective adaptation and climate risk management in Africa. Geoscience Data Journal, 10(2), pp.276-290.

Vijesh Joe, C., Raj, J.S. and Smys, S., 2021. Big Data Analytics: Tools, Challenges, and Scope in Data-Driven Computing. In International Conference on Mobile Computing and Sustainable Informatics: ICMCSI 2020 (pp. 709-719). Springer International Publishing.

Allan, E.J. and Tolbert, A.R., 2019. Advancing social justice with policy discourse analysis. Research methods for social justice and equity in education, pp.137-149.

Lu, J., 2020. Data analytics research-informed teaching in a digital technologies curriculum. INFORMS Transactions on Education, 20(2), pp.57-72.

Kim, C. and Lee, K., 2019. Polydiacetylene (PDA) liposome-based immunosensor for the detection of exosomes. Biomacromolecules, 20(9), pp.3392-3398.

Ferrone, V., Bruni, P., Canale, V., Sbrascini, L., Nobili, F., Carlucci, G. and Ferrari, S., 2022. Simple Synthesis of Fe3O4@-Activated Carbon from Wastepaper for Dispersive Magnetic Solid-Phase Extraction of Non-Steroidal Anti-Inflammatory Drugs and Their UHPLC–PDA Determination in Human Plasma. Fibers, 10(7), p.58.

Frazzetto, D., Nielsen, T.D., Pedersen, T.B. and Šikšnys, L., 2019. Prescriptive analytics: a survey of emerging trends and technologies. The VLDB Journal, 28, pp.575-595.

Barkho, L., 2023. For a postfoundational method to news discourse analysis. Cogent Arts & Humanities, 10(1), p.2185446.

Aguiar, J., Gonçalves, J.L., Alves, V.L. and Câmara, J.S., 2020. Chemical fingerprint of free polyphenols and antioxidant activity in dietary fruits and vegetables using a non-targeted approach based on QuEChERS ultrasound-assisted extraction combined with UHPLC-PDA. Antioxidants, 9(4), p.305.

Han, Y., Liu, X., Zhao, Q., Gao, Y., Zhou, D., Long, W., Wang, Y. and Song, Y., 2022. Aptazyme-induced cascade amplification integrated with a volumetric bar-chart chip for highly sensitive detection of aflatoxin B1 and adenosine triphosphate. Analyst, 147(11), pp.2500-2507.

Petrucci, R., Di Matteo, P., De Francesco, G., Mattiello, L., Perretti, G. and Russo, P., 2020. Novel fast identification and determination of free polyphenols in untreated craft beers by HPLC-PDA-ESI-MS/MS in SIR mode. Journal of Agricultural and Food Chemistry, 68(30), pp.7984-7994.

Lu, W., Fu, S., Sun, X., Liu, J., Zhu, D., Li, J. and Chen, L., 2021. Magnetic solid-phase extraction using polydopamine-coated magnetic multiwalled carbon nanotube composites coupled with high performance liquid chromatography for the determination of chlorophenols. Analyst, 146(20), pp.6252-6261.

Musmade, B.D., Sawant, A.V., Kulkarni, S.V., Nage, S.D., Bhope, S.G., Padmanabhan, S. and Lohar, K.S., 2022. Method Development, Validation and Estimation of Relative Response Factor for the Quantitation of Known Impurities in Mometasone Furoate Nasal Spray Dosage form by RP-HPLC with UV/PDA Detector. Pharmaceutical Chemistry Journal, 56(4), pp.538-544.

Konstantopoulos, G., Koumoulos, E.P. and Charitidis, C.A., 2020. Testing novel portland cement formulations with carbon nanotubes and intrinsic properties revelation: Nanoindentation analysis with machine learning on microstructure identification. Nanomaterials, 10(4), p.645.

Hart, P., 2021. A critical analysis of different methods used to elicit project requirements.

Gómez, J., Simirgiotis, M.J., Lima, B., Paredes, J.D., Villegas Gabutti, C.M., Gamarra-Luques, C., Bórquez, J., Luna, L., Wendel, G.H., Maria, A.O. and Feresin, G.E., 2019. Antioxidant, gastroprotective, cytotoxic activities and UHPLC PDA-Q orbitrap mass spectrometry identification of metabolites in Baccharis grisebachii decoction. Molecules, 24(6), p.1085.

Saenjaiban, A., Singtisan, T., Suppakul, P., Jantanasakulwong, K., Punyodom, W. and Rachtanapun, P., 2020. Novel color change film as a time–temperature indicator using polydiacetylene/silver nanoparticles embedded in carboxymethyl cellulose. Polymers, 12(10), p.2306.


Chen, W., Bartlett, T. and Peng, H., 2021. The erasure of nature in the discourse of oil production: An enhanced eco-discourse analysis, Part 1. Pragmatics and Society, 12(1), pp.6-32.

Karimuzzaman, M., Islam, N., Afroz, S. and Hossain, M.M., 2021. Predicting stock market price of Bangladesh: a comparative study of linear classification models. Annals of Data Science, 8, pp.21-38.


Wan, M., Zhao, H., Wang, Z., Zhao, Y. and Sun, L., 2020. Preparation of Ag@ PDA@ SiO2 electrospinning nanofibrous membranes for direct bacteria SERS detection and antimicrobial activities. Materials Research Express, 7(9), p.095012.

Schäffer, E., Stiehl, V., Schwab, P.K., Mayr, A., Lierhammer, J. and Franke, J., 2021. Process-driven approach within the engineering domain by combining business process model and notation (BPMN) with process engines. Procedia CIRP, 96, pp.207-212.
