Statistics in Business
Statistics – Banking Project
Use the data from the loan default.zip file on Google Drive for the loan default project, and present the findings as follows, both in a Word document and in a PPT presentation.
Part-1
1) Understand and define the problem statement.
2) Get a preliminary understanding of data and perform exploratory data analysis.
&
1) Discuss the business context.
2) Data cleaning and pre-processing (e.g., outlier treatment, missing value treatment)
3) How to generate insights from EDA?
4) Discuss any finer nuances that could be used to generate insights.
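For the cleaning step (item 2), a minimal sketch of two common treatments is shown below: median imputation for missing values and IQR-based capping for outliers, using only the Python standard library. The annual_inc values are hypothetical, and the 1.5*IQR rule is one conventional choice, not a requirement of this project; the project data may call for a different cutoff or for dropping rows instead.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_cap(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those bounds (winsorizing)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical annual_inc values; None marks a missing entry.
annual_inc = [42000, 55000, None, 61000, 48000, 950000, 39000]
clean = iqr_cap(impute_median(annual_inc))
print(clean)
```

Capping keeps every row (unlike deleting outliers), which matters when defaults are rare and each default record is valuable.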
Part-2
1) Business Problem Understanding and Problem definition
2) Generate a data report.
3) Exploratory data analysis and the insights derived from it.
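One typical EDA insight for this problem is how the default rate varies across a categorical attribute such as grade. The sketch below computes that rate per category; the records and the 0/1 is_default flag are hypothetical (the flag would be derived from loan_status), and the same function could be applied to home_ownership, purpose or term.

```python
from collections import defaultdict


def default_rate_by(rows, column, target="is_default"):
    """Share of defaulted loans within each level of a categorical column."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        defaults[key] += row[target]  # target is assumed to be 0 or 1
    return {key: defaults[key] / totals[key] for key in sorted(totals)}


# Hypothetical records; is_default is an assumed flag derived from loan_status.
loans = [
    {"grade": "A", "is_default": 0},
    {"grade": "A", "is_default": 0},
    {"grade": "B", "is_default": 1},
    {"grade": "B", "is_default": 0},
    {"grade": "C", "is_default": 1},
]
print(default_rate_by(loans, "grade"))
```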
Part-3
1) Build various models and check their accuracy.
2) Discuss model validation.
3) Discuss model tuning.
4) Discuss how to draw business insights and recommendations.
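For model validation, stratified k-fold splitting keeps roughly the same default rate in every fold, which matters when defaults are a small minority of loans. The standard-library sketch below illustrates the idea with hypothetical labels; scikit-learn's StratifiedKFold provides the same behaviour for real projects.

```python
import random


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so every fold gets a similar share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical labels: 16 non-defaults (0) and 4 defaults (1).
labels = [0] * 16 + [1] * 4
for train_idx, test_idx in stratified_kfold(labels, k=4):
    test_default_rate = sum(labels[i] for i in test_idx) / len(test_idx)
    print(len(train_idx), len(test_idx), test_default_rate)
```

Each model would be trained on train_idx and scored on test_idx, and the scores averaged across folds.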
Part-4
1) Model Building and comparison
2) Model Tuning
3) Model Interpretation
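Besides hyperparameters, one tuning lever for a default classifier is the probability cutoff used to flag a loan as a likely default. The sketch below picks the cutoff with the best F1 score from a few candidates; the scores are hypothetical model outputs, and F1 is an assumed criterion. The bank might instead weigh the cost of a missed default against the cost of rejecting a good borrower.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_at(y_true, scores, threshold):
    """F1 score when loans with score >= threshold are predicted as defaults."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, _ = confusion(y_true, y_pred)
    if tp == 0:
        return 0.0  # avoids division by zero; no defaults caught
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, candidates):
    """Candidate cutoff with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at(y_true, scores, t))


# Hypothetical validation labels and predicted default probabilities.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.25, 0.6, 0.45, 0.9]
print(best_threshold(y_true, scores, [0.3, 0.5, 0.7]))
```

The cutoff should be chosen on validation data, not on the final test set.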
Part-5
1) Business insights and recommendations
2) A structure for presentation.
Submit all of the above, heading by heading, in a Word document, and also prepare a PPT that helps explain and communicate the data analysis for use in the final presentation.
Many banks believed that lending to individuals was risk-free, given that individuals can be assessed through credit scores and the loans are sometimes backed by collateral. Recently, however, the banking system has seen an increase in loan defaults, i.e., cases where the borrower is unable to pay an instalment on time. These defaults directly affect the bank's revenue. Nowadays, banks scrutinize each loan application to identify potential default cases, so that they can predict which clients are likely to default on repayment and at which stage. Using the given data from a bank, build a model that predicts loan default, to help the bank take the required actions.

The overall project work is fine. In addition to the above, could he prepare a separate 7-page sub-note on the same data covering the specific points below? Please charge the fairest price for this. The requirements for the 7-page notes are as follows:
1) Introduction
a) Defining the problem statement
b) Need of the study/project
c) Understanding the business/social opportunity
2) Data Report
a) Understanding how the data was collected in terms of time, frequency and methodology
b) Visual inspection of the data (rows, columns, descriptive details)
c) Understanding of attributes (variable info, renaming if required)
3) Exploratory data analysis
a) Univariate analysis (distribution and spread for every continuous attribute; distribution of data across categories for categorical ones)
b) Bivariate analysis (relationships between different variables, correlations)
c) Removal of unwanted variables
d) Missing value treatment
e) Outlier treatment
f) Variable transformation (if applicable)
g) Addition of new variables
4) Insights from EDA
a) Is the data unbalanced? If so, what can be done?
b) Any insights using clustering (if applicable)
c) Any other insights

# | Fields | Description
1 | member_id | A unique ID for the borrower member.
2 | loan_amnt | The listed amount of the loan applied for by the borrower. If at some point in time the credit department reduces the loan amount, the reduction is reflected in this value.
3 | funded_amnt | The total amount committed to that loan at that point in time.
4 | funded_amnt_inv | The total amount committed by investors for that loan at that point in time.
5 | term | The number of payments on the loan. Values are in months and can be either 36 or 60.
6 | int_rate | Interest rate on the loan
7 | installment | The monthly payment owed by the borrower if the loan originates.
8 | grade | Assigned loan grade
9 | emp_length | Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.
10 | home_ownership | The home ownership status provided by the borrower during registration. Possible values are RENT, OWN, MORTGAGE and OTHER.
11 | annual_inc | The self-reported annual income provided by the borrower during registration.
12 | verification_status | Status of the verification performed
13 | issue_d | The month in which the loan was funded
14 | pymnt_plan | Indicates whether a payment plan has been put in place for the loan
15 | desc | Loan description provided by the borrower
16 | purpose | A category provided by the borrower for the loan request
17 | addr_state | The state provided by the borrower in the loan application
18 | dti | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
19 | delinq_2yrs | The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years
20 | earliest_cr_line | The month the borrower’s earliest reported credit line was opened
21 | inq_last_6mths | The number of inquiries in the past 6 months (excluding auto and mortgage inquiries)
22 | mths_since_last_delinq | The number of months since the borrower’s last delinquency
23 | open_acc | The number of open credit lines in the borrower’s credit file
24 | revol_bal | Total credit revolving balance
25 | revol_util | Revolving line utilization rate, i.e., the amount of credit the borrower is using relative to all available revolving credit
26 | total_acc | The total number of credit lines currently in the borrower’s credit file
27 | out_prncp | Remaining outstanding principal for the total amount funded
28 | out_prncp_inv | Remaining outstanding principal for the portion of the total amount funded by investors
29 | total_pymnt | Payments received to date for the total amount funded
30 | total_pymnt_inv | Payments received to date for the portion of the total amount funded by investors
31 | total_rec_prncp | Principal received to date
32 | total_rec_int | Interest received to date
33 | total_rec_late_fee | Late fees received to date
34 | recoveries | Post-charge-off gross recovery
35 | collection_recovery_fee | Post-charge-off collection fee
36 | last_pymnt_d | The month in which the last payment was received
37 | last_pymnt_amnt | The last total payment amount received
38 | next_pymnt_d | Next scheduled payment date
39 | last_credit_pull_d | The most recent month in which credit was pulled for this loan
40 | application_type | Indicates whether the loan is an individual application or a joint application with two co-borrowers
41 | loan_status | Current status of the loan
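The target variable for the default model would come from loan_status, and question 4a asks whether that target is unbalanced. The sketch below rests on a hypothetical assumption: that the labels "Charged Off" and "Default" mark a default, since the actual loan_status values are not listed in this data dictionary and should be checked first. It measures the class shares and shows random oversampling of the minority class as one remedy. Oversampling should be applied to the training split only, so that duplicated rows do not leak into validation data; class weights or SMOTE (from the imbalanced-learn package) are common alternatives.

```python
import random
from collections import Counter

# HYPOTHETICAL mapping: which loan_status labels count as a default is an
# assumption; verify against the actual values in the dataset.
DEFAULT_STATUSES = {"Charged Off", "Default"}


def make_target(statuses):
    """Map loan_status strings to 1 (default) or 0 (non-default)."""
    return [1 if s in DEFAULT_STATUSES else 0 for s in statuses]


def class_shares(labels):
    """Fraction of records in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit


# Hypothetical statuses for ten loans; rows are stand-in record IDs.
statuses = ["Fully Paid"] * 8 + ["Charged Off"] * 2
y = make_target(statuses)
rows = list(range(10))
print(class_shares(y))
rows_bal, y_bal = random_oversample(rows, y)
print(Counter(y_bal))
```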