Modeling loan interest rates

Application exercise

In this application exercise we will study loan interest rates. The dataset is one you’ve come across before in your reading: the loans data from the peer-to-peer lender Lending Club, available in the openintro package. We will use tidyverse for data exploration and tidymodels for modeling.

library(tidyverse)
library(tidymodels)
library(openintro)

Before we use the dataset, we’ll make a few transformations to it.

loans <- loans_full_schema %>%
  mutate(
    credit_util = total_credit_utilized / total_credit_limit,
    bankruptcy  = as.factor(if_else(public_record_bankrupt == 0, 0, 1)),
    verified_income = droplevels(verified_income),
    homeownership = str_to_title(homeownership),
    homeownership = fct_relevel(homeownership, "Rent", "Mortgage", "Own")
    ) %>%
  rename(credit_checks = inquiries_last_12m) %>%
  select(interest_rate, verified_income, debt_to_income, credit_util, bankruptcy, term, credit_checks, issue_month, homeownership)

Here is a glimpse at the data:

glimpse(loans)

Interest rate vs. credit utilization ratio

The regression model for interest rate vs. credit utilization is as follows.

rate_util_fit <- linear_reg() |>
  fit(interest_rate ~ credit_util, data = loans)

tidy(rate_util_fit)

# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    10.5     0.0871     121.  0        
2 credit_util     4.73    0.180       26.3 1.18e-147
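
Reading the estimates off the tidy output above (rounded as displayed there), the fitted line can be written as the following equation. The variable names are typeset with `~` for spacing, following the math placeholders used elsewhere in this exercise.

\[ \widehat{interest~rate} = 10.5 + 4.73 \times credit~util \]

Here 10.5 is the predicted interest rate (in percent) for an applicant with a credit utilization of 0, and 4.73 is the predicted change in interest rate for a one-unit increase in credit utilization, i.e. going from using none of the available credit to using all of it.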

And here is the model visualized:

ggplot(loans, aes(x = credit_util, y = interest_rate)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm")

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

Warning: Removed 2 rows containing missing values (`geom_point()`).

Your turn: What is the estimated interest rate for a loan applicant with credit utilization of 0.8, i.e. someone whose total credit balance is 80% of their total available credit?

# add code here
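
One possible approach to this prediction, offered as a sketch rather than the intended solution, is to pass a one-row data frame to `predict()`. The chunk below rebuilds `credit_util` and refits the model so that it runs on its own; the names `new_applicant` and `rate_pred` are introduced here for illustration and do not appear elsewhere in the exercise.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Recreate the predictor with the same definition used in the data prep step
loans <- loans_full_schema |>
  mutate(credit_util = total_credit_utilized / total_credit_limit)

# Refit the simple linear regression so this chunk is self-contained
rate_util_fit <- linear_reg() |>
  fit(interest_rate ~ credit_util, data = loans)

# Hypothetical applicant whose balance is 80% of their available credit
new_applicant <- tibble(credit_util = 0.8)

# predict() on a parsnip fit returns a tibble with a single `.pred` column
rate_pred <- predict(rate_util_fit, new_data = new_applicant)
rate_pred
```

The value in `.pred` should agree, up to rounding, with plugging 0.8 into the fitted equation by hand.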

Interest rate vs. homeownership

Next we predict interest rates from homeownership, which is a categorical predictor with three levels:

levels(loans$homeownership)

[1] "Rent"     "Mortgage" "Own"
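
To see how a three-level categorical predictor enters a linear model, it can help to look at the indicator columns R builds from it. The sketch below uses a small made-up factor (not the loans data) with the same level order, and assumes R's default treatment contrasts are in effect, so the first level, "Rent", acts as the baseline.

```r
# Toy factor with the same level order as loans$homeownership
homeownership <- factor(
  c("Rent", "Mortgage", "Own", "Rent"),
  levels = c("Rent", "Mortgage", "Own")
)

# model.matrix() shows the indicator (dummy) columns that lm() builds internally:
# an intercept plus one 0/1 column for each non-baseline level
dummies <- model.matrix(~ homeownership)
dummies
```

There is no column for "Rent": renters are the rows where both indicator columns are 0, which is why the other levels' coefficients are interpreted relative to renters.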

Demo: Fit the linear regression model to predict interest rate from homeownership and display a tidy summary of the model. Write the estimated model output below.

# add code here

Your turn: Interpret each coefficient in the context of the problem.
- Intercept: Add response here.
- Slopes: Add response here.

Interest rate vs. credit utilization and homeownership

Main effects model

Demo: Fit a model to predict interest rate from credit utilization and homeownership, without an interaction effect between the two predictors. Display the summary output and write out the estimated regression equation.

# add code here

\[ add~math~text~here \]

Demo: Write the estimated regression equation for loan applications from each of the homeownership groups separately.
- Rent: \(add~math~text~here\)
- Mortgage: \(add~math~text~here\)
- Own: \(add~math~text~here\)

Question: How does the model predict the interest rate to vary as credit utilization varies for loan applicants with different homeownership statuses? Are the rates the same or different?

Add response here.

Interaction effects model

Demo: Fit a model to predict interest rate from credit utilization and homeownership, with an interaction effect between the two predictors. Display the summary output and write out the estimated regression equation.

# add code here
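
A minimal sketch of one way to fit the interaction model follows. In an R formula, `a * b` expands to `a + b + a:b`, so both main effects and their interaction are included. The chunk repeats the relevant data prep so it runs on its own; the object names `rate_util_home_int_fit` and `int_terms` are assumptions made for this sketch.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Same transformations as in the data prep step, limited to the columns used here
loans <- loans_full_schema |>
  mutate(
    credit_util   = total_credit_utilized / total_credit_limit,
    homeownership = str_to_title(homeownership),
    homeownership = fct_relevel(homeownership, "Rent", "Mortgage", "Own")
  )

# credit_util * homeownership = credit_util + homeownership + credit_util:homeownership
rate_util_home_int_fit <- linear_reg() |>
  fit(interest_rate ~ credit_util * homeownership, data = loans)

# One row per coefficient, including the two interaction terms
int_terms <- tidy(rate_util_home_int_fit)
int_terms
```

The rows whose term names contain a colon are the interaction terms: each one adjusts the slope of `credit_util` for that homeownership group relative to the baseline group, "Rent".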

\[ add~math~text~here \]

Demo: Write the estimated regression equation for loan applications from each of the homeownership groups separately.
- Rent: \(add~math~text~here\)
- Mortgage: \(add~math~text~here\)
- Own: \(add~math~text~here\)

Question: How does the model predict the interest rate to vary as credit utilization varies for loan applicants with different homeownership statuses? Are the rates the same or different?

Add response here.

Choosing a model

Rule of thumb: Occam’s Razor. Don’t overcomplicate the situation! Among models that perform comparably well, we prefer the simplest one.

# add code here
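
One way to compare the two models, sketched here under the assumption that adjusted R-squared is the criterion of interest, is to pull `adj.r.squared` from `glance()` for each fit. The chunk refits both models so it is self-contained; `main_fit`, `int_fit`, and `adj_r2` are names introduced for this sketch.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Same transformations as in the data prep step, limited to the columns used here
loans <- loans_full_schema |>
  mutate(
    credit_util   = total_credit_utilized / total_credit_limit,
    homeownership = str_to_title(homeownership),
    homeownership = fct_relevel(homeownership, "Rent", "Mortgage", "Own")
  )

# Main effects model: common slope, group-specific intercepts
main_fit <- linear_reg() |>
  fit(interest_rate ~ credit_util + homeownership, data = loans)

# Interaction effects model: group-specific slopes and intercepts
int_fit <- linear_reg() |>
  fit(interest_rate ~ credit_util * homeownership, data = loans)

# Collect the adjusted R-squared of each model side by side
adj_r2 <- tibble(
  model = c("main effects", "interaction"),
  adj_r_squared = c(
    glance(main_fit)$adj.r.squared,
    glance(int_fit)$adj.r.squared
  )
)
adj_r2
```

Because adjusted R-squared penalizes additional terms, the interaction model only comes out ahead if its extra slopes explain enough additional variability to offset that penalty.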

Review: What is R-squared? What is adjusted R-squared?

Add response here.
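
As a reminder, the standard definitions can be written as follows, for a model fit to \(n\) observations with \(k\) predictor terms (not counting the intercept), where \(y_i\) are the observed responses, \(\hat{y}_i\) the fitted values, and \(\bar{y}\) the mean response. The symbols are notation chosen for this sketch.

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]

\[ R^2_{adj} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - k - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)} \]

\(R^2\) never decreases when a term is added to the model, whereas \(R^2_{adj}\) can decrease if the added term does not reduce the residual sum of squares enough to offset the lost degree of freedom.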

Question: Based on the adjusted \(R^2\)s of these two models, which one do we prefer?

Add response here.

Another model to consider

Your turn: Let’s add one more variable to the model – issue month. Should we add this variable to the interaction effects model from earlier?

# add code here

Add response here.