As an investor, it pays off to understand who prepays on personal loans. Prepayment is the early repayment of a loan by a borrower, often as the result of optional refinancing to take advantage of lower interest rates. Borrower prepayment means forgone interest income, and many peer-to-peer lending platforms don’t charge a prepayment penalty. Thus, the construction of an optimal portfolio must examine prepayment risk. How can we predict who will prepay on their loans?

One answer is logistic regression, which estimates the probability that a given borrower will prepay. But in the world of credit risk modeling, this analysis is missing a crucial component: time. All else equal, a borrower who prepays one month before term is preferable to a borrower who prepays 6 months before term. So, what can we use to predict not only whether, but *when*, a loan will prepay?

The answer is survival analysis, which is the analysis of *time* until an event (in our case, prepayment). Originally, this analysis was concerned with time from treatment until death, but it has since found a plethora of applications in (arguably) cheerier areas. Survival analysis allows us to estimate time-to-prepayment for a group of individuals, compare time-to-prepayment between two or more groups, and assess the relationship of different variables to time-to-prepayment. And importantly, survival analysis allows us to work with *censored* data, where the study ends before the event of interest can be observed. In the context of consumer credit, a censored observation would be a customer who is still making payments on their loan, so that prepayment has not yet been observed.

In this post, I will use descriptive survival analysis to explore **characteristics of borrowers who tended to prepay their loans**.

The richness of Lending Club’s publicly available data is what allows us to measure prepayment, and it sets the peer-to-peer lending industry distinctly apart from traditional finance. As Lending Club CEO Renaud Laplanche said to Fortune in March 2014, *“We want to transform the banking system into a marketplace that is more competitive, more consumer-friendly, more transparent.”* It is an exciting time to be involved with peer-to-peer lending.

With this data, we are able to explore 36-month loans issued between January 1, 2012 and December 1, 2014. For our purposes, we exclude borrowers who were delinquent or defaulted, leaving only the borrowers who either paid off or are currently active. Two-thirds of this population is still active.

The binary event of interest here is prepayment. There are 5 possible scenarios:

- A loan is fully paid off on time
- A loan defaults, or is delinquent
- A loan is fully paid off, but late
- A loan is fully paid off early (our **event**)
- A loan is still active (**censored**)
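To make the event/censoring split concrete, here is a minimal sketch of how each loan could be turned into a (duration, event) pair. The status strings, the function names, and the choice to treat on-time and late payoffs as censored at month 36 are assumptions made for illustration; they are not Lending Club's schema, and not necessarily the exact coding behind the curves in this post.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def label_loan(status, issue_date, last_payment_date, snapshot_date, term=36):
    """Return (duration_in_months, prepaid) or None if the loan is excluded.

    `status` is a hypothetical simplified label: "Fully Paid", "Current",
    or anything else (treated as defaulted/delinquent and dropped).
    """
    if status == "Fully Paid":
        duration = months_between(issue_date, last_payment_date)
        if duration < term:
            # Paid off before scheduled maturity: the prepayment event.
            return duration, True
        # Paid on time or late: never prepaid, so censor at the full term
        # (an assumption of this sketch).
        return term, False
    if status == "Current":
        # Still active: censored at the data snapshot date.
        duration = min(months_between(issue_date, snapshot_date), term)
        return duration, False
    return None  # defaulted or delinquent loans are excluded
```

For example, a loan issued in January 2012 and fully paid in March 2013 would be labeled `(14, True)`, while a loan issued in June 2014 that is still current in a January 2015 snapshot would be labeled `(7, False)`.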

We are estimating the probability that a borrower will survive (that is, will not prepay) up to a certain time. This probability, the heart of survival analysis, is called the *survivor function*. If every borrower were followed until maturity, the survival curve could be estimated quite painlessly by computing the fraction surviving at each time. However, two-thirds of the dataset consists of loans that are still active. We label these loans as *censored* observations, and the simplest way to estimate survival in their presence is the Kaplan-Meier product-limit estimator, a nonparametric statistic that estimates survival over time even when some observations are censored.

The Kaplan-Meier product-limit estimator is defined thus:

$$S(t_i) = \prod_{j \le i} \left(1 - \frac{d_j}{n_j}\right)$$

where:

- $S(t_i)$ is the estimated probability that a loan from a given population will have a lifetime exceeding time $t_i$
- $n_i$ is the number of subjects still at risk just before time $t_i$
- $d_i$ is the number of subjects who die during time period $t_i$

This is the empirical probability of surviving past certain times in the sample, where only the surviving cases that are still being observed (have not yet been censored) are “at risk” of an (observed) death.
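The product-limit idea is short enough to sketch in plain Python. This is an illustrative implementation rather than the code used to produce the curves in this post; the function name and the toy inputs are made up, and for real work a tested survival analysis library would be the safer choice.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t_i, S(t_i))] for each time with at least one observed event.

    durations: time (e.g. months) until prepayment or censoring
    events:    True if prepayment was observed, False if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    n_at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk loans that did not prepay at t.
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        # Events and censored loans both drop out of the risk set after t.
        n_at_risk -= exits[t]
    return curve


# Toy data: six loans, three observed prepayments, three censored.
toy_durations = [5, 8, 8, 12, 20, 36]
toy_events = [True, True, False, True, False, False]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

Note how the loan censored at month 8 still counts as "at risk" at month 8, but is removed from the denominator before month 12. That is what lets censored loans contribute information without being treated as prepayments.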

Let’s first take a look at the Kaplan-Meier Curves of all possible scenarios:

Loans that were paid off late, paid off on time, or are still active all “survived” to month 36, so their survival probability stayed at 1 and then quickly fell to 0 once the loan passed its scheduled maturity. As a reminder, this subset excludes those who defaulted and never paid back.

Not surprisingly, the prepaid population “died off” (i.e. prepaid) much faster than the total population, with 50% of those who prepaid doing so by month 14.

Finally, when we look at the total population (the purple line), we see that 25% had prepaid by month 20. Say you’re an investor who financed a $10,000 loan at a 25% interest rate. If the loan prepaid at month 20, then you will have missed out on nearly $1,000 in interest payments, an amount equal to almost 10% of the original principal.
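The forgone-interest figure can be checked with a standard amortization calculation. The sketch below assumes a fixed-rate, fully amortizing 36-month loan with monthly compounding, a 25% nominal annual rate, and a borrower who pays off the remaining balance right after the 20th scheduled payment; the function names are made up for this example.

```python
def monthly_payment(principal, annual_rate, term_months):
    """Level payment on a fixed-rate, fully amortizing loan."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -term_months)


def forgone_interest(principal, annual_rate, term_months, prepay_month):
    """Interest the investor would have earned after prepay_month."""
    r = annual_rate / 12
    payment = monthly_payment(principal, annual_rate, term_months)
    balance = principal
    for _ in range(prepay_month):
        # Each payment covers that month's interest; the rest reduces principal.
        balance -= payment - balance * r
    remaining_payments = term_months - prepay_month
    # Remaining scheduled cash flows minus the principal still owed.
    return payment * remaining_payments - balance


lost = forgone_interest(10_000, 0.25, 36, 20)
```

Under these assumptions `lost` comes out just under $1,000. The calculation ignores whatever the investor could earn by reinvesting the returned principal.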

This is a bird’s eye view of the overall population. Now let’s take different features into consideration, and see if they illuminate different prepayment trends in the population. For example, we can stratify the population by employment length:

Unemployed people prepaid much later than the rest of the population. If a borrower wasn’t employed at the time of the loan, then it makes sense that they wouldn’t have an active income to afford paying off their loan, which would lead to later prepayment.

Did those with higher income prepay sooner?

Yes, the higher the income, the sooner a borrower prepaid. Though not a definitive conclusion, it stands to reason that those with less income at their disposal would be less likely to prepay early. It is also worthwhile to note that Lending Club grants very few loans to people who make less than $25,000.

A final interesting variable to consider would be the monthly debt to monthly income ratio:

Here we see a fairly clear trend: those with lower debt-to-income ratios tended to prepay earlier, probably because they had more income left over after debt payments to pay off their loans. This pattern merits further research.
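Stratified comparisons like the ones above (by employment length, income, or debt-to-income ratio) amount to fitting one Kaplan-Meier curve per group and then reading off when a given share of each group has prepaid. Here is a rough sketch of that workflow; the group labels and records are invented for illustration and are not drawn from the Lending Club data.

```python
from collections import Counter, defaultdict


def km_curve(durations, events):
    """Kaplan-Meier estimate as [(time, survival)] at observed event times."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)
    n_at_risk, survival, curve = len(durations), 1.0, []
    for t in sorted(exits):
        if deaths.get(t, 0):
            survival *= 1 - deaths[t] / n_at_risk
            curve.append((t, survival))
        n_at_risk -= exits[t]
    return curve


def stratified_curves(records):
    """records: iterable of (group, duration, event). One curve per group."""
    grouped = defaultdict(lambda: ([], []))
    for group, duration, event in records:
        grouped[group][0].append(duration)
        grouped[group][1].append(event)
    return {g: km_curve(d, e) for g, (d, e) in grouped.items()}


def time_to_fraction(curve, fraction):
    """First time by which at least `fraction` of the group has prepaid."""
    for t, s in curve:
        if s <= 1 - fraction:
            return t
    return None  # the curve never drops that far


# Invented toy records: (group, months observed, prepaid?)
toy_records = [
    ("low_dti", 6, True), ("low_dti", 10, True), ("low_dti", 14, True),
    ("low_dti", 20, True), ("low_dti", 30, False),
    ("high_dti", 12, True), ("high_dti", 24, True), ("high_dti", 30, True),
    ("high_dti", 36, False), ("high_dti", 36, False),
]
toy_curves = stratified_curves(toy_records)
halfway = {g: time_to_fraction(c, 0.5) for g, c in toy_curves.items()}
```

In this toy data the low-DTI group reaches 50% prepaid at month 14 and the high-DTI group at month 30. A gap like that is only descriptive; deciding whether it is statistically meaningful would need a formal test on the real data.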

We started off answering the question: “Who is the type of borrower that prepays?” Well, we’ve shown that people who are more likely to prepay:

- are employed
- have higher income
- have lower debt

Basically, the type of borrower who prepays is the one who can. It follows rational intuition that those with a higher ability to prepay are more likely to.

## Comments

This is an interesting analysis, which supports the intuition that a borrower who has the means to prepay the loan will. What I would find interesting is to know whether borrowers with higher interest rate loans have a higher tendency to prepay than borrowers with less expensive loans. Going by the above analysis, one would have to assume that this is not the case, as borrowers who hold more expensive loans are the ones who are less employed, have less income, and have higher DTIs. I would make the case, however, that there is a sub-segment of borrowers with higher interest rate loans who have a higher tendency to prepay than borrowers with less expensive loans. These are the cherries within the lower rated populations: in other words, borrowers who have a good credit file but whose FICO score is low due to some derogatory credit event that occurred in the past. Consolidating their debt with a high interest rate loan may improve the FICO score and qualify the borrower for a lower interest rate loan, arguably from a competitor, which, if my assumptions are correct, should be seen in higher prepayment rates for a segment of the high interest rate loan population.