P2P LENDING INTEREST RATES SENSITIVITY TO SOCIOECONOMIC FACTORS
IVAN CHEMYAKIN,
LOMONOSOV MOSCOW STATE UNIVERSITY, FACULTY OF ECONOMICS
Abstract. The present paper analyses how sensitive P2P credit interest rates are to the socioeconomic data declared by P2P credit users. Borrowers choose P2P platforms over commercial banks for several reasons: lower interest rates, shorter loan processing times, fewer document requirements and the online nature of the process. Of these, the most attractive feature for the borrower is the lower interest rate. According to the study, in 94% of cases, a one-step adjustment in a user’s credit rating triggers an interest rate change of 0.654 percentage points, while the length of their credit history and the purpose of their loan have no impact on the interest rate.
Keywords: P2P lending, social lending, online lending, interest rate, socioeconomic factors.
JEL classification: D14, E34, G23
TCHEMYAKIN, IVAN (2015) "P2P LENDING INTEREST RATES SENSITIVITY TO SOCIOECONOMIC FACTORS". Journal of Russian Review (ISSN 2313-1578), VOL. 2(3), 111
1. Introduction
P2P lending (People to People, Internet users directly lending money to each other) is a rapidly growing sector of Internet finance, an industry first articulated by Yandiev (2015). By operating as a virtual banking system, the P2P lending market plays a key role in this industry. As in the banking sector, its most vital question is which factors influence the rate of interest on a loan.
Therefore, the aim of this study is to find out which factors, and to what extent, are shaping the interest rate in P2P lending.
2. Review of literature
The first ever online P2P platform, called ZOPA (zone of possible agreement), was launched in 2005 in Great Britain. The company may be said to have pioneered online P2P lending. Over the last five years the practice of online P2P lending has gained scientific validity. Open access to data allows researchers in various fields to study the factors influencing the process (Bachmann, Becker et al., 2011).
The number of studies of P2P lending is growing year by year. A review by Bachmann and Becker mentions 43 papers published from 2006 to late 2010 (Bachmann, Becker et al., 2011). Yang Yang claims that 70 such papers were published from 2008 to 2015 (Yang Yang, 2015).
One of the earlier studies of P2P lending was the paper titled Internet Based Social Lending: Past, Present and Future (Hulme and Wright, 2006). One of the most recent is Determinants of Default in P2P Lending (Serrano-Cinca, Gutiérrez-Nieto and López-Palacios, 2015). Overall, the scope of research in P2P lending seems to be getting narrower.
A 2008 paper titled Peer to Peer Banking - State of the Art (Frerichs and Schumann, 2008) describes the business models of companies active in the P2P lending market: Zopa, Prosper, Smava, and even the radically different Kiva. The study provides a description of the industry and lays down a number of avenues for further research. Moenninghoff and Wieandt go beyond P2P lending and deal with a whole range of opportunities for avoiding middlemen afforded by P2P platforms. Particular attention is given to the risks that peer-to-peer network users take upon themselves by forgoing financial middlemen (Moenninghoff and Wieandt, 2011).
A number of researchers from different countries are turning their attention to subjects like P2P lending market regulation (Verstein, 2011; Chaffee and Rapp, 2012; Zeng, 2013; and Slattery, 2013). Empirical studies of direct lending market actors deal, first and foremost, with two major subjects: risk and returns.
Studies of the risk of investing in direct lending are most often based on a binary or multivariate econometric model (Freedman and Jin, 2008; Iyer, Khwaja et al., 2009; Maltsev, 2014). Here, binary choice models help evaluate each factor’s contribution to the total risk, with the latter defined as the probability of default on each individual debt.
The second avenue of empirical research in direct lending is taken by a number of publications featuring models that explain, directly or indirectly, how the interest rate on a loan is shaped (Herzenstein, Andrews et al., 2008; Gonzalez and Loureiro, 2014; Wen and Wu, 2014; Zhang, Yang and Pan, 2014). Such models may take the shape of a regression equation (Herzenstein, Andrews et al., 2008), a game-theory model with a decision tree (Luo and Lin, 2013), and other forms. Of these, models based on a regression equation have greater explanatory power, because this type of model enables the researcher to perform factor analysis, project the interest rate, and evaluate both forecast quality and overall model quality.
Reviewing prior empirical studies allows us to identify major significant factors and choose the most appropriate (in terms of significance and quality of fit) specifications for a model.
3. Econometric modeling
The present model for the P2P lending interest rate was constructed using the loan data published online by LendingClub, the world’s largest online retail lending platform. It allows users who have provided information about themselves and the requested loan to post loan requests. All loans are unsecured, and may vary from $1 000 to $35 000.
Using the borrower’s credit rating, credit history, loan size, and a number of other factors, the platform sets the interest rate and the amount of other payments on a loan. The usual loan term is 3 years. 5-year loans are provided at a premium and at a higher interest rate. A loan may be repaid at any time with no penalty. Interest rates vary from 6.03% to 26.06%.
LendingClub makes its profit by charging credit users for its assistance in getting the loan and credit lenders for using its services. The assistance fee varies with credit user’s rating from 1.1% to 5.0% of loan value. The service fee is 1% of all payments made by credit users.
4. Description of variables
With LendingClub providing the data on each provided loan – over 1 million loans annually – a year’s worth of data suffices to construct a reliable model.
A total of around 200 000 observations for 2014 were collected. However, more than half of that data had to be rejected because of incomplete information submitted by users. In the end, the model was constructed using 80 000 observations, with a further 5 000 used to test its quality.
Of the 12 initial parameters, 15 regressors and 1 dependent variable were formed. The description of these is offered below.
1) Interest rate
Interest rates vary from 6.03% to 26.06%. However, regardless of other factors, the rate of interest on a loan cannot be lower than the key interest rate. In the USA the latter is low and extremely stable, so for the US market this factor is of little significance. Other countries, however, set their key interest rate higher and may change it several times a year, which automatically affects all other interest rates. It is therefore more logical to build the model not on the interest rate itself but on the interest rate premium, calculated as the margin between the loan interest rate and the key interest rate. Within the model, the premium is the dependent variable, expressed as a fraction.
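The premium definition above can be sketched in a few lines. The function name and the input values are illustrative assumptions; in particular, the 0.25% key rate is a placeholder and not a figure taken from this study.

```python
def interest_premium(loan_rate: float, key_rate: float) -> float:
    """Return the premium as a fraction: loan rate minus key interest rate."""
    if loan_rate < key_rate:
        # The text states that a loan rate cannot fall below the key rate.
        raise ValueError("loan rate is below the key interest rate")
    return loan_rate - key_rate


# Hypothetical inputs: a 12.5% loan and a 0.25% key rate, both as fractions.
premium = interest_premium(0.125, 0.0025)
print(round(premium, 4))
```

Working in fractions rather than percentages keeps the result on the same scale as the regression coefficients reported later.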
2) Loan term
LendingClub assists users in getting loans for 3- and 5-year terms. Since the variable in question can only take two discrete values, it can be classified as a dummy variable, where 3-year loans are 0 and 5-year loans are 1. Loan term is represented by a single dummy variable, term. Given that under LendingClub rules users requesting loans for a longer term (5 years) are charged higher interest rates, the correlation between the term and the dependent variable can be considered direct (positive).
3) Credit rating
LendingClub uses its own system of calculating individual credit ratings. It uses a number of factors to allocate users into grades (A to G) and subgrades (1 to 5), where A1 is the highest rating and G5 is the lowest. Associating the A1 rating with number 1, A2 with 2, and so forth creates an ordinal variable, grade, that takes values from 1 to 35.
Higher values correspond to lower credit ratings and higher interest rates, meaning that premiums, too, are higher. That is, the correlation between grade and the dependent variable can be considered direct, or positive.
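The mapping from subgrade to the ordinal variable grade can be sketched as follows. The function name is hypothetical, and the input format (a letter followed by a digit, such as "B3") is an assumption about how the raw data is stored.

```python
GRADE_LETTERS = "ABCDEFG"


def grade_to_ordinal(sub_grade: str) -> int:
    """Map a subgrade such as 'A1' or 'G5' to an integer from 1 to 35."""
    letter = sub_grade[0].upper()
    digit = int(sub_grade[1:])
    if letter not in GRADE_LETTERS or not 1 <= digit <= 5:
        raise ValueError(f"unknown subgrade: {sub_grade!r}")
    # Each letter grade spans five consecutive values.
    return GRADE_LETTERS.index(letter) * 5 + digit


print(grade_to_ordinal("A1"), grade_to_ordinal("B1"), grade_to_ordinal("G5"))
```

Treating the rating as a single ordinal regressor assumes that every one-step downgrade has the same effect on the premium, which is what the single grade coefficient in the model expresses.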
4) Form of residential property ownership
LendingClub recognises three forms of residential property: owned, mortgaged, and rented. In the majority of cases sampled it is the second form. This is due to the peculiarities of the US market, characterised by relatively easily available mortgages and socioeconomic stability. This parameter’s values were translated into dummy variables. Given that the parameter distinguishes between three separate groups, the regression has to have two dummy variables (home_own, home_rent) so as to follow the premise that no independent variable can be the linear combination of one or several other independent variables.
In case a borrower owns their residential property, home_own equals 1 and home_rent equals 0. In case they are renting an accommodation, home_own equals 0 and home_rent equals 1. In case they have a mortgage, both dummy variables (home_own and home_rent) equal 0.
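A minimal sketch of this encoding, with mortgage as the baseline category, is given below. The raw labels "OWN", "RENT" and "MORTGAGE" are assumptions about the source data, and the function name is hypothetical.

```python
def home_dummies(status: str) -> dict:
    """Encode home ownership as two dummies; mortgage is the baseline (0, 0)."""
    status = status.strip().upper()
    if status not in {"OWN", "RENT", "MORTGAGE"}:
        raise ValueError(f"unknown ownership status: {status!r}")
    return {
        "home_own": int(status == "OWN"),
        "home_rent": int(status == "RENT"),
    }


for label in ("OWN", "RENT", "MORTGAGE"):
    print(label, home_dummies(label))
```

Dropping one category avoids the situation in which the dummies sum to the constant term, which would make the regressors linearly dependent.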
5) Loan purpose
LendingClub users seek loans for many different purposes, such as organising a wedding party, taking a trip, buying a car, covering the expense of moving house or buying a new one, or restructuring a loan from a third party. However, in Russia the category is not so detailed. Therefore, the 14 LendingClub categories were merged into 6 groups: auto loan financing (purpose_car), residential property financing (moving house, improving housing, buying a new house; purpose_home), debt consolidation (purpose_debt_consolidation), small business development (purpose_small_business), consumer credit (purpose_cnsumer_credit), and other purposes (purpose_other). Six discrete groups require 5 dummy variables (purpose_car, purpose_home, purpose_debt_consolidation, purpose_small_business, and purpose_cnsumer_credit). When a user is seeking a loan to buy a car, purpose_car equals 1, while purpose_home, purpose_debt_consolidation, purpose_small_business, and purpose_cnsumer_credit equal 0. The same logic applies to the other four groups. When the borrower does not fall into any of the 5 groups (that is, chose ‘Other’ as the purpose of the loan), all 5 dummy variables equal 0.
Most LendingClub users are seeking a loan for debt restructuring. This is due to the peculiarities of the US lending market, which is highly developed and familiar to most Americans, so much so that there is probably no one in the US who has never taken out a loan of some kind. Given that, on average, LendingClub interest rates are lower than those on the conventional market, the possibility of taking out a loan looks especially appealing, for example, to those who need to pay back a loan from a third party.
Diagram 1. Distribution of loan users by loan purpose
Source: compiled from data published on www.nsrplatform.com
6) Annual income
LendingClub users submit their annual income figures in tens of thousands of US dollars. However, not all of this income is confirmed. The average stated income of LendingClub users is $78 000 per annum.
Presumably, higher income figures translate into higher credit user reliability. However, this is not always the case. There are examples of extremely high earners prone to bankruptcy, and of extremely low earners who meet their contract obligations responsibly. Therefore it is not possible to unequivocally define the type of correlation between a borrower’s income and their credit reliability.
The present analysis uses this parameter as a divisor of the debt_to_income index (loan size to income ratio).
7) Loan size
Loans start at $1 000 and are capped at $35 000, with the average loan being $14 600. Presumably, loan size is inversely related to the interest rate; that is, bigger loans should carry lower interest rates. Note that the present analysis uses this parameter as the numerator of the debt_to_income index.
Higher debt to annual income ratios translate into higher credit risks and, consequently, higher loan interest rates and higher interest premiums. That is, the correlation between a borrower’s debt to annual income ratio (debt_to_income) and the size of the premium on their loan is assumed to be positive.
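The debt_to_income index, as defined here (loan size divided by annual income), can be sketched as below. The function name is hypothetical. The inputs are the sample averages quoted in the text and serve only as an illustration, since the ratio of two averages is not the same as the average ratio across borrowers.

```python
def debt_to_income(loan_amount: float, annual_income: float) -> float:
    """Loan size divided by annual income, expressed as a fraction."""
    if annual_income <= 0:
        raise ValueError("annual income must be positive")
    return loan_amount / annual_income


# Average loan ($14 600) over average stated income ($78 000).
ratio = debt_to_income(14_600, 78_000)
print(round(ratio, 3))
```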
8) Employment
LendingClub defines credit users’ employment in terms of the number of years they have been (or were) working for their current (or last) employer. A longer history of continuous employment is thought to mean higher borrower reliability because it implies the ability to maintain a proper contractual relationship. Therefore, the length of employment is expected to be inversely correlated with the interest rate and, by extension, with the interest premium.
One feature of LendingClub’s database is that terms of employment shorter than one year are recorded as <1, terms over 10 years as 10+, and everything from 1 to 10 years as an integer. Therefore this parameter cannot be represented as a continuous variable and must be transformed into an ordinal one, where 1 is any term shorter than a year, 2 is any term from 1 to 10 years long, and 3 is any term longer than 10 years. With regard to employment, most credit users fall into the second category.
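The three-level recoding of employment can be sketched as follows. The function name is hypothetical, and the raw labels are assumed to be exactly "<1", "10+" or an integer string, as the description above suggests.

```python
def employment_to_ordinal(label: str) -> int:
    """Recode employment length: 1 = under a year, 2 = 1 to 10 years, 3 = over 10."""
    label = label.strip()
    if label == "<1":
        return 1
    if label == "10+":
        return 3
    years = int(label)  # raises ValueError for anything that is not an integer
    if 1 <= years <= 10:
        return 2
    raise ValueError(f"unexpected employment label: {label!r}")


print([employment_to_ordinal(v) for v in ("<1", "1", "7", "10", "10+")])
```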
9) Credit history length
Credit history length is the number of years since a user had their first line of credit opened. It should be noted, however, that borrowers are not required to disclose any extra information about their previous loans (term, size, contract breaches etc). A longer credit history may also have a positive effect on borrower reliability; that is, credit history length may be expected to be negatively correlated with the interest rate premium.
10) Number of loan requests over the last six months
The number of times a user has sought to take out a loan in the last six months takes the value of 1 to 6 in increments of 1. This indicator may be analysed in terms of borrower reliability. Presumably, more frequent credit users have a hard time managing their income and expenses, which makes them a greater risk for the creditor, so that they should expect a higher interest rate and, by extension, a higher premium. Therefore the correlation between this parameter and the dependent variable is assumed to be a positive one.
11) Number of years since last delinquency
Within the given sample the number of years that have passed since the user’s last delinquency on a credit contract varies from 0 to 10 years. For a sample of 80 000 respondents this is a rather substantial figure. Still, it should be noted that the platform does not disclose what kinds of delinquencies are recorded. There is a possibility that this involves every kind of delinquency down to a single day’s delay, which may help explain the figure. Another explanation may lie in the fact that, according to the data, the majority of loans are taken out to pay off existing debts, which means these users find themselves unable or unwilling to finance their debts out of their personal savings and, when the latter are lacking, often find themselves in breach of their credit obligations.
This parameter, just like the ‘age’ of disputes, litigations etc, may be viewed as a measure of borrower reliability. That is, a longer period since the last delinquency means a longer positive credit history, hence a greater degree of borrower reliability that translates into a lower interest rate and a smaller premium. The present analysis employs this criterion as the variable years_since_last_delinq. Its correlation with the dependent variable is expected to be negative or inverse.
12) Rate of revolving credit use
The rate of revolving credit use is calculated as the ratio of total proceeds of credit to the total amount of revolving credit lines opened to a borrower. For the reviewed sample it is less than 0.5. A higher ratio means more active use of existing credit lines and higher liabilities, hence lower borrower reliability, so that any further credit extended to such a borrower will carry a higher interest rate and interest premium. Therefore the correlation between this parameter and the dependent variable is assumed to be positive.
5. Choice of model
The above data represents a set of socioeconomic parameters collected under relatively fixed conditions, a set of independent data sampled from the general population. This data, therefore, can be classified as cross-sectional.
The ordinary least squares (OLS) method, provided that model preconditions are met, is a well-proven tool for analysing cross-sectional data. Meeting these preconditions is necessary for the model to produce reliable results. Failing to do so leads to a bias in coefficient estimation and, consequently, to modelling and forecasting errors.
There are two major problems analysts working with cross-sectional data have to face because of unfulfilled preconditions: multicollinearity and heteroscedasticity.
Multicollinearity occurs when the precondition of regressor independence is not met. One way of identifying the problem is to build a correlation matrix. The rule of thumb is that multicollinearity is likely when the absolute value of a pairwise correlation coefficient exceeds 0.7. However, this rule is not absolute: when the other correlation coefficients are close to zero, an absolute correlation of just 0.4 can cause multicollinearity. A more reliable way of testing for this problem is to compare what the model’s individual t-statistics and the joint F-statistic say about coefficient significance. A conflict between the two tests points to multicollinearity. Typically, the F-statistic indicates that the regression coefficients are jointly significant, while the individual t-statistics indicate that their respective coefficients are not. The two most popular remedies are transforming the variables involved or eliminating one of them, keeping the one that contributes the most to the model’s quality. The most common transformation, when theoretically justified, is to merge the correlated variables into a single parameter (a ratio, a coefficient etc). Also widespread is the logarithmic transformation, which is also useful for normalising the data.
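The correlation-matrix screen described above can be sketched as follows. The function names and the toy data are illustrative assumptions and are not taken from the LendingClub sample; the 0.7 threshold is the rule of thumb quoted in the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def flag_correlated_pairs(columns, threshold=0.7):
    """Return regressor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Toy data, invented for illustration only.
toy = {
    "grade": [1, 5, 9, 14, 20, 27],
    "revol_util": [0.10, 0.22, 0.35, 0.41, 0.60, 0.75],
    "term": [0, 1, 0, 1, 0, 1],
}
flagged = flag_correlated_pairs(toy)
print(flagged)
```

As the text notes, a pairwise screen of this kind is only a first pass; it does not replace the comparison of t-statistics with the F-statistic.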
The present model reveals that several variables (form of residential property ownership, credit history length, employment, and loan purpose) have a potential to cause multicollinearity because of their high (but not critically so) correlation with other variables. Consequently, making the final conclusion requires analysing the results of the regression analysis.
Heteroscedasticity occurs when the condition that the variance of the regression residuals is constant is not met. A residual is the difference between the observed value of the dependent variable and its value predicted from the estimated coefficients. To test the regression residuals it is necessary to make a residuals plot for each of the regressors. No heteroscedasticity is present when the residuals are evenly distributed regardless of regressor values. Another precondition for the model to function properly is for the expected mean of the residuals to be 0. This is also easy to check on a scatter plot. In case of heteroscedasticity, there are two methods for solving the problem: calculating the statistics with adjusted standard errors, and correcting the initial coefficients by using the generalised least squares method (GLS).
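The study relies on visual inspection of residual plots. As a rough numerical counterpart, and not the method used here, one can sort the residuals by a regressor, split them into two halves and compare the mean and spread in each half. The function name and the toy residuals below are assumptions made for illustration.

```python
from statistics import mean, pstdev


def residual_spread_by_half(x, residuals):
    """Sort residuals by regressor value and return (mean, sd) for each half."""
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return (mean(low), pstdev(low)), (mean(high), pstdev(high))


# Toy residuals: the first set has constant spread, the second does not.
x = [1, 2, 3, 4, 5, 6, 7, 8]
homoscedastic = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
heteroscedastic = [0.1, -0.1, 0.1, -0.1, 0.4, -0.4, 0.4, -0.4]

print(residual_spread_by_half(x, homoscedastic))
print(residual_spread_by_half(x, heteroscedastic))
```

Similar means near zero and similar standard deviations in both halves are consistent with the white-noise pattern described in the text; a markedly larger spread in one half suggests heteroscedasticity.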
6. Full regression model analysis
The regression was built using 80 000 initial observations and 15 variables (8 of them dummy).
Regression analysis has shown that the model has an abnormally high coefficient of determination (R² ≈ 98.82%). However, the previously mentioned discrepancy between the results of the joint F-test and the individual t-tests was also present. The overall significance test rejects the hypothesis that all regression coefficients are 0 (the p-value of the F-statistic is below 0.01), while the individual tests of coefficient significance show that, applied to individual coefficients, the hypothesis cannot be rejected in more than half of the cases (the p-value of the t-statistic is greater than 0.1). Given this and the fact that a substantial degree of correlation was detected between the problem variables and the rest, one can be fully certain that multicollinearity is indeed present. Since there is no possibility of putting the offending variables through any sort of transformation, they have to be excluded from the model.
7. Abbreviated regression model analysis
Analysis of the abbreviated regression model shows that the problem of multicollinearity has indeed been successfully solved. Both tests (F and t) show all regression coefficients to have a high degree of significance (at the 5% level).
The model was further tested for heteroscedasticity by plotting the residuals for each of the regressors. The error distribution plot corresponds to that of white noise distribution (errors distributed independently of the regressors, with a mean of 0), which indicates absence of heteroscedasticity.
Out of the initial 12 parameters, the reduced regression model had 1 dependent variable and 6 regressors left.
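The general regression equation, with the coefficients reported in Appendix 3 and the signs described in the text, is as follows:

$$\text{premium} = 0.052042 + 0.000585\,\text{term} + 0.006538\,\text{grade} + 0.000158\,\text{inq\_last\_6m} - 0.000018\,\text{years\_since\_last\_delinq} + 0.000348\,\text{revol\_util} + 0.001204\,\text{debt\_to\_income} + \varepsilon$$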
The results allow the conclusion that the correlations found by the model coincide with the ones that had been determined analytically. Since the premium is expressed as a fraction, each effect is given both as a fraction and in percentage points.
• Borrowers seeking 5-year loans, all other things being equal, must expect a premium that is 0.000585 (0.0585 percentage points) higher than that of borrowers seeking 3-year loans;
• A one-step adjustment in a user’s credit rating (e.g., being downgraded from A1 to A2), other things equal, triggers a rise in the interest rate premium of 0.0065376 (0.654 percentage points);
• An increase of 1 in the number of loan requests within the current six-month period, all other things being equal, triggers a rise in the interest rate premium of 0.000158 (0.0158 percentage points);
• A one-year increase in the number of years since the last credit delinquency, other things equal, triggers a fall in the interest rate premium of 0.0000178 (0.00178 percentage points);
• A 0.1 increase in the rate of revolving credit use, other things equal, triggers a rise in the interest rate premium of 0.000035 (0.0035 percentage points)*;
• A 0.1 increase in the borrower’s debt to income ratio, other things equal, triggers a rise in the interest rate premium of 0.00012 (0.012 percentage points)**.
(* and **: these variables cannot change in increments of 1 because they are expressed as fractions and vary between 0 and 1. Hence the coefficient is multiplied by 0.1 instead of 1.)
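The marginal effects listed above can be checked with a short sketch that evaluates the fitted equation for a hypothetical borrower. The coefficients are those reported in Appendix 3; the sign of years_since_last_delinq is taken as negative, as the text states. The borrower profile and the function name are illustrative assumptions.

```python
# Coefficients from Appendix 3 (abbreviated model); premium is a fraction.
COEFFS = {
    "const": 0.052042,
    "term": 0.000585,
    "grade": 0.006538,
    "inq_last_6m": 0.000158,
    "years_since_last_delinq": -0.000018,  # negative sign as stated in the text
    "revol_util": 0.000348,
    "debt_to_income": 0.001204,
}


def predict_premium(borrower: dict) -> float:
    """Evaluate the linear regression equation for one borrower."""
    return COEFFS["const"] + sum(
        COEFFS[name] * borrower[name] for name in COEFFS if name != "const"
    )


# Hypothetical borrower: 3-year loan, rating A1.
base = {
    "term": 0,
    "grade": 1,
    "inq_last_6m": 1,
    "years_since_last_delinq": 5,
    "revol_util": 0.4,
    "debt_to_income": 0.19,
}
downgraded = dict(base, grade=2)    # A1 -> A2
longer_term = dict(base, term=1)    # 3-year -> 5-year

print(predict_premium(downgraded) - predict_premium(base))
print(predict_premium(longer_term) - predict_premium(base))
```

Because the model is linear, the difference between two predictions that differ in a single regressor equals that regressor's coefficient times the size of the change, whatever the other values are.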
The variables with the largest coefficients are credit rating and debt to income ratio. The one with the smallest is the number of years since last delinquency.
The variables found to have no significance are employment, loan purpose, form of residential property ownership, and credit history length. This may be due to the fact that, as was mentioned earlier, LendingClub does not record these parameters in sufficient detail. As for loan purpose and form of residential property ownership, their lack of significance may be explained by a number of facts. Firstly, no one checks each and every bit of self-reported information. Secondly, no one checks what the borrower actually spends their money on; there is no system of accountability. Thirdly, the loan purpose parameter is supposed to reflect the size of the loan, but in this case all loans are capped at $35 000.
Annual income and loan size were merged into a single variable, the debt to annual income ratio.
Performing regression analysis on the model showed it to have a high coefficient of determination (R² ≈ 98.82%). The model has a high predictive strength.
The model correctly predicts the size of the premium in 94% of cases. This was confirmed by testing the model on the data deliberately excluded from the sample. Diagram 2 shows the results of this test in three graphs. The red one (Ypred.) is the premium the model calculated using the resulting regression equation. The green one (Yobs.) is the premium set by the platform and applied to loans issued to its users. The blue one (Error) is the forecasting error, that is, the deviation between predicted and actual values. The graph shows it to be very small. The fact that the first two graphs are virtually identical indicates that the quality of the model is very high. The fact that the blue graph (Error) deviates only marginally from zero is quantitative evidence that interest rate forecasts made using this model are highly likely to be correct.
Diagram 2. Test of model forecasting strength
Source: own calculations
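The paper does not state the criterion by which a prediction counts as correct. One possible reading, sketched below purely as an assumption, is the share of held-out loans whose predicted premium falls within a fixed tolerance of the observed one. The tolerance, the function name and the toy values are all hypothetical.

```python
def share_within_tolerance(predicted, observed, tol):
    """Fraction of observations whose absolute forecast error is at most tol."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - o) <= tol for p, o in zip(predicted, observed))
    return hits / len(observed)


# Toy hold-out set, invented for illustration (premiums as fractions).
y_pred = [0.060, 0.085, 0.110, 0.150]
y_obs = [0.061, 0.084, 0.118, 0.150]

share = share_within_tolerance(y_pred, y_obs, tol=0.005)
print(share)
```

Under this reading, the reported 94% would correspond to the value of this share on the 5 000 held-out observations for whichever tolerance the author applied.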
8. Conclusions
To reduce the interest rate on a loan the borrower must have the highest credit rating, a small debt to annual income ratio, a minimal rate of revolving credit use, no incidents of credit delinquency during the last 10 years, make the smallest number of loan requests in the previous six months, and be applying for a loan with a 3-year maturity.
9. References
• Bachmann, A., Becker, A., Buerckner, D., Hilker, M., Kock, M., Lehmann, M. and Tiburtius, P. “Online peer-to-peer lending - a literature review”, Journal of Internet Banking and Commerce, vol. 16, No. 2, 2011.
• Chaffee, E. C., Rapp, G. C. “Regulating Online Peer-to-Peer Lending in the Aftermath of Dodd-Frank: In Search of an Evolving Regulatory Regime for an Evolving Industry”, University of Toledo Legal Studies Research Paper No. 2012-04.
• Freedman, S. and Jin, G. Z. “Do Social Networks Solve Information Problems for Peer-to-Peer Lending? Evidence from Prosper.com”, NET Institute Working Paper No. 08-43, November 2008.
• Frerichs, A., Schumann, M. “Peer to Peer Banking - State of the Art”, Göttingen, 2008.
• Herzenstein, M., Andrews, R. L., Dholakia, U. M., Lyandres, E. “The Democratization of Personal Consumer Loans? Determinants of Success in Online Peer-to-Peer Lending Communities”, 2008.
• Hulme, M. K., Wright, C. “Internet Based Social Lending: Past, Present and Future”, Social Futures Observatory: London, October 2006.
• Iyer, R., Khwaja, A. I., Luttmer, E. F., Shue, K. “Screening peers softly: inferring the quality of small borrowers”, NBER Working Paper No. 15242, August 2009.
• Loureiro, Y. K., Gonzalez, L. “When can a photo increase credit? The impact of lender and borrower profiles on online peer-to-peer loans”, International Journal of Bank Marketing, 2014.
• Luo, B., Lin, Z. “A decision tree model for herd behavior and empirical evidence from the online P2P lending market”, Inf Syst E-Bus Manage, 2013.
• Maltsev, A. I. “Credit risk modelling in peer-to-peer lending”, graduation thesis, Perm, 2014 (in Russian).
• Moenninghoff, S. C., Wieandt, A. “The Future of Peer-to-Peer Finance”, Zeitschrift für betriebswirtschaftliche Forschung, August/September 2013, pp. 466-487.
• Serrano-Cinca, C., Gutiérrez-Nieto, B., López-Palacios, L. “Determinants of Default in P2P Lending”, PLoS ONE, 2015.
• Slattery, P. “Square Pegs in a Round Hole: SEC Regulation of Online Peer-to-Peer Lending and CFPB Alternative”, Yale Journal on Regulation, vol. 30, 2013.
• Verstein, A. “The Misregulation of Person-to-Person Lending”, UC Davis Law Review, Vol. 45, No. 2, 2011.
• Wen, X., Wu, X. “An Analysis of Factors to Influence Successful Borrowing Rate in P2P Network Lending - A Case Study of the Paipai Lending”, Finance Forum, No. 3, 2014, pp. 38.
• Yandiev, M. “The Theory Of Finance: A Novel Finance Model Being Formed On The Internet”, Russian Review (Русское обозрение), No. 1, 2015. http://rusreview.com/journal/vol12015/31thetheoryoffinanceanovelfinancemodelbeingformedontheinternet.html
• Yang, Y. “Analysis and Assessment of Credit rating model in P2P lending, An instrument to solve information asymmetry between lenders and borrowers”, Master of Science in Management Studies thesis, MIT Sloan School of Management, 08.05.2015.
• Zeng, R. “Legal Regulations in P2P Financing in the US and Europe”, US-China Law Review, vol. 10, 2013, pp. 229-245.
• Zhang, Y., Yang, Z., Pan, H. “Influencing Factors of Online P2P Lending Success Rate in China”, the National Natural Science Foundation of China (No. 61309029, 61273293) and Ministry of Education Humanities Social Sciences Research Project (No. 11YJC880163), 2015.
10. Appendices
Appendix 1
Table 1. Regressor pair correlation matrix
variable  term  grade  inq_last_6m  years_since_last_delinq  revol_util  debt_to_income  home_own  home_rent  credit_history_length  employment  purpose_car  purpose_home  purpose_debt_consolidation  purpose_small_business  purpose_cnsumer_credit
term  1  
grade  0.05  1  
inq_last_6m  0.01  0.22  1  
years_since_last_delinq  0.00  0.03  0.02  1  
revol_util  0.10  0.18  0.11  0.00  1  
debt_to_income  0.04  0.13  0.09  0.03  0.07  1  
home_own  0.02  0.01  0.00  0.01  0.04  0.04  1  
home_rent  0.10  0.01  0.03  0.64  0.03  0.01  0.02  1  
credit_history_length  0.03  0.00  0.44  0.00  0.00  0.00  0.01  0.00  1  
employment  0.53  0.02  0.01  0.02  0.04  0.00  0.02  0.17  0.00  1  
purpose_car  0.00  0.00  0.00  0.00  0.00  0.01  0.00  0.00  0.02  0.00  1  
purpose_home  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.03  0.00  0.02  1  
purpose_debt_consolidation  0.01  0.01  0.01  0.00  0.00  0.00  0.00  0.00  0.01  0.00  0.11  0.32  1  
purpose_small_business  0.00  0.01  0.00  0.00  0.00  0.00  0.01  0.00  0.30  0.00  0.01  0.02  0.12  1  
purpose_cnsumer_credit  0.00  0.00  0.00  0.01  0.00  0.00  0.00  0.00  0.01  0.00  0.05  0.15  0.57  0.06  1 
Appendix 2
Full regression model results
Model summary  
Multiple R  0.994077865 
R-square  0.988190802 
Adjusted R-square  0.988188587 
Std. error  0.004563083 
Observations  80 000 
ANOVA
Source  df  SS  MS  F  Significance F  
Regression  15  139.36  9.29  446 203  0.00 
Residual  79 984  1.67  0.00  
Total  79 999  141.03 
Variable  Coefficient  Std. error  t-Statistic  Prob.  Lower 95%  Upper 95%  
const  0.052151  0.000123  423.784894  0.000000  0.051910  0.052392 
term  0.000572  0.000042  13.698514  0.000000  0.000490  0.000654 
grade  0.006537  0.000003  2145.478352  0.000000  0.006531  0.006543 
inq_last_6m  0.000160  0.000016  9.968949  0.000000  0.000129  0.000192 
years_since_last_delinq  0.000019  0.000009  2.151359  0.031451  0.000037  0.000002 
revol_util  0.000356  0.000074  4.818033  0.000001  0.000211  0.000500 
debt_to_income  0.001212  0.000165  7.355325  0.000000  0.001535  0.008891 
home_own  0.000027  0.000058  0.469873  0.638447  0.000086  0.000141 
home_rent  0.000095  0.000035  2.669838  0.007590  0.000025  0.000164 
credit_history_length  0.000002  0.000002  0.707263  0.479405  0.000003  0.000006 
employment  0.000027  0.000024  1.123822  0.261092  0.000075  0.000020 
purpose_car  0.000270  0.000199  1.356122  0.175064  0.000660  0.000120 
purpose_home  0.000118  0.000101  1.168020  0.242802  0.000080  0.000316 
purpose_debt_consolidation  0.000154  0.000079  1.935994  0.052872  0.000309  0.000002 
purpose_small_business  0.000042  0.000186  0.227046  0.820389  0.000406  0.000322 
purpose_cnsumer_credit  0.000064  0.000083  0.767950  0.442519  0.000226  0.000099 
Appendix 3
Abbreviated regression model results
Model summary  
Multiple R  0.898540619 
R-square  0.893217349 
Adjusted R-square  0.89320934 
Std. error  0.00 
Observations  80 000 
ANOVA
Variable  Coefficient  Std. error  t-Statistic  Prob.  Lower 95%  Upper 95%  
const  0.052042  0.000082  636.1935  0.000000  0.051881  0.052202 
term  0.000585  0.000041  14.16905  0.000000  0.000504  0.000666 
grade  0.006538  0.000003  2155.373  0.000000  0.006532  0.006544 
inq_last_6m  0.000158  0.000016  9.839076  0.000000  0.000126  0.000189 
years_since_last_delinq  0.000018  0.000009  1.994834  0.046064  0.000035  0.000000 
revol_util  0.000348  0.000074  4.730443  0.000002  0.000204  0.000492 
debt_to_income  0.001204  0.000165  7.317657  0.000000  0.001526  0.002408 
Appendix 4
Residuals plots by regressor