-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathp2p-loans.html
342 lines (310 loc) · 34.3 KB
/
p2p-loans.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
<!DOCTYPE html>
<html>
<head>
<title>Interest rates on <span class="caps">P2P</span> loans - datawerk</title>
<meta charset="utf-8" />
<link href="https://buhrmann.github.io/theme/css/bootstrap-custom.css" rel="stylesheet"/>
<link href="https://buhrmann.github.io/theme/css/pygments.css" rel="stylesheet"/>
<link href="https://buhrmann.github.io/theme/css/style.css" rel="stylesheet" />
<link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
<link rel="shortcut icon" type="image/png" href="https://buhrmann.github.io/theme/css/logo.png">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<meta name="author" contents="Thomas Buhrmann"/>
<meta name="keywords" contents="datawerk, R,report,regression,"/>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-56071357-1', 'auto');
ga('send', 'pageview');
</script> </head>
<body>
<div class="wrap">
<div class="container-fluid">
<div class="header">
<div class="container">
<nav class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="https://buhrmann.github.io">
<!-- <span class="fa fa-pie-chart navbar-logo"></span> datawerk -->
<span class="navbar-logo"><img src="https://buhrmann.github.io/theme/css/logo.png" style=""></img></span>
</a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<!--<li><a href="https://buhrmann.github.io/archives.html">Archives</a></li>-->
<li><a href="https://buhrmann.github.io/posts.html">Blog</a></li>
<li><a href="https://buhrmann.github.io/pages/cv.html">Interactive CV</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Reports<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<!--<li class="divider"></li>
<li class="dropdown-header">Data Science Reports</li>-->
<li >
<a href="https://buhrmann.github.io/p2p-loans.html">Interest rates on <span class="caps">P2P</span> loans</a>
</li>
<li >
<a href="https://buhrmann.github.io/activity-data.html">Categorisation of inertial activity data</a>
</li>
<li >
<a href="https://buhrmann.github.io/titanic-survival.html">Titanic survival prediction</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Data Apps<span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<!--<li class="divider"></li>
<li class="dropdown-header">Data Science Reports</li>-->
<li >
<a href="https://buhrmann.github.io/elegans.html">C. elegans connectome explorer</a>
</li>
<li >
<a href="https://buhrmann.github.io/dash+.html">Dash+ visualization of running data</a>
</li>
</ul>
</li>
</ul>
</div>
</nav>
</div>
</div><!-- header -->
</div><!-- container-fluid -->
<div class="container main-content">
<div class="row row-centered">
<div class="col-centered col-max col-min col-sm-12 col-md-10 col-lg-10 main-content">
<section id="content" class="article content">
<header>
<span class="entry-title-info">Oct 07 · <a href="https://buhrmann.github.io/category/reports.html">Reports</a></span>
<h2 class="entry-title entry-title-tight">Interest rates on <span class="caps">P2P</span> loans</h2>
</header>
<div class="entry-content">
<p>In this post I will look at linear regression to model the process determining interest rate on peer-to-peer loans provided by the <a href="https://www.lendingclub.com/home.action">Lending club</a>. Like other peer-to-peer services, the Lending Club aims to directly connect producers and consumers, or in this case borrowers and lenders, by cutting out the middleman. Borrowers apply for loans online and provide details about the desired loan as well their financial status (such as their <span class="caps">FICO</span> score). Lenders use the information provided to choose which loans to invest in. The Lending Club, finally, uses a <a href="https://www.lendingclub.com/public/how-we-set-interest-rates.action">proprietary algorithm</a> to determine the interest charged on an applicant’s loan. Given the secret nature of this process, a borrower or lender might be interested in which variables, beside the obvious <span class="caps">FICO</span> credit score, influence the final interest rate and how strong this influence is. It is the aim of this analysis to identify such associations. The goal of the regression modelling is therefore more inferential than predictive. </p>
<figure>
<img src="/images/p2ploans/lendingclub.jpg" alt="Fico score analysis"/>
</figure>
<h3>Methods</h3>
<p>Data about peer-to-peer loans issued through the Lending Club was provided by the Data Analysis class on Coursera (<a href="https://spark-public.s3.amazonaws.com/dataanalysis/loansData.rda">file here</a>). The set used in this analysis was downloaded on the 17th of February, 2013. After accounting for missing values in the data, an exploratory analysis was performed to identify variables that required transformation prior to statistical modeling (boxplots, histograms etc.), and to find a subset of variables to be used in a regression model relating interest rate to an applicant’s <span class="caps">FICO</span> score (using correlation analysis and <span class="caps">PCA</span>). The statistical model itself was a simple linear multivariate regression [1]. Since most variables were not normally distributed, results were also compared to robust estimation techniques.</p>
<p>To reproduce the results of this report the complementary R script (provided on github) can be run with the corresponding data file.</p>
<h3>Summary of data</h3>
<p>Besides a loan’s interest rate, the data set analyzed here contains information about the amount requested and the amount eventually funded by investors (1000$-35000$), the length of the loan (36 or 60 months), and its purpose (the majority went towards debt consolidation or to pay off credit cards). Information about the applicant included his or her duration of employment, state of residence, information about home ownership (the majority either renting or having a mortgage), debt to income ratio, monthly income, <span class="caps">FICO</span> score, number of open credit lines, revolving credit balance, and the number of inquiries made in the last 6 months. In total the set included information about 2500 loans, out of which 79 contained missing data (most in employment length). Given this relatively small number, and the relatively large set, the corresponding data was simply discarded.</p>
<p><span class="caps">FICO</span> scores were transformed from a 38-level factor (the lowest being [640-644] and the highest [830-834]) to mean values for each range. The number of recent loan inquiries showed a Poisson- or exponential-like distribution. Since the kinds of analyses performed on the data—such as linear regression—might be sensitive to data not being normally distributed, we created a new factor variable with only two levels: 0 inquiries (1213 data points) and 1-9 inquiries (1208 data points).</p>
<p>Histograms of quantitative variables indicated that the distributions of monthly incomes, <span class="caps">FICO</span> score, revolving credit balance and number of open credit lines were not normal, and more specifically right-skewed to different degrees. While log<sub>10</sub> transformation generally brought the distributions closer to normal (at least visually), the results of a Shapiro test [2] indicated that the null-hypothesis of normal distribution still needed to be rejected. Inspection of normal <span class="caps">QQ</span>-plots, however, indicated fairly normal distributions across a wide range centered around the means of most variables. While removal of outliers (e.g. monthly incomes greater than 4 or 5 standard deviations from mean) further improved normality, in the remaining analysis the whole data set was used.</p>
<p>The histogram of the interest rate variable showed a bimodal distribution, suggestive of a superposition of two separate distributions (see Figure 1, left). While none of the factors separated these, when interest rate histograms were plotted for different ranges of the <span class="caps">FICO</span> variable (by cutting the latter at quantiles) two more normal-like distributions could be identified (Figure 1, right). The shift in interest rate mean as a function of <span class="caps">FICO</span> score indicates that for very high <span class="caps">FICO</span> scores, relatively low interest rates become disproportionally more probable.</p>
<figure>
<img src="/images/p2ploans/IntRateVsFico.png" alt="Interest rate vs. Fico"/>
<figcaption class="capCenter">Figure 1: Distribution of interest rates. Left: overall distribution. Right: distribution separated into loans for applicants with <span class="caps">FICO</span> scores smaller than the 3rd quartile of 727 points (blue), and those with higher <span class="caps">FICO</span> scores (red).</figcaption>
</figure>
<h3>Exploratory analysis</h3>
<p>Since intuitively we know that interest rate should correlate strongly with <span class="caps">FICO</span> scores, we examined the associations between the two, as well as those between either and third variables.</p>
<p>Box plots of interest rate and <span class="caps">FICO</span> scores by factor variables showed that only two factors seemed to influence these variables. Loan length had a significant effect on interest rate (p-value of t-test < 2.2e<sup>-16</sup>, effect size of 4.24%), but not on <span class="caps">FICO</span> scores (p-value=0.41). The number of inquiries (two-level factor) had significant effects on the means of both variables (p < 2.2e<sup>-16</sup> and p = <sup>1.6e-5</sup>), but the effect size was interesting only in the case of interest rate (1.6%). The latter effect comes at no surprise, given that a previous inquiry indicates previous rejection of credit application, itself a sign of reduced credit-worthiness. The effect of loan length and number of credit inquiries on interest rates is summarized in Figure 2 below.</p>
<figure>
<img src="/images/p2ploans/IntRateFactorBoxes.png" alt="Interest rate vs. loan length and number of inquiries"/>
<figcaption class="capCenter">Figure 2: Boxplots of interest rate vs. loan length and the number of previous credit inquiries. </figcaption>
</figure>
<p>Using their correlation matrix, as well as pair-wise scatter plots with linear models fitted, we aimed to reduce the set of quantitative variables by discarding those (except interest rate and <span class="caps">FICO</span> score) that showed high correlations amongst themselves. The correlation of quantitative predictors is shown in Figure 3.</p>
<figure>
<img src="/images/p2ploans/cormat.png" alt="Correlation matrix of quantiative predictors."/>
<figcaption class="capCenter">Figure 3: Correlation matrix of quantiative predictors. Color codes for Pearson correlation between variables, which is also presented numerically. </figcaption>
</figure>
<p>First we eliminated the amount funded by investors. This is justified as a) we are interested here only in the resulting interest rate, not the size of the eventual loans, and b) for most applications the eventual loan equalled the amount requested, i.e. a strong linear relationship (with slope 1) existed between the two (linear regression resulted in adjusted coefficient of determination R<sup>2</sup>=0.94, i.e. approx. 94% variance explained).</p>
<p>The correlation matrix further showed an association between monthly income and loan amount requested (Pearson correlation R = 0.47; linear regression with adj. R<sup>2</sup> = 0.23 and p < 2.2e<sup>-16</sup>). This is not surprising as we would expect people with higher incomes to be able to afford larger loans. To avoid confounders we also rejected monthly income as an independent variable in further models.</p>
<p>Of the remaining four covariates (excluding <span class="caps">FICO</span>), three are related to the applicant’s current debt. In particular, correlation analysis revealed that the number of open credit lines correlates (relatively weakly) with both debt to income ratio (R = 0.38) and revolving credit balance (0.34). We therefore use only the first to stand in for the overall debt burden.</p>
<p>In summary, we consider as quantitative covariates in the statistical model only the amount requested, open credit lines and the <span class="caps">FICO</span> score. A <span class="caps">PCA</span> analysis confirms that these are indeed relatively independent (see Figure 1). The variable for open credit lines is aligned mostly with the first principal component, <span class="caps">FICO</span> score with the second, and the amount requested falls in between the other two (i.e. they are not orthogonal, but far from parallel in the space of the principal components).</p>
<figure>
<img src="/images/p2ploans/pca.png" alt="PCA"/>
<figcaption class="capCenter">Figure 1: Biplot of principal component analysis of the data set containing only the selected three covariates. Data points (color-coded by the loan length factor) and covariate directions are plotted in the space of the first two principal components. Ellipses contain with 68% probability the points belonging to each level of the loan length. </figcaption>
</figure>
<p>In addition we see that loans of different duration are slightly but significantly separated in <span class="caps">PCA</span> space. Longer loans can usually be found in the direction of larger amounts and higher <span class="caps">FICO</span> score, for example. A <span class="caps">PCA</span> using all covariates (not shown) confirms our above finding that monthly income and the amount requested largely capture the same variance in the data (the projections in <span class="caps">PCA</span> space are almost parallel), further justifying our exclusion of the former. Equally, the number of open credit lines approx. captures the same variance as the revolving credit balance. Finally, debt to income ratio in <span class="caps">PCA</span> space is almost parallel but points in the opposite direction from <span class="caps">FICO</span> score, indicating that the former is probably a strong determining factor in the computation of the latter and can therefore also be excluded without losing much of the observed variance.</p>
<p>The exploratory analysis can be summarised by the following relationships between interest rate and retained covariates:</p>
<p>Factors:<ul>
<li>The longer the loan, the higher the interest.</li>
<li>The more often an applicant has recently inquired about a loan, the higher its interest.</li>
</ul></p>
<p>Quantitative:<ul>
<li>The larger the loan, the higher its interest.</li>
<li>The smaller the <span class="caps">FICO</span> score, the higher the interest.</li>
<li>The higher the number of open credit lines, the higher the interest.</li>
</ul></p>
<p>In the following section we aim to quantify these associations in more detail using statistical modeling.</p>
<h3>Statistical Modeling</h3>
<p>As a first test, a simple linear regression was performed relating interest rate to log<sub>10</sub>- transformed <span class="caps">FICO</span> scores only, as these two variables exhibited the greatest correlation (R=0.71). Analysis of the residuals showed non-random patterns as a function of both the requested amount and loan length, but not the number of open credit lines or number of recent inquiries. We therefore chose to include the first two as potential confounders in a more complicated model:</p>
<script type="math/tex; mode=display">
IR = LL_{36} + b_1 log_{10}(FICO) + b_2(Amount) + b_3(LL_{60}) + e
</script>
<p>where <span class="caps">IR</span> is the interest rate; the intercept <span class="caps">LL</span><sub>36</sub> corresponds to the estimated mean of interest rates for loans over a 36 months period given a <span class="caps">FICO</span> score of 1 (log<sub>10</sub>(1)=0); b<sub>1</sub> represents the change in interest rate associated with a change of 1 unit in log<sub>10</sub> <span class="caps">FICO</span> score for loans of the same amount; b<sub>2</sub> captures the change in interest rate as a function of the loan amount requested; b<sub>3</sub> the increase in interest rate of loans lasting 60 rather than 36 months; and e are unmodelled random variations. A scatterplot matrix illustrates the relationship between these covariates (Figure 2, left panel).</p>
<figure>
<img src="/images/p2ploans/fico-figure.jpg" alt="Fico score analysis"/>
<figcaption class="capCenter">Figure 2: Covariates used in final regression model (left panel) and its residuals (right panel). Left: scatter plot matrix of covariates color-coded according to loan length (gray: 36 months; black: 60 months). Longer loans tend to have higher interest rates and correspond to larger loans. Green lines indicate univariate regression lines, supporting the observed trends. Above the diagonal, Pearson correlations are shown. Besides the association between interest rate and <span class="caps">FICO</span> score (here log10-transformed) as well as amount requested, it can be seen that the latter two are not associated with each other (R=0.091). Right: residuals after fitting a univariate regression using only <span class="caps">FICO</span> score (top row), and residuals when using the full model (bottom row). In the left column residuals are color-coded with respect to the levels of the loan length factor, and in the right column according to four different levels of the loan amount variable. The non-random patterns of the univariate model vanish in the full model.</figcaption>
</figure>
<p>As can be be seen in the right panel of Figure 2, the full model largely removes the patterns in residuals observed in the simple model. Further analysis also reveals that the residuals are approximately normal in distribution (see Figure 3), indicating that at least this assumption of the linear regression approximately holds. </p>
<figure>
<img src="/images/p2ploans/residuals.png" alt="Linear regression residuals"/>
<figcaption class="capCenter">Figure 3: Histogram and density of residuals for the simple univariate linear regression (left) and the full model (right). Clearly, the residuals of the full model are close to normal distributed, but not those of the simpler model.</figcaption>
</figure>
<p>Whereas the simple model accounted for only 50.65% of the variance in the data, the full model captured 75.18% (R<sup>2</sup>=0.75, the significance of associations being p < 2.2e<sup>-16</sup>). The added covariates did not however change the sign nor much the value of the resulting coefficients (<span class="caps">LL</span><sub>36</sub>=427.3, b<sub>1</sub>=-146.2, b<sub>2</sub>=0.00014, b<sub>3</sub>=3.3). The coefficient measuring the influence of the Fico score, for example, changed by 4% only. We also checked whether inclusion of the number of open credit lines, the number of recent inquiries, or the debt to income ratio in the model would improve the result even further. But the change in explained variance was small (+2.2%) after adding these measures of debt burden. Again coefficients did not change significantly either, indicating that the additional variables do not act as confounders of <span class="caps">FICO</span> score, which was a possibility since debt burden is already taken into account in the determination of the <span class="caps">FICO</span> score. Allowing for interactions between covariates in the full model also did not result in better fit. Since the distributions of most variables in the data set were not perfectly normal even after transforming (see histograms in Figure 2), we also tested whether non-linear and robust forms of regression would perform better. But neither a generalized linear model (glm in R [2]) nor a robust regression using an M-estimator (rlm in R [3]) produced significantly different coefficients.</p>
<p>In the final model, a change of one unit in log<sub>10</sub> <span class="caps">FICO</span> score corresponds to a change of -146.2% (95% confidence interval: -142,-152) in interest rate over the base rate of 427.3% (<span class="caps">CI</span>: 416, 438). So for example, the interest rate for a 10000$ loan over 36 months would be 12.1% for the average reported <span class="caps">FICO</span> score of 707, and would decrease by 0.89% for an additional 10 points of the <span class="caps">FICO</span> score. An increase in the size of the loan by 1000$ corresponds to an increase of 0.14% in interest rate (<span class="caps">CI</span> of b<sub>2</sub>: 0.00013, 0.00015). Increasing the length of the loan to 60 months would result in an additional 3.3% of interest (<span class="caps">CI</span>: 3.1, 3.52).</p>
<h3>Conclusions</h3>
<p>Our results show a significant negative association between interest rates and <span class="caps">FICO</span> score, modulated by positive associations with loan amount and duration. Due to the log<sub>10</sub> transformation, model estimates are non-linear with respect to <span class="caps">FICO</span> score. Though this makes interpretation of the model less intuitive, the associations are in the direction one would expect.</p>
<p>It should be kept in mind that the analysis only applies to the peer-to-peer loans issued through the Lending Club. Loans in general, e.g. those offered by traditional banks, might follow a different pattern. While the model presented here would allow interested individuals to get an idea of what interest rates to expect given a desired loan and credit history, further work would be needed if accurate predictions are required (e.g. when evaluating what kind of loans to include in a lender’s portfolio). Also, the data analyzed here does not serve to verify whether peer-to-peer loans provided through the Lending Club are indeed cheaper than those offered by banks, as the website claims. </p>
<p>The main goal of the regression modelling was inferential, identifying significant associations between dependent variables describing the loan applicant and resulting interest rate. In this regard, we have found that <span class="caps">FICO</span> score, loan length and requested amount are not confounders, even though it is certainly true that applicants with better <span class="caps">FICO</span> score can afford, and often will apply for larger loans. Also, additional measures of debt burden do not ncessarily act as confounders for <span class="caps">FICO</span> score, even if they’re likely to be determining factors in the calculation of the latter (if the goal was accurate prediction of interest only, inclusion of these additional variables would likely lead to small increases in performance though). More work would have to be done to provide a complete causal story (such as randomized tests, or stratified analysis) for the determination of interest rate.</p>
<p>As noted, none of the variables in the data set were normally distributed, something that could not be remedied by log transformation (or indeed other transforms, such as square roots etc.). It is possible that other non-linear or robust methods would have been more appropriate. Nevertheless, residuals after linear regression, according to histograms and <span class="caps">QQ</span>-plots, were approximately normal, indicating that a linear regression might not have been totally unwarranted.</p>
<h3>References</h3>
<ol class="bib">
<li>Bishop, <span class="caps">C. M.</span>(2006). Pattern recognition and machine learning (Vol. 4, No. 4). New York: Springer.</li>
<li>Wood, Simon (2006). Generalized Additive Models: An Introduction with R. Chapman <span class="amp">&</span> Hall/<span class="caps">CRC</span>.</li>
<li>J. Huber (1981) Robust Statistics. Wiley.</li>
</ol>
<p></script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script></p>
</div><!-- /.entry-content -->
<footer class="post-info">
Published on <span class="published">October 07, 2017</span><br>
Written by <span class="author">Thomas Buhrmann</span><br>
Posted in <span class="label label-default"><a href="https://buhrmann.github.io/category/reports.html">Reports</a></span>
~ Tagged
<span class="label label-default"><a href="https://buhrmann.github.io/tag/r.html">R</a></span>
<span class="label label-default"><a href="https://buhrmann.github.io/tag/report.html">report</a></span>
<span class="label label-default"><a href="https://buhrmann.github.io/tag/regression.html">regression</a></span>
</footer><!-- /.post-info -->
</section>
<div class="blogItem">
<h2>Comments</h2>
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_shortname = 'datawerk';
var disqus_title = 'Interest rates on <span class="caps">P2P</span> loans';
var disqus_identifier = "p2p-loans.html";
(function() {
var dsq = document.createElement('script');
dsq.type = 'text/javascript';
dsq.async = true;
//dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] ||
document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
<noscript>
Please enable JavaScript to view the
<a href="http://disqus.com/?ref_noscript=datawerk">
comments powered by Disqus.
</a>
</noscript>
</div>
</div>
</div><!-- row-->
</div><!-- container -->
<!-- <div class="push"></div> -->
</div> <!-- wrap -->
<div class="container-fluid aw-footer">
<div class="row-centered">
<div class="col-sm-3 col-sm-offset-1">
<h4>Author</h4>
<ul class="list-unstyled my-list-style">
<li><a href="http://www.ias-research.net/people/thomas-buhrmann/">Academic Home</a></li>
<li><a href="http://github.com/synergenz">Github</a></li>
<li><a href="http://www.linkedin.com/in/thomasbuhrmann">LinkedIn</a></li>
<li><a href="https://secure.flickr.com/photos/syngnz/">Flickr</a></li>
</ul>
</div>
<div class="col-sm-3">
<h4>Categories</h4>
<ul class="list-unstyled my-list-style">
<li><a href="https://buhrmann.github.io/category/academia.html">Academia (4)</a></li>
<li><a href="https://buhrmann.github.io/category/data-apps.html">Data Apps (2)</a></li>
<li><a href="https://buhrmann.github.io/category/data-posts.html">Data Posts (9)</a></li>
<li><a href="https://buhrmann.github.io/category/reports.html">Reports (3)</a></li>
</ul>
</div>
<div class="col-sm-3">
<h4>Tags</h4>
<ul class="tagcloud">
<li class="tag-4"><a href="https://buhrmann.github.io/tag/shiny.html">shiny</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/networks.html">networks</a></li>
<li class="tag-3"><a href="https://buhrmann.github.io/tag/sql.html">sql</a></li>
<li class="tag-3"><a href="https://buhrmann.github.io/tag/hadoop.html">hadoop</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/mongodb.html">mongodb</a></li>
<li class="tag-1"><a href="https://buhrmann.github.io/tag/visualization.html">visualization</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/smcs.html">smcs</a></li>
<li class="tag-3"><a href="https://buhrmann.github.io/tag/sklearn.html">sklearn</a></li>
<li class="tag-3"><a href="https://buhrmann.github.io/tag/tf-idf.html">tf-idf</a></li>
<li class="tag-1"><a href="https://buhrmann.github.io/tag/r.html">R</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/sna.html">sna</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/nosql.html">nosql</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/svm.html">svm</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/java.html">java</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/hive.html">hive</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/scraping.html">scraping</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/lda.html">lda</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/kaggle.html">kaggle</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/exploratory.html">exploratory</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/titanic.html">titanic</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/classification.html">classification</a></li>
<li class="tag-1"><a href="https://buhrmann.github.io/tag/python.html">python</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/random-forest.html">random forest</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/text.html">text</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/big-data.html">big data</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/report.html">report</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/regression.html">regression</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/graph.html">graph</a></li>
<li class="tag-2"><a href="https://buhrmann.github.io/tag/d3.html">d3</a></li>
<li class="tag-3"><a href="https://buhrmann.github.io/tag/neo4j.html">neo4j</a></li>
<li class="tag-4"><a href="https://buhrmann.github.io/tag/flume.html">flume</a></li>
</ul>
</div>
</div>
</div>
<!-- JavaScript -->
<script src="https://code.jquery.com/jquery-2.1.1.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
<script type="text/javascript">
jQuery(document).ready(function($)
{
$("div.collapseheader").click(function () {
$header = $(this).children("span").first();
$codearea = $(this).children(".input_area");
$codearea.slideToggle(500, function () {
$header.text(function () {
return $codearea.is(":visible") ? "Collapse Code" : "Expand Code";
});
});
});
// $(window).resize(function(){
// var footerHeight = $('.aw-footer').outerHeight();
// var stickFooterPush = $('.push').height(footerHeight);
// $('.wrap').css({'marginBottom':'-' + footerHeight + 'px'});
// });
// $(window).resize();
// $(window).bind("load resize", function() {
// var footerHeight = 0,
// footerTop = 0,
// $footer = $(".aw-footer");
// positionFooter();
// function positionFooter() {
// footerHeight = $footer.height();
// footerTop = ($(window).scrollTop()+$(window).height()-footerHeight)+"px";
// console.log(footerHeight, footerTop);
// console.log($(document.body).height()+footerHeight, $(window).height());
// if ( ($(document.body).height()+footerHeight) < $(window).height()) {
// $footer.css({ position: "absolute" }).css({ top: footerTop });
// console.log("Positioning absolute");
// }
// else {
// $footer.css({ position: "static" });
// console.log("Positioning static");
// }
// }
// $(window).scroll(positionFooter).resize(positionFooter);
// });
});
</script>
</body>
</html>