Guest Blog from Brian Fannin, ACAS
It started even before I knew what R was. In 2007, a paper appeared in the P&C journal Variance that focused on a different way of thinking about loss reserve estimation. The approach was founded on linear modelling, which in Excel means the LINEST function.
Inspired, I began entering data and formulas into a brand new spreadsheet. I created a design matrix for my data. This was a bit of trouble, but nothing I couldn't handle. I created a model, taking care to ensure that I had just as many columns as my model had parameters. I created another model on a subset of my data. This was a bit of work and meant that I could not disturb the data resting purposefully in their cells. No matter; it was a manual copy and paste that had gotten them there in the first place. When I added more observations to see their effect, I saw that I'd either have to create new copies of my models or alter the existing formulas.
I'd had enough. Excel would return diagnostics about a statistical model, but I could tell that it had a greater interest in enabling changes to font size or attaching a file to an e-mail. I set myself the task of learning how to apply a linear model to P&C loss reserves in R and allowed myself a week to explore it. When the week was up, I quickly realized that I'd be spending a lot more time with R. I felt this despite having spent hours making every naive mistake, thick-headed blunder, and careless error imaginable. R didn't care about font size. It did care about proper capitalization; Excel had always corrected that for me.
So, why the instant love for R? For one, the mistakes were largely of my own making. I had assumed that I had to create everything I needed, just as I had with spreadsheets and VBA. I can remember writing many lines of code to create a design matrix for my sample. Well, of course, R would never force you to get your hands dirty creating your own design matrix; it falls out as a natural consequence of matching data with a linear model. And the formula for the model - something like "Y ~ 1 + PredictorA + PredictorB" - was so concise and expressive that I could barely get over it. And this was just the beginning! I finally had a computational engine that would let me work with GLMs, neural networks, decision trees, cluster analysis, Bayesian inference and so much more.
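To make that concrete, here is a minimal sketch using invented data and hypothetical predictor names (PredictorA, PredictorB, Y): lm() takes the formula directly, R assembles the design matrix behind the scenes, and refitting on a subset is a single call rather than a copy-and-paste exercise.

```r
# Invented data with hypothetical column names, just for illustration
set.seed(1234)
claims <- data.frame(PredictorA = rnorm(100), PredictorB = rnorm(100))
claims$Y <- 2 + 1.5 * claims$PredictorA - 0.5 * claims$PredictorB + rnorm(100)

# The formula describes the model; R builds the design matrix itself
fit_all <- lm(Y ~ 1 + PredictorA + PredictorB, data = claims)
head(model.matrix(fit_all))   # intercept plus the two predictors

# Refitting on a subset is one call, with the original data left untouched
fit_subset <- update(fit_all, subset = PredictorA > 0)

summary(fit_all)
summary(fit_subset)
```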
I carried on with my work on linear models in P&C reserving. About a year later, I wrote a package which contained the (ongoing) results of that work, and in 2014 I presented it at the R in Insurance conference in London. Earlier that year, I had spent a week in Kigali, Rwanda teaching R at the Rwanda Biomedical Center. Quite a journey from my earliest experiments with R!
And the journey continues. I'll be sharing some of my experience with Machine Learning in R for Actuaries on April 12. Please join me!
R is an open source programming language and software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians, data miners, and actuaries for developing statistical software and data analysis. For more information, visit https://www.r-project.org/.
Brian Fannin, ACAS, is the founder and captain of PirateGrunt LLC, an actuarial, data, and predictive modelling consultancy. Brian has held a number of positions at both primary and excess insurance companies, in the US and overseas. He is the chair of the CAS open source software committee, with a focus on R. His principal areas of research are stochastic reserving, predictive modelling, and visualization of data. He is the author of the MRMR, raw, represtools, and Imaginator R packages.