1 Introduction

This book assumes no prerequisites: no algebra, no calculus, and no prior programming or coding experience. It is intended as a gentle introduction to the practice of analyzing data and answering questions with data, the way data scientists, statisticians, data journalists, and other researchers would. Our inspiration is the open source ModernDive book (Ismay and Kim 2018), with many tweaks, additions and changes by Katrien Antonio and Bavo DC Campo. This textbook is written primarily for actuarial students and practitioners, but is of course not limited to these groups.

  • We get started with R in Chapter 2: R vs RStudio, coding in R, installing and loading R packages, and the references used in this book.
  • In Chapter 3, we look at different types of data and objects in R, including vectors, matrices, data frames and lists.
  • We get started with data in Chapter 4.
  • Data visualisation is the focus of Chapter 5.
  • Chapter 6 covers more on data wrangling.
  • As probability distributions are of special importance to actuaries, these are discussed in Chapter 7.
  • Using and writing functions is the topic of Chapter 8.
  • Optimization tools help to optimize non-straightforward likelihoods, as discussed in Chapter 9.
  • First examples of model building focus on linear and generalized linear models in Chapters 10 and 11.
  • References follow in Chapter 12.

1.1 Learning outcomes

By the end of this book, you should have mastered the following:

  1. How R can be used as an environment for data handling, visualization, analysis and programming.
  2. How to use R to import/export data, to explore and manipulate data, to create insightful graphics and to write functions.
  3. How to find help in the ‘R community’, including code examples, books and support.
  4. How to perform simple tasks with R, and where to look for help with more advanced tasks and for further learning with specific packages.
  5. How to answer actuarial questions related to pricing and reserving.
  6. How to effectively create “data stories” using these tools.

This book will help you develop your “data science toolbox”, including tools such as data visualization, data formatting, data wrangling, and data modeling using regression models.


1.2 Data/science pipeline

Within the data analysis field there are many sub-fields that we will discuss throughout this book (though not necessarily in this order):

  • data collection
  • data wrangling
  • data visualization
  • data modeling
  • interpretation of results
  • data communication/storytelling

These sub-fields are summarized in what Grolemund and Wickham term the “data/science pipeline” in Figure 1.1.

Figure 1.1: Data/Science Pipeline
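To make the sub-fields listed above a little more concrete, here is a minimal sketch of how they might look in a few lines of R. It is only an illustration, not the workflow used later in this book: the data set `mtcars` ships with base R and is chosen here purely for convenience, and the object names `cars_small`, `fit` and `slope` are made up for this example.

```r
# Data collection: use a data set that ships with base R
# (an assumption made for illustration; not an actuarial data set)
data(mtcars)

# Data wrangling: keep a few columns and turn 'cyl' into a factor
cars_small <- mtcars[, c("mpg", "wt", "cyl")]
cars_small$cyl <- factor(cars_small$cyl)

# Data visualization: scatter plot of fuel efficiency against weight
plot(cars_small$wt, cars_small$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Data modeling: a simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = cars_small)

# Interpretation of results: extract the estimated slope for weight
slope <- coef(fit)[["wt"]]

# Data communication: report the finding in a sentence
cat(sprintf("Estimated change in mpg per 1000 lbs of weight: %.2f\n", slope))
```

Each comment in the sketch maps to one step of the pipeline; later chapters treat these steps in far more depth and with more suitable tools.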


1.3 Inspirations and references

In essence, this book combines my own research papers (Katrien Antonio) and course notes with many useful quotes and examples from my favourite R books listed below.

This book is very much inspired by the following books or courses:

  • “Mathematical Statistics with Resampling and R” (Chihara and Hesterberg 2011),
  • “OpenIntro: Intro Stat with Randomization and Simulation” (Diez, Barr, and Çetinkaya-Rundel 2014),
  • “R for Data Science” (Grolemund and Wickham 2016),
  • “ModernDive” (Ismay and Kim 2018),
  • Jared Lander’s “R for Everyone” (Lander 2017),
  • “Applied Econometrics with R” (Kleiber and Zeileis 2008),
  • “An Introduction to Statistical Learning” (James et al. 20AD)
  • all the work of Michael Clark, see Michael Clark’s website
  • many, many courses on the DataCamp platform, including Katrien Antonio and Roel Verbelen’s Valuation of Life Insurance Products in R.

1.4 About this book

This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2020). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub.

Could this be a new paradigm for textbooks? Instead of the traditional model, in which textbook companies publish updated editions every few years, we apply a model influenced by software design and publish versions that are more easily updated. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.

Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!” So, that is exactly what Katrien Antonio did!