This data set is taken from the dataCar data set of the insuranceData package and slightly adjusted (see the code in examples for reproducing this data set). The original data set is based on one-year vehicle insurance policies taken out in 2004 or 2005. There are 67566 policies, of which 4589 (6.8%) had at least one claim.

Usage

data(dataCar)

Format

A data frame with 67566 observations on the following 15 variables.

veh_value

vehicle value, in $10,000s

exposure

exposure of the policy, a value between 0 and 1

clm

occurrence of claim (0 = no, 1 = yes)

numclaims

number of claims

claimcst0

claim amount (0 if no claim)

veh_body

vehicle body, coded as BUS CONVT COUPE HBACK HDTOP MCARA MIBUS PANVN RDSTR SEDAN STNWG TRUCK UTE

veh_age

vehicle age category: 1 (youngest), 2, 3, 4

gender

a factor with levels F M

area

a factor with levels A B C D E F

agecat

driver age category: 1 (youngest), 2, 3, 4, 5, 6

X_OBSTAT_

a factor with levels 01101 0 0 0

Y

the loss ratio, defined as the claim amount (claimcst0) divided by the exposure

w

the exposure, identical to exposure

VehicleType

type of vehicle, common vehicle or uncommon vehicle

VehicleBody

vehicle body, identical to veh_body

Details

Adjusted version of the dataCar data set, from which we removed policies with a loss ratio of 1 000 000 or more. In addition, we summed the exposure per vehicle body and removed the vehicle body categories whose summed exposure was less than 100. This ensures that each remaining vehicle body category has sufficient exposure.
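The two filtering steps can be sketched in base R on a small toy data frame. This is only an illustration under stated assumptions: the helper name adjust_policies, its arguments max_ratio and min_exposure, and all toy values are hypothetical and not part of the package. The thresholds are lowered so that the effect is visible on six rows.

```r
# Hypothetical helper mirroring the adjustment described above (not part of actuaRE).
adjust_policies <- function(data, max_ratio = 1e6, min_exposure = 100) {
  # Loss ratio: claim amount per unit of exposure; the exposure is reused as weight
  data$Y <- data$claimcst0 / data$exposure
  data$w <- data$exposure
  # Step 1: keep only policies whose loss ratio is below the threshold
  data <- data[which(data$Y < max_ratio), ]
  # Step 2: sum the exposure per vehicle body and keep bodies with enough exposure
  total_w <- tapply(data$w, data$veh_body, sum)
  keep <- names(total_w)[!is.na(total_w) & total_w >= min_exposure]
  data <- data[data$veh_body %in% keep, ]
  # Remove factor levels that no longer occur
  data$veh_body <- droplevels(data$veh_body)
  rownames(data) <- NULL
  data
}

# Toy policies (made-up values, not taken from dataCar)
toy <- data.frame(
  veh_body  = factor(c("SEDAN", "SEDAN", "BUS", "UTE", "UTE", "UTE")),
  exposure  = c(0.5, 0.8, 0.3, 0.6, 0.7, 0.9),
  claimcst0 = c(0, 400, 150, 0, 2100, 0)
)

# With these lowered thresholds, the UTE policy with a loss ratio of about 3000
# is removed first, and BUS is then dropped because its summed exposure is 0.3.
adjusted <- adjust_policies(toy, max_ratio = 2500, min_exposure = 1)
print(adjusted)
```

With the default arguments the helper applies the thresholds named in the text (1 000 000 and 100).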

Source

http://www.acst.mq.edu.au/GLMsforInsuranceData

References

De Jong P., Heller G.Z. (2008), Generalized linear models for insurance data, Cambridge University Press

Examples

  # How to construct the data set using the original dataCar data set from the insuranceData package
  library(plyr)
#> 
#> Attaching package: 'plyr'
#> The following object is masked from 'package:actuaRE':
#> 
#>     is.formula
  library(magrittr)
  data("dataCar", package = "insuranceData")
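  # Loss ratio: claim amount per unit of exposure; the exposure is reused as weight
  dataCar$Y = with(dataCar, claimcst0 / exposure)
  dataCar$w = dataCar$exposure
  # Keep only policies with a loss ratio below 1 000 000
  dataCar   = dataCar[which(dataCar$Y < 1e6), ]
  # Per vehicle body: exposure-weighted mean loss ratio (V1) and summed exposure (V2)
  Yw = ddply(dataCar, .(veh_body), function(x) c(crossprod(x$Y, x$w) / sum(x$w), sum(x$w)))
  # Drop vehicle bodies with a summed exposure below 100, then drop unused factor levels
  dataCar = dataCar[!dataCar$veh_body %in% Yw[Yw$V2 < 1e2, "veh_body"], ]
  dataCar$veh_body %<>% droplevels()
  # Group the remaining vehicle bodies into common and uncommon vehicles
  dataCar$VehicleType = sapply(tolower(dataCar$veh_body), function(x) {
    if(x %in% c("sedan", "ute", "hback"))
      "Common vehicle"
    else
      "Uncommon vehicle"
  })
  # Copy of veh_body under a second name
  dataCar$VehicleBody = dataCar$veh_body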
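
The VehicleType grouping could also be written in vectorised form with ifelse. The sketch below is an alternative, not the code used to build the data set. It runs on a hypothetical toy factor, and the names veh_body, common and VehicleType are local to the example. Unlike the sapply call, it returns an unnamed character vector.

```r
# Toy vehicle bodies (made-up sample, not taken from dataCar)
veh_body <- factor(c("SEDAN", "UTE", "HBACK", "BUS", "COUPE"))

# Body types treated as common, in lower case as in the example above
common <- c("sedan", "ute", "hback")

# Vectorised lookup: one label per element, no explicit loop
VehicleType <- ifelse(tolower(as.character(veh_body)) %in% common,
                      "Common vehicle", "Uncommon vehicle")
print(table(VehicleType))
```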