# data Car


This data set is taken from the `dataCar` data set of the `insuranceData` package and slightly adjusted (see the code in the examples for reproducing this data set).
The original data set is based on one-year vehicle insurance policies taken out in 2004 or 2005. There are 67566 policies, of which 4589 (6.8%) had at least one claim.

## Usage

`data(dataCar)`

## Format

A data frame with 67566 observations on the following 15 variables.

`veh_value`

vehicle value, in $10,000s

`exposure`

exposure, a value between 0 and 1

`clm`

occurrence of claim (0 = no, 1 = yes)

`numclaims`

number of claims

`claimcst0`

claim amount (0 if no claim)

`veh_body`

vehicle body, coded as `BUS`, `CONVT`, `COUPE`, `HBACK`, `HDTOP`, `MCARA`, `MIBUS`, `PANVN`, `RDSTR`, `SEDAN`, `STNWG`, `TRUCK`, `UTE`

`veh_age`

1 (youngest), 2, 3, 4

`gender`

a factor with levels `F`, `M`

`area`

a factor with levels `A`, `B`, `C`, `D`, `E`, `F`

`agecat`

1 (youngest), 2, 3, 4, 5, 6

`X_OBSTAT_`

a factor with levels `01101 0 0 0`

`Y`

the loss ratio, defined as the claim amount (`claimcst0`) divided by the exposure

`w`

the exposure, identical to `exposure`

`VehicleType`

type of vehicle, `Common vehicle` (vehicle body `SEDAN`, `UTE` or `HBACK`) or `Uncommon vehicle` (all other vehicle bodies)

`VehicleBody`

vehicle body, identical to `veh_body`
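As a minimal sketch of how `Y` and `w` relate to the original columns, the snippet below computes both on a small made-up data frame. The data frame `toy` and its values are hypothetical and not taken from `dataCar`; only the column names `claimcst0`, `exposure`, `Y` and `w` follow this page.

```r
# Hypothetical toy policies (made-up values, not taken from dataCar)
toy <- data.frame(
  claimcst0 = c(0, 500, 1200),   # claim amount per policy
  exposure  = c(0.5, 0.25, 1.0)  # exposure per policy, between 0 and 1
)

# Loss ratio: claim amount per unit of exposure
toy$Y <- toy$claimcst0 / toy$exposure

# Weight: a copy of the exposure
toy$w <- toy$exposure

# Exposure-weighted mean loss ratio
weighted.mean(toy$Y, toy$w)
```

Because `Y * w` gives back the claim amount, the exposure-weighted mean of `Y` equals the total claim amount divided by the total exposure.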

## Details

This is an adjusted version of the `dataCar` data set: observations with a loss ratio of 1 000 000 or more were removed. In addition, the exposure was summed per vehicle body, and the vehicle body categories with a summed exposure below 100 were removed. This ensures that there is sufficient exposure for each vehicle body category.
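The two filtering steps can be illustrated in base R on a small made-up data frame. This is only a sketch: the data frame `toy`, its values, and the threshold `min_exposure = 0.5` are assumptions chosen to keep the example small (the adjustment described above uses 100), and `aggregate()` is used here instead of the `plyr` code in the examples.

```r
# Hypothetical toy data (made-up values, not taken from dataCar)
toy <- data.frame(
  veh_body = factor(c("SEDAN", "SEDAN", "BUS", "UTE", "COUPE")),
  exposure = c(0.8, 0.6, 0.3, 0.9, 0.2),
  Y        = c(0, 2500, 2e6, 300, 0)
)

# Assumed threshold for this toy example only; the real adjustment uses 100
min_exposure <- 0.5

# Step 1: keep only observations with a loss ratio below 1e6
toy <- toy[toy$Y < 1e6, ]

# Step 2: sum the exposure per vehicle body (only bodies still present appear)
tot <- aggregate(exposure ~ veh_body, data = toy, FUN = sum)

# Step 3: keep vehicle bodies whose summed exposure reaches the threshold
keep <- tot$veh_body[tot$exposure >= min_exposure]
toy <- toy[toy$veh_body %in% keep, ]

# Step 4: drop factor levels that no longer occur
toy$veh_body <- droplevels(toy$veh_body)

levels(toy$veh_body)
```

In this toy data the `BUS` row is removed in step 1 because of its loss ratio, and `COUPE` is removed in step 3 because its summed exposure falls below the assumed threshold.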

## References

De Jong, P. and Heller, G. Z. (2008). Generalized Linear Models for Insurance Data. Cambridge University Press.

## Examples

```
# How to construct the data set using the original dataCar data set from the insuranceData package
library(plyr)
#>
#> Attaching package: 'plyr'
#> The following object is masked from 'package:actuaRE':
#>
#> is.formula
library(magrittr)
data("dataCar", package = "insuranceData")
dataCar$Y = with(dataCar, claimcst0 / exposure)
dataCar$w = dataCar$exposure
dataCar = dataCar[which(dataCar$Y < 1e6), ]
Yw = ddply(dataCar, .(veh_body), function(x) c(crossprod(x$Y, x$w) / sum(x$w), sum(x$w)))
dataCar = dataCar[!dataCar$veh_body %in% Yw[Yw$V2 < 1e2, "veh_body"], ]
dataCar$veh_body %<>% droplevels()
# Label sedans, utes and hatchbacks as common; all other bodies as uncommon
dataCar$VehicleType = sapply(tolower(dataCar$veh_body), function(x) {
  if (x %in% c("sedan", "ute", "hback"))
    "Common vehicle"
  else
    "Uncommon vehicle"
})
dataCar$VehicleBody = dataCar$veh_body
```