- Home
- Rates
- Mortality
- Nuptiality
- Fertility
- Projections

- Projections

Home Rates Mortality Nuptiality Fertility Projections

## The Lee-Carter Model

We apply the Lee-Carter approach to forecasting U.S. mortality, doingall calculations from first principles. This log is best read inconjunction with the class handout.

Lee and Carter use U.S. mortality rates for conventional 5-year agegroups up to 85+ by single calendar years from 1933 to 1987, for bothsexes.

The data can be obtained from mortality.org. Register for anaccount, then select U.S.A., navigate to the row for life tables, total,and the column for 5x1. Save the life table as a text file`bltper_5x1.txt`

. This update uses the version last modifiedon 02 Jun 2022.

The code below shows the steps I followed after saving the web pageas `bltper_5x1.txt`

. We need to code age from the grouplabels and, more importantly, compute the rate for the open-ended agegroup 85+. For fitting purposes we select ages up to 85 and the years upto 1987, but later use more ages and years.

- Stata
- R

. import delimit using bltper_5x1.txt, rowrange(3) varnames(3) ///> delimit(" ", collapse) clear(encoding automatically selected: ISO-8859-1)(11 vars, 2,112 obs). replace age = "110" if age == "110+"(88 real changes made). gen pos = strpos(age, "-"). replace age = substr(age, 1, pos - 1) if pos > 0(1,936 real changes made). rename age age_s. gen age = real(age_s) . save us1x5, replace // for later usefile us1x5.dta saved. replace mx = lx/tx if age == 85(88 real changes made). keep year age mx. keep if age <= 85 & year <= 1987(1,067 observations deleted)

> library(dplyr)> library(stringr)> library(tidyr)> us <- read.table("bltper_5x1.txt", skip = 2, header=TRUE)> us <- mutate(us, age0 = as.numeric(str_extract(us$Age, "[0-9]+")),+ age = ifelse(age0 >= 5, age0 + 2.5, ifelse(age0 > 0, 3, 0.5)))> us85 <- mutate(us, mx = ifelse(Age == "85-89", lx/Tx, mx))> usmx <- select(us85, Year, age, mx) |> filter(age < 90)

Next we compute the log of the rates. Thecalculations that follow are a bit simpler if we reshape thedata. We will store them in a matrix of 55 yearsby 19 age groups.

. gen logm = log(mx). drop mx. quietly reshape wide logm, i(age) j(year)

> rates <- filter(usmx, Year <= 1987)$mx> M <- matrix(log(rates), 55, 19, byrow = TRUE)

### Fitting the Model

The first thing we need is the mean log-rate for each age group. Thisis easily computed using the `rowmeans`

function in `egen`

. `colMeans()`

. The results are remarkablyclose to Table 1 in the original paper.

. egen a = rowmean(logm1933 - logm1987). list age a ┌──────────────────┐ │ age a │ ├──────────────────┤ 1. │ 0 -3.6419408 │ 2. │ 1 -6.6958293 │ 3. │ 5 -7.5135297 │ 4. │ 10 -7.5654938 │ 5. │ 15 -6.7576327 │ ├──────────────────┤ 6. │ 20 -6.4478058 │ 7. │ 25 -6.4054185 │ 8. │ 30 -6.2271491 │ 9. │ 35 -5.9070615 │ 10. │ 40 -5.5138606 │ ├──────────────────┤ 11. │ 45 -5.0874851 │ 12. │ 50 -4.652652 │ 13. │ 55 -4.2607583 │ 14. │ 60 -3.8570917 │ 15. │ 65 -3.4747763 │ ├──────────────────┤ 16. │ 70 -3.0590744 │ 17. │ 75 -2.6391786 │ 18. │ 80 -2.2175468 │ 19. │ 85 -1.6198283 │ └──────────────────┘

> a <- colMeans(M)> a [1] -3.641941 -6.695829 -7.513530 -7.565494 -6.757633 -6.447806 -6.405419 [8] -6.227149 -5.907062 -5.513861 -5.087485 -4.652652 -4.260758 -3.857092[15] -3.474776 -3.059074 -2.639179 -2.217547 -1.619828

The next step is to subtract the average age pattern**a** from all years. To do this wecopy the data into Mata

. mata: a = st_data(1::19, "a"). mata: Y = st_data(1::19, 2..56). mata: Y = Y :- a

> for(j in 1:19) M[,j] <- M[,j] - a[j]

We are now ready to compute the Singular Value Decomposition (SVD),which writes **M** = **U D V**, where**U** and **V’** are orthogonal matrices and**D** is a diagonal matrix of singular values. Before we call the function we need to define the outputmatrices. In fact we need just the first left andright singular vectors. The first column of **U**times **D**_{1,1} times the first row of**V’** has the best rank-1 approximation to the inputmatrix.

. mata: U = d = Vt = J(0, 0, .). mata: svd(Y', U, d, Vt)

> d <- svd(M, 1, 1)

Lee and Carter normalize the first row of **V** so itsums to one and call it **b**. This vector models how thedifferent age groups react to mortality change.

. mata: b = Vt[1,]/sum(Vt[1,]). mata: b[1..5] // partial list ... 1 2 3 4 5 ┌───────────────────────────────────────────────────────────────────────┐ 1 │ .0911639354 .1121398614 .09379187 .0832670274 .0498104034 │ └───────────────────────────────────────────────────────────────────────┘

> b <- d$v/sum(d$v)> head(b, 5) [,1][1,] 0.09116394[2,] 0.11213986[3,] 0.09379187[4,] 0.08326703[5,] 0.04981040

These values are remarkably close to the *b*’s published inTable 2. Lee and Carter also take the first column of**U**, multiply by **D**_{1,1} andmultiply by the sum of the first row of **V’** (to cancelthe division) and call that **k**. This vector capturesoverall mortality change over time.

. mata: k = U[,1] * sum(Vt[1,]) * d[1]. mata: k[1..5]' // partial list... 1 2 3 4 5 ┌───────────────────────────────────────────────────────────────────────┐ 1 │ 11.39151719 11.86102892 11.35305831 11.64034529 10.85845282 │ └───────────────────────────────────────────────────────────────────────┘

> k <- d$u * sum(d$v) * d$d[1]> head(k, 5) [,1][1,] 11.39152[2,] 11.86103[3,] 11.35306[4,] 11.64035[5,] 10.85845

### Plotting Parameters and Fits

The next task is to compute the Lee-Carter fits to the mortalityrates in 1933 and 1987, reproducing part of Figure 2 in the originalpaper. We are done with Mata, so we copy**b** and **k** into Stata.

. mata: st_store(1::19, st_addvar("float","b"), b'). set obs 55 Number of observations (_N) was 19, now 55.. mata: st_store(., st_addvar("float","k"), k). gen fit1933 = a + b * k[1](36 missing values generated). gen fit1987 = a + b * k[1987-1932](36 missing values generated). gen agem = age + 2.5(36 missing values generated). replace agem = 0.5 in 1(1 real change made). replace agem = 3 in 2(1 real change made). twoway (scatter logm1933 agem) (line fit1933 agem) ///> (scatter logm1987 agem) (line fit1987 agem) ///> , title(Lee-Carter fits for 1933 and 1987) xt(age) legend(off). graph export lcfig1.png, width(500) replacefile lcfig1.png saved as PNG format

> library(ggplot2)> usmx2 <- filter(usmx, Year==1933 | Year == 1987) |> + mutate(year = factor(Year),+ fit = c(exp(a + b * k[1]), exp(a + b * k[55])))> ggplot(usmx2, aes(age, mx, color=year)) + geom_point() + + geom_line(aes(age,fit,color=year)) + scale_y_log10() ++ ggtitle("Lee-Carter Fits for 1933 and 1987")> ggsave("lcfig1r.png", width = 500/72, height = 400/72, dpi = 72)

Here’s the trajectory of *k*

. gen t = 1932 + _n . line k t, title("Lee-Carter k for 1933-1987") xt(year). graph export lcfig2.png, width(500) replacefile lcfig2.png saved as PNG format

> trend <- data.frame(year = 1933:1987, k = k)> ggplot(trend, aes(year, k)) + geom_line() ++ ggtitle("Lee-Carter k for 1933-1987")> ggsave("lcfig2r.png", width = 500/72, height = 400/72, dpi = 72)

### A Random Walk

We now simulate the stochastic process driving **k**, arandom walk with drift *-0.365* and innovations with standarddeviation *0.652*. The idea is to generate normally distributederrors, add the drift, and them add to each value the previous one,which is easiest done as a running sum. We do this 50 times. For clarity I use a temporary variable called`e`

for the errors

. set obs 61Number of observations (_N) was 55, now 61.. gen n = _n. gen e = .(61 missing values generated). forvalues r = 1/50 { 2. quietly replace e = rnormal() * 0.652 3. quietly gen k`r' = -11.05 - 0.365 + e in 1 4. quietly replace k`r' = k`r'[_n-1] - 0.365 + e in 2/L 5. }. twoway (line k1-k50 n), legend(off) xt(year) title(50 Random Walks). graph export lcfig3.png, width(500) replacefile lcfig3.png saved as PNG format

> S <- matrix(0,61,50)> for(j in 1:50) S[,j] <- -11.05 + c*msum(-0.365 + rnorm(61, 0, 0.652))> df <- data.frame(S); df$year <- 1990:2050> ggplot(gather(df, sim, k, -year), aes(year, k, color = sim)) ++ geom_line() + guides(color = "none") + + ggtitle("Fifty Random Walks")> ggsave("lcfig3r.png", width = 500/72, height = 400/72, dpi = 72)

This being a simulation your results will differ from mine. Youshould note that there is considerable uncertainty about the level ofmortality 50 years into the future.

### Age-Specific Forecasts.

Let us forecast age-specific mortality in 2050 starting from 1989. Tothis end we will use the official values of **a** and**b**, which I saved in a text file called`leecarter.dat`

(also a Stata file`leecarter.dta`

) in the datasets section. These values extendthe two sets of parameters to age 105+, but are otherwise practicallyidentical to the estimates we just obtained.

Next we compute *k* starting from -11.05 in 1989. I call the resulting scalar `km`

to avoidconflict with the variable `k`

. We also compute thestandard deviation and a 95% confidence interval. Atthis point we restore the original data to get all ages

. use us1x5, clear. keep if age < 110(88 observations deleted). gen logm = log(mx). keep age year logm. merge m:1 age using https://grodri.github.io/datasets/leecarter Result Number of obs ───────────────────────────────────────── Not matched 0 Matched 2,024 (_merge==3) ─────────────────────────────────────────. gen agem = age + 2.5. quietly replace agem = 0.5 if age == 0. quietly replace agem = 3 if age == 1. scalar km = -11.05 - 0.365*(2050 - 1989). scalar sd = sqrt(2050 - 1989)*0.652. di km - 1.96*sd, km + 1.96*sd-43.295874 -23.334126. gen p2050 = lc_a + lc_b * km. gen lb2050 = lc_a + lc_b * (km - 1.96*sd). gen ub2050 = lc_a + lc_b * (km + 1.96*sd). twoway (rarea ub2050 lb2050 agem, color(ltblue)) (line p2050 agem) ///> , legend(off) title(Forecast for 2050 with 95% CI) xt(age). graph export lcfig4.png, width(500) replacefile lcfig4.png saved as PNG format

> lc <- read.table("leecarter.dat", header = TRUE)> k2050 <- 33.3> z <- qnorm(0.975)> se <- sqrt(2050 - 1989) * 0.652> k2050 + c(-1, 1) * z * se[1] 23.31931 43.28069> forecast <- data.frame(age = lc$age,+ fit = exp(lc$a + lc$b * k2050),+ low = exp(lc$a + lc$b * (k2050 - z * se)),+ hi = exp(lc$a + lc$b * (k2050 + z * se)))> ggplot(forecast, aes(age, fit)) + scale_y_log10() ++ geom_ribbon(aes(ymin = low, ymax = hi), fill="#d0d0d0") ++ geom_line() + ggtitle("Figure 4: Forecast for 2050 with 95% CI")> ggsave("lcfig4r.png", width = 500/72, height = 400/72, dpi = 72)

So we are 95% confident that *k* will be between -43.3 and-23.3 in 2050. Finally we combine *k* and the **a**and **b** schedules to forecast and plot age-specificmortality (in the log scale).

### Corrected Jump-off

The final issue concerns the choice of jump-off. Below we will make a“forecast” for 2013, the latest year for which we have data. Theprediction is not too good for young adults because of a slight lack offit in 1989 which propagates in the forecast. The solution is to“jump-off” from the actual 1989 rates.

. keep if year == 1989 | year == 2013(1,978 observations deleted). drop _merge. reshape wide logm, i(age lc_a lc_b) j(year) (j = 1989 2013)Data Long -> Wide─────────────────────────────────────────────────────────────────────────────Number of observations 46 -> 23 Number of variables 9 -> 9 j variable (2 values) year -> (dropped)xij variables: logm -> logm1989 logm2013─────────────────────────────────────────────────────────────────────────────. scalar drift = -0.365 * (2013 - 1989). scalar se = 0.625 * sqrt(2013 - 1989). scalar z = invnormal(0.975). gen p2013 = lc_a + lc_b * (-11.05 + drift). gen j2013 = logm1989 + lc_b * ( drift). gen ub2013 = logm1989 + lc_b * ( drift - z * se). gen lb2013 = logm1989 + lc_b * ( drift + z * se). twoway (rarea ub2013 lb2013 agem, color(ltblue)) ///> (scatter logm2013 agem if age < 110) ///> (line p2013 agem, lpat(dash)) (line j2013 agem if age < 110) ///> , title(Observed and Forecast for 2013) xt(age) ///> legend(order(3 "standard" 4 "jumpoff 1989") ring(0) pos(5) cols(1)) . graph export lcfig5.png, width(500) replacefile lcfig5.png saved as PNG format

> later <- select(us, age, Year, mx) |> filter(Year == 1989 | Year == 2013)> a89 <- log(filter(later, Year == 1989, age < 110)$mx)> k89 <- -(2013 - 1989) * 0.365> se <- sqrt(2013 - 1989) * 0.652> y2013 <- filter(later, Year == 2013, age < 110) |>+ mutate(standard = exp(lc$a - lc$b * 19.81),+ jumpof = exp(a89 + lc$b * k89),+ low = exp(a89 + lc$b * (k89 - z * se)),+ hi = exp(a89 + lc$b * (k89 + z * se)))> ggplot(y2013, aes(age, mx)) + scale_y_log10() ++ geom_ribbon(aes(ymin = low, ymax = hi), fill = "#d0d0d0") ++ geom_point() + geom_line(aes(age, standard)) + geom_line(aes(age, jumpof)) ++ ggtitle("Observed and Forecast for 2013")> ggsave("lcfig5r.png", width = 500/72, height = 400/72, dpi = 72)

The performance of the model with the correct jump-off for a 24-yearforecast is nothing short of remarkable.

**Update**. Now that we have more data we can test themodel further. Turns out the model doesn’t do so well for later years,because there was an unexpected rise in adult mortality during theworking ages starting around 2010, not to mention the pandemic and thesubsequent increase in mortality in 2020. You may wish to explore theseissues using the available data.

### Reference

HMD. Human Mortality Database. Max Planck Institute for DemographicResearch (Germany), University of California, Berkeley (USA), and FrenchInstitute for Demographic Studies (France). Available at https://www.mortality.org.