Title: | Efficient Algorithm for High-Dimensional Frailty Model |
---|---|
Description: | The penalized and non-penalized Minorize-Maximization (MM) method for frailty models to fit the clustered data, multi-event data and recurrent data. Least absolute shrinkage and selection operator (LASSO), minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalized functions are implemented. All the methods are computationally efficient. These general methods are proposed based on the following papers, Huang, Xu and Zhou (2022) <doi:10.3390/math10040538>, Huang, Xu and Zhou (2023) <doi:10.1177/09622802221133554>. |
Authors: | Xifen Huang [aut], Yunpeng Zhou [aut, cre], Jinfeng Xu [ctb] |
Maintainer: | Yunpeng Zhou <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.1 |
Built: | 2024-10-13 03:47:36 UTC |
Source: | https://github.com/heilokchow/frailtymmpen |
retrieve the coefficients under given tuning parameter
## S3 method for class 'fpen' coef(object, ...)
## S3 method for class 'fpen' coef(object, ...)
object |
Object with class "fpen", generated from |
... |
Ignored |
Without given a specific tune value, the coefficients with minimum BIC is returned. If tune=a
,
the coefficient is computed using linear interpolation of the result from the coefficients estimated from the run of regularization path.
Thus, a
should between the minimum and maximum value of the tuning parameter sequences used for the model fitting.
A vector of estimated parameters.
formula
.
#' @return No return value, called to construct formula
.
#' @export
cluster <- function(x)
x
event functionSpecify event id for multi-event data in the input formula
.
event(x)
event(x)
x |
name from original dataframe which specifies the event id. |
No return value, called to construct formula
.
This formula is used to fit the non-penalized regression. 3 types of the models can be fitted, shared frailty model where
hazard rate of object in
cluster is
The multi-event frailty model with different baseline hazard of different event and the hazard rate of event for individual
is
The recurrent event model where the event of individual
has observed feature
,
For the clustered type of data, we further assume that cluster has
with
number
of objects where they share the common frailty parameter
. For simplicity, we let
be the collection of all parameters and baseline hazard function. Then, the marginal likelihood is as follows,
Given the objective functions above, we take the clustered data as an example to illustrate the application of MM algorithm in optimizing the observed likelihood function, the observed log-likelihood function is,
where,
In order to formulate the iterative algorithm to optimize the observed log likelihood, we further define density function
based on the estimates of the parameters in
iteration
Then, we construct the surrogate function to minimize the mariginal log-likelihood using the Jensen's inequality,
which successfully separated into
and
where,
and let ,
And then we estimate by,
Then, we have,
Further more, we apply hyperplane inequality to construct surrogate function for where we can update the its estimates coordinate wise,
By applying Jensen's inequality,
Finally,
frailtyMM( formula, data, frailty = "gamma", power = NULL, tol = 1e-05, maxit = 200, ... )
frailtyMM( formula, data, frailty = "gamma", power = NULL, tol = 1e-05, maxit = 200, ... )
formula |
Formula where the left hand side is an object of the type |
data |
The |
frailty |
The frailty used for model fitting. The default is "lognormal", other choices are "invgauss", "gamma" and "pvf". (Note that the computation time for PVF family will be slow due to the non-explicit expression of likelihood function) |
power |
The power used if PVF frailty is applied. |
tol |
The tolerance level for convergence. |
maxit |
Maximum iterations for MM algorithm. |
... |
additional arguments pass to the function. |
To run the shared frailty model, Surv(tstop, status)
formula should be applied along with +cluster()
to specify the
corresponding clusters, if you want to run the simple frailty model without shared frailty, you do not need to use +cluster()
and the
formula only contains the name of the covariates. To run the multi-event model,
Surv(tstop, status)
formula should be applied along with +event()
to specify the corresponding events. If multi-event data
is fitted, please use 1,2...,K to denote the event id from the input data. To run the recurrent event model,
Surv(tstart, tstop, status)
formula should be applied along with +cluster()
where the cluster here denotes the individual id and
each individual may have many observed events at different time points.
The default frailty will be log-normal frailty, in order to fit other frailty models, simply set parameter frailty
as "InvGauss", "Gamma" or "PVF",
the parameter power
is only used when frailty
=PVF and since the likelihood of PVF (tweedie) distribution is approximated using
Tweedie
function from package mgcv, 1<power
<2.
An object of class fmm
that contains the following fields:
coef |
coefficient estimated from a specific model. |
est.tht |
frailty parameter estimated from a specific model. |
lambda |
frailty for each observation estimated from a specific model. |
likelihood |
The observed log-likelihood given estimated parameters. |
input |
The input data re-ordered by cluster id. |
frailty |
frailty used for model fitting. |
power |
power used for model fitting is PVF frailty is applied. |
iter |
total number of iterations. |
convergence |
convergence threshold. |
formula |
formula applied as input. |
coefname |
name of each coefficient from input. |
id |
id for individuals or clusters, 1,2...,a. Note that, since the original id may not be the sequence starting from 1, this output
id may not be identical to the original id. Also, the order of id is corresponding to the returned |
N |
total number of observations. |
a |
total number of individuals or clusters. |
datatype |
model used for fitting. |
Huang, X., Xu, J. and Zhou, Y. (2022). Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics, 10(4), 538.
Huang, X., Xu, J. and Zhou, Y. (2023). Efficient algorithms for survival data with multiple outcomes using the frailty model. Statistical Methods in Medical Research, 32(1), 118-132.
# Kidney data fitted by Clustered Inverse Gaussian Frailty Model InvG_real_cl = frailtyMM(Surv(time, status) ~ age + sex + cluster(id), kidney, frailty = "invgauss") InvG_real_cl # Cgd data fitted by Recurrent Log-Normal Frailty Model logN_real_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "gamma") logN_real_re # Simulated data example data(simdataCL) # Parameter estimation under different model structure and frailties # Clustered Gamma Frailty Model gam_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Clustered Log-Normal Frailty Model logn_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "lognormal") # Clustered Inverse Gaussian Frailty Model invg_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "invgauss") data(simdataME) # Multi-event Gamma Frailty Model gam_me = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Multi-event Log-Normal Frailty Model logn_me = frailtyMM(Surv(time, status) ~ . + event(id), simdataME, frailty = "lognormal") # Multi-event Inverse Gaussian Frailty Model invg_me = frailtyMM(Surv(time, status) ~ . + event(id), simdataME, frailty = "invgauss") data(simdataRE) # Recurrent event Gamma Frailty Model gam_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "gamma") # Recurrent event Log-Normal Frailty Model logn_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "lognormal") # Recurrent event Inverse Gaussian Frailty Model invg_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "invgauss") # Obtain the summary statistics under fitted model coef(gam_cl) summary(gam_cl)
# Kidney data fitted by Clustered Inverse Gaussian Frailty Model InvG_real_cl = frailtyMM(Surv(time, status) ~ age + sex + cluster(id), kidney, frailty = "invgauss") InvG_real_cl # Cgd data fitted by Recurrent Log-Normal Frailty Model logN_real_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "gamma") logN_real_re # Simulated data example data(simdataCL) # Parameter estimation under different model structure and frailties # Clustered Gamma Frailty Model gam_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Clustered Log-Normal Frailty Model logn_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "lognormal") # Clustered Inverse Gaussian Frailty Model invg_cl = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "invgauss") data(simdataME) # Multi-event Gamma Frailty Model gam_me = frailtyMM(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Multi-event Log-Normal Frailty Model logn_me = frailtyMM(Surv(time, status) ~ . + event(id), simdataME, frailty = "lognormal") # Multi-event Inverse Gaussian Frailty Model invg_me = frailtyMM(Surv(time, status) ~ . + event(id), simdataME, frailty = "invgauss") data(simdataRE) # Recurrent event Gamma Frailty Model gam_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "gamma") # Recurrent event Log-Normal Frailty Model logn_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "lognormal") # Recurrent event Inverse Gaussian Frailty Model invg_re = frailtyMM(Surv(start, end, status) ~ . + cluster(id), simdataRE, frailty = "invgauss") # Obtain the summary statistics under fitted model coef(gam_cl) summary(gam_cl)
This formula is used to fit the penalized regression. 3 types of the models can be fitted similar to the function
frailtyMM
. In addition, variable selection can be done by three types of penalty, LASSO, MCP and SCAD with the following
objective function where is the tuning parameter and
is the dimension of
,
The BIC is computed using the following equation,
where and
is the degree of freedom.
Surrogate function is also derived for penalty part for efficient estimation of penalized regression, similar to the notation used
in frailtyMM
, we let be the collection of all parameters and baseline hazard function. Given that,
by local quadratic approximation,
And thus, the surrogate function given iteration result is as follows,
frailtyMMpen( formula, data, frailty = "gamma", power = NULL, penalty = "LASSO", gam = NULL, tune = NULL, tol = 1e-05, maxit = 200, ... )
frailtyMMpen( formula, data, frailty = "gamma", power = NULL, penalty = "LASSO", gam = NULL, tune = NULL, tol = 1e-05, maxit = 200, ... )
formula |
Formula where the left hand side is an object of the type |
data |
The |
frailty |
The frailty used for model fitting. The default is "lognormal", other choices are "invgauss", "gamma" and "pvf". (Note that the computation time for PVF family will be slow due to the non-explicit expression of likelihood function) |
power |
The power used if PVF frailty is applied. |
penalty |
The penalty used for regularization, the default is "LASSO", other choices are "MCP" and "SCAD". |
gam |
The tuning parameter for MCP and SCAD which controls the concavity of the penalty. For MCP,
and for "SCAD",
The default value of |
tune |
The sequence of tuning parameters provided by user. If not provided, the default grid will be applied. |
tol |
The tolerance level for convergence. |
maxit |
Maximum iterations for MM algorithm. |
... |
additional arguments pass to the function. |
Without a given tune
, the default sequence of tuning parameters are used to provide the regularization path.
The formula is same as the input for function frailtyMM
.
An object of class fmm
that contains the following fields:
coef |
matrix of coefficient estimated from a specific model where each column correponds to an input tuning parameter. |
est.tht |
vector of frailty parameters estimated from a specific model with respect to each tuning parameter. |
lambda |
list of frailty for each observation estimated from a specific model with respect to each tuning parameter. |
likelihood |
vector of the observed log-likelihood given estimated parameters with respect to each tuning parameter. |
BIC |
vector of the BIC given estimated parameters with respect to each tuning parameter. |
tune |
vector of tuning parameters used for penalized regression. |
tune.min |
tuning parameter where minimal of BIC is obtained. |
convergence |
convergence threshold. |
input |
The input data re-ordered by cluster id. |
y |
input stopping time. |
X |
input covariate matrix. |
d |
input censoring indicator. |
formula |
formula applied as input. |
coefname |
name of each coefficient from input. |
id |
id for individuals or clusters, 1,2...,a. Note that, since the original id may not be the sequence starting from 1, this output
id may not be identical to the original id. Also, the order of id is corresponding to the returned |
N |
total number of observations. |
a |
total number of individuals or clusters. |
datatype |
model used for fitting. |
Huang, X., Xu, J. and Zhou, Y. (2022). Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. Mathematics, 10(4), 538.
Huang, X., Xu, J. and Zhou, Y. (2023). Efficient algorithms for survival data with multiple outcomes using the frailty model. Statistical Methods in Medical Research, 32(1), 118-132.
data(simdataCL) # Penalized regression under clustered frailty model # Clustered Gamma Frailty Model # Using default tuning parameter sequence gam_cl1 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Using given tuning parameter sequence gam_cl2 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma", tune = 0.1) # Obtain the coefficient where minimum BIC is obtained coef(gam_cl1) # Obtain the coefficient with tune = 0.2. coef(gam_cl1, tune = 0.2) # Plot the regularization path plot(gam_cl1) # Get the degree of freedom and BIC for the sequence of tuning parameters provided print(gam_cl1)
data(simdataCL) # Penalized regression under clustered frailty model # Clustered Gamma Frailty Model # Using default tuning parameter sequence gam_cl1 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma") # Using given tuning parameter sequence gam_cl2 = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "gamma", tune = 0.1) # Obtain the coefficient where minimum BIC is obtained coef(gam_cl1) # Obtain the coefficient with tune = 0.2. coef(gam_cl1, tune = 0.2) # Plot the regularization path plot(gam_cl1) # Get the degree of freedom and BIC for the sequence of tuning parameters provided print(gam_cl1)
This data include 50 clusters with 4 objects, a total of 200 events are recorded. 500 covariates can be used for model fitting. The data is generated from a gamma frailty model with coefficients (5,5,5,5,...,0) and frailty variance 1.
data(hdCLdata)
data(hdCLdata)
An object of class data.frame
with 200 rows and 503 columns.
data(hdCLdata)
data(hdCLdata)
Both the cumulative hazard and the survival curves can be plotted.
##S3 method for class "fmm"
##S3 method for class "fmm"
newdata |
The new data for prediction of hazard |
surv |
Plot survival curve instead of cumulative hazard, the default is |
... |
Further arguments pass to or from other methods |
object |
Object with class "fmm" |
If parameter newdata
is given, the plot is based on the predicted hazard while if it is not given,
the plot is based on the baseline hazard. To construct the new data, please refer to the detailed description from
function predict.fmm
and the following example.
gam_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "gamma") # Plot the survival curve based on baseline hazard plot(gam_re, surv = TRUE) # Construct new data and plot the cumulative hazard based on new data newre = c(1, 1, 2) names(newre) = c(gam_re$coefname, "id") plot(gam_re, newdata = newre)
gam_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "gamma") # Plot the survival curve based on baseline hazard plot(gam_re, surv = TRUE) # Construct new data and plot the cumulative hazard based on new data newre = c(1, 1, 2) names(newre) = c(gam_re$coefname, "id") plot(gam_re, newdata = newre)
Plot the whole regularization path run by frailtyMMpen
##S3 method for class "fpen"
##S3 method for class "fpen"
x |
Object with class "fpen" |
... |
Further arguments pass to or from other methods |
This function is used to estimate the baseline hazard or to predict the hazard rate of a specific individual given result from model fitting.
##S3 method for class "fmm"
##S3 method for class "fmm"
object |
Object with class "fmm" |
newdata |
The new data for prediction of hazard, categorical data has to be transformed to 0 and 1 |
surv |
Plot survival curve instead of cumulative hazard, the default is |
... |
Further arguments pass to or from other methods |
If parameter newdata
is given, the predicted hazard is calculated based on the given data.
If parameter newdata
is not given, the estimation of baseline hazard will be returned.
The confidence band is calculated based on the delta method. Please insure that the input of new data
should be of the same coefficient name as object$coefname
. Note that if original data contains categorical data, you could check
object$coefname
to input the corresponding 0 or 1 and name of coefficient for the newdata
.
For example, if the coefficient name is "sexfemale", then 1 denotes female while 0 denotes male. You may refer to the
example below to construct the new data.
output |
A dataframe that the first column is the evaluated time point, the second column is the estimated cumulative hazard or survival curve, the third column is the standard error of the estimation result and the fourth and fifth column are the lower bound and upper bound based on 95 percent confidence interval. |
gam_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "Gamma") # Calculate the survival curve based on baseline hazard predict(gam_re, surv = TRUE) # Construct new data and calculate the cumulative hazard based on new data newre = c(1, 1, 2) names(newre) = c(gam_re$coefname, "id") predict(gam_re, newdata = newre)
gam_re = frailtyMM(Surv(tstart, tstop, status) ~ sex + treat + cluster(id), cgd, frailty = "Gamma") # Calculate the survival curve based on baseline hazard predict(gam_re, surv = TRUE) # Construct new data and calculate the cumulative hazard based on new data newre = c(1, 1, 2) names(newre) = c(gam_re$coefname, "id") predict(gam_re, newdata = newre)
This function is used to estimate the baseline hazard or to predict the hazard rate of a specific individual given result from model fitting.
##S3 method for class "fpen"
##S3 method for class "fpen"
object |
Object with class "fpen" |
tune |
The tuning parameter for estimating coefficients |
coef |
Instead of providing tuning parameter, you can directly provide the coefficients for prediction |
newdata |
The new data for prediction of hazard, categorical data has to be transformed to 0 and 1 |
surv |
Plot survival curve instead of cumulative hazard, the default is |
... |
Further arguments pass to or from other methods |
If parameter newdata
is given, the predicted hazard is calculated based on the given data.
If parameter newdata
is not given, the estimation of baseline hazard will be returned.
Since the covariance of estimated parameters for penalized regression cannot be obtained from MLE theorem, we
only provide the estimation without confidence band. For the formulation of new data, you may refer to
function predict.fmm
for detailed description.
output |
A dataframe that the first column is the evaluated time point and the second column is the estimated cumulative hazard or survival curve. |
data(simdataCL) gam_cl = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "Gamma") # Calculate the survival curve based on baseline hazard predict(gam_cl, surv = TRUE) # Construct new data and calculate the cumulative hazard based on new data newcl = c(gam_cl$X[1,], 2) names(newcl) = c(gam_cl$coefname, "id") predict(gam_cl, newdata = newcl)
data(simdataCL) gam_cl = frailtyMMpen(Surv(time, status) ~ . + cluster(id), simdataCL, frailty = "Gamma") # Calculate the survival curve based on baseline hazard predict(gam_cl, surv = TRUE) # Construct new data and calculate the cumulative hazard based on new data newcl = c(gam_cl$X[1,], 2) names(newcl) = c(gam_cl$coefname, "id") predict(gam_cl, newdata = newcl)
Print the summary of a non-penalized regression fitted by any model with function frailtyMM
## S3 method for class 'fmm' print(x, ...)
## S3 method for class 'fmm' print(x, ...)
x |
Object with class "fmm" fitted by function |
... |
Ignored |
No return value, called to print the summary for non-penalized regression.
Print the summary of a non-penalized regression fitted by any model with function frailtyMMpen
.
The first column is the tuning parameter sequence, the second column is the degree of freedom and the third column is the BIC.
## S3 method for class 'fpen' print(x, ...)
## S3 method for class 'fpen' print(x, ...)
x |
Object with class "fpen" fitted by function |
... |
Ignored |
No return value, called to print the summary for penalized regression.
This data include 50 clusters with 10 objects, a total of 100 events are recorded. 30 temporal covariates can be used for model fitting. The data is generated from a gamma frailty model with coefficients (1,2,3,4,0,...,0) and frailty variance 1.
data(simdataCL)
data(simdataCL)
An object of class data.frame
with 500 rows and 33 columns.
data(simdataCL)
data(simdataCL)
This data include 50 individuals with 2 events, a total of 100 events are recorded. 30 temporal covariates can be used for model fitting. The data is generated from a gamma frailty model with coefficients (1,2,3,4,0,...,0) and frailty variance 1.
data(simdataME)
data(simdataME)
An object of class data.frame
with 100 rows and 33 columns.
data(simdataME)
data(simdataME)
This data include 50 individuals with recurrent observation of events, a total of 706 events are recorded. 30 temporal covariates can be used for model fitting. The data is generated from a gamma frailty model with coefficients (1,2,3,4,0,...,0) and frailty variance 1.
data(simdataRE)
data(simdataRE)
An object of class data.frame
with 706 rows and 34 columns.
data(simdataRE)
data(simdataRE)
Provide the summary for the model fitting
##S3 method for class "fmm"
##S3 method for class "fmm"
object |
Object with class "fmm", generated from |
... |
Ignored |
the summary for the model of frailtyMM. The standard error and p-value of estimated parameters are based on Fisher Information matrix.