Title: | Bradley-Terry Model with Exponential Time Decayed Log-Likelihood and Adaptive Lasso |
---|---|
Description: | We utilize the Bradley-Terry Model to estimate the abilities of teams using paired comparison data. For dynamic approximation of current rankings, we employ the Exponential Decayed Log-likelihood function, and we also apply the Lasso penalty for variance reduction and grouping. The main algorithm applies the Augmented Lagrangian Method described by Masarotto and Varin (2012) <doi:10.1214/12-AOAS581>. |
Authors: | Yunpeng Zhou [aut, cre], Jinfeng Xu [aut] |
Maintainer: | Yunpeng Zhou <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2025-02-01 05:13:32 UTC |
Source: | https://github.com/heilokchow/btdecaylasso |
Bootstrapping is done assuming that Maximum Likelihood's estimation reflects the true abilities. Same level of Lasso penalty "lambda" should be applied in different simulation models for Lasso induced estimation.
boot.BTdecayLasso( dataframe, ability, lambda, boot = 100, weight = NULL, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
boot.BTdecayLasso( dataframe, ability, lambda, boot = 100, weight = NULL, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter.
The row number is consistent with the team's index shown in dataframe. It can be generated using |
lambda |
The amount of Lasso penalty induced, only a single scalar is accepted in bootstrapping. |
boot |
Amount of simulations. |
weight |
Weight for Lasso penalty on different abilities. |
decay.rate |
The exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
thersh |
Threshold for convergence |
max |
Maximum weight for |
iter |
Number of iterations used in L-BFGS-B algorithm. |
100 times of simulation will be done by default, user can adjust the numbers of simulation by input of boot. However, bootstrapping process is time consuming and usually 1000 time of simulations is enough to provide a stable result.
More detailed description of "lambda", "penalty" and "weight" are documented in BTdecayLasso
.
summary() function follows S3 method can be applied to view the outputs.
A list with class "boot" contain Lasso and Hybrid Lasso's bootstrapping's mean and standard deviation.
Lasso |
Lasso bootstrapping's result. A three column matrix where first column is the original estimation, the second column is bootstrapping mean and the last column is the bootstrapping standard deviation |
HYBRID.Lasso |
HYBRID Lasso bootstrapping's result. A three column matrix where the first column is the original estimation, the second column is bootstrapping mean and the last column is the bootstrapping standard deviation |
Masarotto, G. and Varin, C.(2012) The Ranking Lasso and its Application to Sport Tournaments. *The Annals of Applied Statistics* **6** 1949–1970.
Zou, H. (2006) The adaptive lasso and its oracle properties. *J.Amer.Statist.Assoc* **101** 1418–1429.
BTdataframe
for dataframe initialization, BTdecayLasso
for detailed description
Dataframe initialization
BTdataframe(dataframe, home = TRUE)
BTdataframe(dataframe, home = TRUE)
dataframe |
Raw dataframe input, an example data "NFL2010" is attached in package for reference The raw data is a dataframe with 5 columns. First column is home teams. Second column is away teams. Third column is the number of wins of home teams (if home team defeats away team, record 1 here, 0 otherwise). Fourth column is the number of wins of away teams (if home team defeats away team, record 0 here, 1 otherwise). Fifth column is a scalar of time when the match is played until now (Time lag). Any time scale can be used here. "NFL2010" applies the unit of day. |
home |
Whether home effect will be considered, the default is TRUE. |
Initial the raw dataframe and return an un-estimated ability vector and the worst team who loses most.
Note that even if the tournament does not have any home team or away team, you can still provide the match results according to the description above regardless of who is at home and who is away. By selecting the home = FALSE, We duplicate the dataset, switch the home, away teams and also the home, away match results. Then this dataset will be attached to the original dataset and all home and away win's number will be divided by 2. MLE estimation of home effect is proved to be an exact 0.
The elimination of home effect by duplicating the original dataset will be less efficient than eliminating the home parameter directly in iterations. Since most games such as football, basketball have home effect and this method provides an idea of handling the case where some games have home effect and some games are played on neutral place, this method is applied here.
dataframe |
dataframe for Bradley-Terry run |
ability |
Initial ability vector for iterations |
worstTeam |
The worst team whose ability can be set as 0 during any model's run |
Exponential decay rate is applied to the likelihood function to achieve a better track of current abilities. When "decay.rate" is setting as 0,
this is a standard Bradley-Terry Model whose estimated parameters are equivalent to package "BradleyTerry2".
Further detailed description is attached in BTdecayLasso
.
BTdecay(dataframe, ability, decay.rate = 0, fixed = 1, iter = 100)
BTdecay(dataframe, ability, decay.rate = 0, fixed = 1, iter = 100)
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter.
The row number is consistent with the team's index shown in dataframe. It can be generated using |
decay.rate |
The exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
iter |
Number of iterations used in L-BFGS-B algorithm. |
The standard Bradley-Terry Model defines the winning probability of i against j,
is the home parameter and
is the team i's ability score.
takes 1 if team i is at home, -1 otherwise.
Given, a complete tournament's result. The objective likelihood function with an exponential decay rate is,
where n is the number of matches, is the exponential decay rate and
takes 0 if i is defeated by j, 1 otherwise.
is
the time lag (time until now).
This likelihood function is optimized using L-BFGS-B method with package optimr and summary() function with S3 method can be applied to view the outputs.
List with class "BT" contains estimated abilities and convergent code, 0 stands for convergence reaches, 1 stands for convergence not reaches. If 1 is returned, we suggest that decay rate should be set lower. Bradley-Terry model fails to model the situation when a team wins or loses in all matches. If a high decay rate is considered, a team who only loses or wins 1 matches long time ago will also causes the same problem.
ability |
Estimated ability scores |
convergence |
0 stands for convergent, 1 stands for not convergent |
decay.rate |
Decay rate of this model |
##Initializing Dataframe x <- BTdataframe(NFL2010) ##Standard Bradley-Terry Model optimization y <- BTdecay(x$dataframe, x$ability, decay.rate = 0, fixed = x$worstTeam) summary(y) ##Dynamic approximation of current ability scores using exponential decayed likelihood. ##If we take decay.rate = 0.005 ##Match happens one month before will weight exp(-0.15)=0.86 on log-likelihood function z <- BTdecay(x$dataframe, x$ability, decay.rate = 0.005, fixed = x$worstTeam) summary(z)
##Initializing Dataframe x <- BTdataframe(NFL2010) ##Standard Bradley-Terry Model optimization y <- BTdecay(x$dataframe, x$ability, decay.rate = 0, fixed = x$worstTeam) summary(y) ##Dynamic approximation of current ability scores using exponential decayed likelihood. ##If we take decay.rate = 0.005 ##Match happens one month before will weight exp(-0.15)=0.86 on log-likelihood function z <- BTdecay(x$dataframe, x$ability, decay.rate = 0.005, fixed = x$worstTeam) summary(z)
Bradley-Terry model is applied for paired comparison data. Teams' ability score is estimated by maximizing log-likelihood function.
To achieve a better track of current abilities, we apply an exponential decay rate to weight the log-likelihood function.
The most current matches will weight more than previous matches. Parameter "decay.rate" in most functions of this package is used
to set the amount of exponential decay rate. decay.rate should be non-negative and the appropriate range of it depends on time scale in original dataframe.
(see BTdataframe
and parameter "dataframe"'s definition of fifth column) For example,
a unit of week with a "decay.rate" 0.007 is equivalent to the unit of day with "decay.rate" 0.001. Usually, for sports matches,
if we take the unit of day, it's ranging from 0 to 0.01. The higher choice of "decay.rate", the better track of current teams' ability
with a side effect of higher variance.
If "decay.rate" is too large, for example "0.1" with a unit of day, = 0.50. Only half weight will be add to the likelihood for matches played
one week ago and
= 0.05 suggests that previous matches took place one month ago will have little effect. Therefore, Only a few matches are
accounted for ability's estimation. It will lead to a very high variance and uncertainty. Since standard Bradley-Terry model
can not handle the case where there is a team who wins or loses all matches, such estimation may not provide convergent results.
Thus, if our estimation provides divergent result, an error will be returned and we suggest user to chose a smaller "decay.rate"
or adding more match results into the same modeling period.
By default, the Adaptive Lasso is implemented for variance reduction and team's grouping. Adaptive Lasso is proved to have good grouping property.
Apart from adaptive lasso, user can define own weight for different
Lasso constraint where
is team i's ability.
Also by default, the whole Lasso path will be run. Similar to package "glmnet", user can provide their own choice of Lasso penalty "lambda" and determine whether the
whole Lasso path will be run (since such run is time-consuming). However, we suggest that if user is not familiar with the actual relationship among
lambda, the amount of penalty, the amount of shrinkage and grouping effect, a whole Lasso path should be run and selection of an
appropriate lambda is done by AIC or BIC criteria using BTdecayLassoC
(since this model is time related, cross-validation method cannot be applied). Also, users can
use BTdecayLassoF
to run with a specific Lasso penalty ranging from 0 to 1 (1 penalty means all estimators will shrink to 0).
Two sets of estimated abilities will be given, the biased Lasso estimation and the HYBRID Lasso's estimation. HYBRID Lasso estimation solves the restricted Maximum Likelihood optimization based on the group determined by Lasso's estimation (Different team's ability will converges to the same value if Lasso penalty is added and these teams' ability is setting to be equal as a restriction).
In addition, summary() using S3 method can be applied to view the outputs.
BTdecayLasso( dataframe, ability, lambda = NULL, weight = NULL, path = TRUE, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
BTdecayLasso( dataframe, ability, lambda = NULL, weight = NULL, path = TRUE, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter.
The row number is consistent with the team's index shown in dataframe. It can be generated using |
lambda |
The amount of Lasso penalty induced. The input should be a positive scalar or a sequence. |
weight |
Weight for Lasso penalty on different abilities. |
path |
whether the whole Lasso path will be run (plot.BTdecayLasso is enabled only if path = TRUE) |
decay.rate |
A non-negative exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
thersh |
Threshold for convergence used for Augmented Lagrangian Method. |
max |
Maximum weight for |
iter |
Number of iterations used in L-BFGS-B algorithm. |
According to BTdecay
, the objective likelihood function to be optimized is,
The Lasso constraint is given as,
where are predefined weight. For Adaptive Lasso,
.
Maximize this constraint objective function is equivalent to minimizing the following equation,
Where is taking negative value of objective function above. Increase "lambda" will decrease "s", their relationship is
monotone. Here, we define "penalty" as
. Thus, "lambda" and "penalty" has a positive correlation.
ability |
Estimated ability scores with user given lambda |
likelihood |
Negative likelihood of objective function with user given lambda |
df |
Degree of freedom with user given lambda(number of distinct |
penalty |
|
Lambda |
User given lambda |
ability.path |
if path = TRUE, estimated ability scores on whole Lasso path |
likelihood.path |
if path = TRUE, negative likelihood of objective function on whole Lasso path |
df.path |
if path = TRUE, degree of freedom on whole Lasso path(number of distinct |
penalty.path |
if path = TRUE, |
Lambda.path |
if path = TRUE, Whole Lasso path |
path |
Whether whole Lasso path will be run |
HYBRID.ability.path |
If path = TRUE, the whole path of evolving of HYBRID ability |
HYBRID.likelihood.path |
if path = TRUE, the whole path of HYBRID likelihood |
Masarotto, G. and Varin, C.(2012) The Ranking Lasso and its Application to Sport Tournaments. *The Annals of Applied Statistics* **6** 1949–1970.
Zou, H. (2006) The adaptive lasso and its oracle properties. *J.Amer.Statist.Assoc* **101** 1418–1429.
BTdataframe
for dataframe initialization,
plot.swlasso
, plot.wlasso
are used for Lasso path plot if path = TRUE in this function's run
##Initializing Dataframe x <- BTdataframe(NFL2010) ##The following code runs the main results ##Usually a single lambda's run will take 1-20 s ##The whole Adaptive Lasso run will take 5-20 min ##BTdecayLasso run with exponential decay rate 0.005 and ##lambda 0.1, use path = TRUE if you want to run whole LASSO path y1 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, path = FALSE, decay.rate = 0.005, fixed = x$worstTeam) summary(y1) ##Defining equal weight ##Note that comparing to Adaptive weight, the user defined weight may not be ##efficient in groupiing. Therefore, to run the whole Lasso path ##(evolving of distinct ability scores), it may take a much longer time. ##We recommend the user to apply the default setting, ##where Adaptive Lasso will be run. n <- nrow(x$ability) - 1 w2 <- matrix(1, nrow = n, ncol = n) w2[lower.tri(w2, diag = TRUE)] <- 0 ##BTdecayLasso run with exponential decay rate 0.005 and with a specific lambda 0.1 y2 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, weight = w2, path = FALSE, decay.rate = 0.005, fixed = x$worstTeam) summary(y2)
##Initializing Dataframe x <- BTdataframe(NFL2010) ##The following code runs the main results ##Usually a single lambda's run will take 1-20 s ##The whole Adaptive Lasso run will take 5-20 min ##BTdecayLasso run with exponential decay rate 0.005 and ##lambda 0.1, use path = TRUE if you want to run whole LASSO path y1 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, path = FALSE, decay.rate = 0.005, fixed = x$worstTeam) summary(y1) ##Defining equal weight ##Note that comparing to Adaptive weight, the user defined weight may not be ##efficient in groupiing. Therefore, to run the whole Lasso path ##(evolving of distinct ability scores), it may take a much longer time. ##We recommend the user to apply the default setting, ##where Adaptive Lasso will be run. n <- nrow(x$ability) - 1 w2 <- matrix(1, nrow = n, ncol = n) w2[lower.tri(w2, diag = TRUE)] <- 0 ##BTdecayLasso run with exponential decay rate 0.005 and with a specific lambda 0.1 y2 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, weight = w2, path = FALSE, decay.rate = 0.005, fixed = x$worstTeam) summary(y2)
Model selection via AIC or BIC criteria. For Lasso estimators, the degree of freedom is the number of distinct groups of estimated abilities.
BTdecayLassoC( dataframe, ability, weight = NULL, criteria = "AIC", type = "HYBRID", model = NULL, decay.rate = 0, fixed = 1, thersh = 1e-05, iter = 100, max = 100 )
BTdecayLassoC( dataframe, ability, weight = NULL, criteria = "AIC", type = "HYBRID", model = NULL, decay.rate = 0, fixed = 1, thersh = 1e-05, iter = 100, max = 100 )
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter.
The row number is consistent with the team's index shown in dataframe. It can be generated using |
weight |
Weight for Lasso penalty on different abilities |
criteria |
"AIC" or "BIC" |
type |
"HYBRID" or "LASSO" |
model |
An Lasso path object with class wlasso or swlasso. If NULL, the whole lasso path will be run. |
decay.rate |
The exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
thersh |
Threshold for convergence |
iter |
Number of iterations used in L-BFGS-B algorithm. |
max |
Maximum weight for |
This function is usually run after the run of whole Lasso path. "model" parameter is obtained by whole
Lasso pass's run using BTdecayLasso
. If no model is provided, this function will run Lasso path first (time-consuming).
Users can select the information score added to HYBRID Lasso's likelihood or original Lasso's likelihood. ("HYBRID" is recommended)
summary() function can be applied to view the outputs.
Score |
Lowest AIC or BIC score |
Optimal.degree |
The degree of freedom where lowest AIC or BIC score is achieved |
Optimal.ability |
The ability where lowest AIC or BIC score is achieved |
ability |
Matrix contains all abilities computed in this algorithm |
Optimal.lambda |
The lambda where lowest score is attained |
Optimal.penalty |
The penalty (1- s/ |
type |
Type of model selection method |
decay.rate |
Decay rate of this model |
Masarotto, G. and Varin, C.(2012) The Ranking Lasso and its Application to Sport Tournaments. *The Annals of Applied Statistics* **6** 1949–1970.
Zou, H. (2006) The adaptive lasso and its oracle properties. *J.Amer.Statist.Assoc* **101** 1418–1429.
BTdataframe
for dataframe initialization,
BTdecayLasso
for obtaining a whole Lasso path
This function provides a method to computed the estimated abilities and lambda given an intuitive fixed Lasso penalty rate.
Since in Lasso method, the selection of lambda varies a lot with respect to different datasets. We can keep the consistency of
amount of Lasso penalty induced in different datasets from different period by setting a fixed Lasso penalty rate "penalty".
Please refer to BTdecayLasso
for the definition of "penalty" and its relationship with "lambda".
BTdecayLassoF( dataframe, ability, penalty, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
BTdecayLassoF( dataframe, ability, penalty, decay.rate = 0, fixed = 1, thersh = 1e-05, max = 100, iter = 100 )
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter. It can be generated using |
penalty |
The amount of Lasso penalty induced (1-s/max(s)) where is the sum of Lasso penalty part. |
decay.rate |
The exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
thersh |
Threshold for convergence |
max |
Maximum weight for |
iter |
Number of iterations used in L-BFGS-B algorithm. |
The estimated ability given fixed penalty where s is the sum of Lasso penalty part.
When p = 0, this model is reduced to a standard Bradley-Terry Model.
When p = 1, all ability scores are shrinking to 0.
The parameter "penalty" should be ranging from 0.01 to 0.99 due to the iteration's convergent error.
summary() function can be applied to view the outputs.
The list with class "BTF" contains estimated abilities and other parameters.
ability |
Estimated ability scores |
df |
Degree of freedom (number of distinct |
penalty |
Amount of Lasso Penalty |
decay.rate |
Exponential decay rate |
lambda |
Corresponding Lasso lambda given penalty rate |
Masarotto, G. and Varin, C.(2012) The Ranking Lasso and its Application to Sport Tournaments. *The Annals of Applied Statistics* **6** 1949–1970.
Zou, H. (2006) The adaptive lasso and its oracle properties. *J.Amer.Statist.Assoc* **101** 1418–1429.
BTdataframe
for dataframe initialization, BTdecayLasso
for detailed description
A dataframe containing all match results with 5 columns
NFL2010
NFL2010
A dataframe containing all match results with 5 columns
Team who plays at home
Team who plays away
Take "1" if home team wins
Take "1" if away team wins
Number of days until now
Plot the whole lasso path run by BTdecayLasso() with given lambda and path = TRUE
##S3 method for class "swlasso"
##S3 method for class "swlasso"
x |
Object with class "swlasso" |
... |
Further arguments pass to or from other methods |
Plot the whole lasso path run by BTdecayLasso() with lambda = NULL and path = TRUE
##S3 method for class "wlasso"
##S3 method for class "wlasso"
x |
Object with class "wlasso" |
... |
Further arguments pass to or from other methods |