leaderbot.evaluate.model_selection

leaderbot.evaluate.model_selection(models, train=False, tie=False, report=True)

Evaluate models for model selection.

Parameters:
models : list[leaderbot.models.BaseModel]

A single model or a list of models to be evaluated.

Note

All models should be created using the same dataset to allow a proper comparison.

train : bool, default=False

If True, the models will be trained. If False, it is assumed that the models are pre-trained.

tie : bool, default=False

If False, ties in the data are not counted toward model evaluation. This option only affects the leaderbot.models.BradleyTerry model and has no effect on the other models.

report : bool, default=True

If True, a table of the analysis is printed.

Returns:
metrics : dict

A dictionary containing the following keys and values:

  • 'name': list of names of the models.

  • 'n_param': list of the number of parameters of each model.

  • 'nll': list of negative log-likelihood values of the models.

  • 'aic': list of Akaike information criterion (AIC) values of the models.

  • 'bic': list of Bayesian information criterion (BIC) values of the models.

  • 'cel_win': list of cross-entropy losses for win outcomes.

  • 'cel_loss': list of cross-entropy losses for loss outcomes.

  • 'cel_tie': list of cross-entropy losses for tie outcomes.

  • 'cel_all': list of cross-entropy losses over all outcomes.
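
Each list in the returned dictionary follows the order of the input models, so per-model values can be read positionally. A minimal sketch (hypothetical usage, with metrics as returned by this function):

>>> # Each list follows the order of the input models
>>> for name, nll in zip(metrics['name'], metrics['nll']):
...     print(name, nll)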

Raises:
RuntimeError

If train is False but at least one of the models is not pre-trained.
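
If the models are already fitted, they can be evaluated without re-training. A minimal sketch of that workflow, assuming each model exposes a train() method as in the package's other examples:

>>> # Fit each model once, then evaluate without re-training
>>> for model in models:
...     model.train()
>>> metrics = lb.evaluate.model_selection(models, train=False)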

Examples

>>> import leaderbot as lb
>>> from leaderbot.models import BradleyTerry as BT
>>> from leaderbot.models import RaoKupper as RK
>>> from leaderbot.models import Davidson as DV

>>> # Obtain data
>>> data = lb.data.load()

>>> # Create a list of models to compare
>>> models = [
...    BT(data, k_cov=None),
...    BT(data, k_cov=0),
...    BT(data, k_cov=1),
...    RK(data, k_cov=None, k_tie=0),
...    RK(data, k_cov=0, k_tie=0),
...    RK(data, k_cov=1, k_tie=1),
...    DV(data, k_cov=None, k_tie=0),
...    DV(data, k_cov=0, k_tie=0),
...    DV(data, k_cov=0, k_tie=1)
... ]

>>> # Evaluate models
>>> metrics = lb.evaluate.model_selection(models, train=True,
...                                       report=True)

The above code outputs the following table:

+----+--------------+---------+--------+--------------------------------+---------+---------+
|    |              |         |        |               CEL              |         |         |
| id | model        | # param |    NLL |    all     win    loss     tie |     AIC |     BIC |
+----+--------------+---------+--------+--------------------------------+---------+---------+
|  1 | BradleyTerry |     129 | 0.6554 | 0.6553  0.3177  0.3376     inf |   256.7 |  1049.7 |
|  2 | BradleyTerry |     258 | 0.6552 | 0.6551  0.3180  0.3371     inf |   514.7 |  2100.8 |
|  3 | BradleyTerry |     387 | 0.6551 | 0.6550  0.3178  0.3372     inf |   772.7 |  3151.8 |
|  4 | RaoKupper    |     130 | 1.0095 | 1.0095  0.3405  0.3462  0.3227 |   258.0 |  1057.2 |
|  5 | RaoKupper    |     259 | 1.0092 | 1.0092  0.3408  0.3457  0.3228 |   516.0 |  2108.2 |
|  6 | RaoKupper    |     516 | 1.0102 | 1.0102  0.3403  0.3453  0.3245 |  1030.0 |  4202.1 |
|  7 | Davidson     |     130 | 1.0100 | 1.0100  0.3409  0.3461  0.3231 |   258.0 |  1057.2 |
|  8 | Davidson     |     259 | 1.0098 | 1.0098  0.3411  0.3455  0.3231 |   516.0 |  2108.2 |
|  9 | Davidson     |     387 | 1.0075 | 1.0075  0.3416  0.3461  0.3197 |   772.0 |  3151.1 |
+----+--------------+---------+--------+--------------------------------+---------+---------+
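
The inf entries in the tie column reflect that the BradleyTerry model does not account for tie outcomes (see the tie parameter above). To select a model programmatically rather than by reading the table, the returned lists can be ranked directly. A minimal sketch using the keys described above:

>>> # Rank models by AIC (smaller is better)
>>> order = sorted(range(len(metrics['aic'])),
...                key=lambda i: metrics['aic'][i])
>>> best = order[0]
>>> print(metrics['name'][best], metrics['aic'][best])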