leaderbot.evaluate.model_selection

leaderbot.evaluate.model_selection(models, train=False, tie=False, report=True)

Evaluate models for model selection.

Parameters:
models : list[leaderbot.models.BaseModel]

A single model or a list of models to be evaluated.

Note

All models should be created using the same dataset to allow a proper comparison.

train : bool, default=False

If True, the models will be trained. If False, it is assumed that the models are pre-trained.

tie : bool, default=False

If False, ties in the data are not counted toward model evaluation. This option only affects the leaderbot.models.BradleyTerry model and has no effect on the other models.

report : bool, default=True

If True, a table of the analysis is printed.

Returns:
metrics : dict

A dictionary containing the following keys and values:

  • 'name': list of names of the models.

  • 'n_param': list of the number of parameters of each model.

  • 'nll': list of negative log-likelihood values of the models.

  • 'aic': list of Akaike information criterion (AIC) values of the models.

  • 'bic': list of Bayesian information criterion (BIC) values of the models.

  • 'cel_win': list of cross-entropy losses for win outcomes.

  • 'cel_loss': list of cross-entropy losses for loss outcomes.

  • 'cel_tie': list of cross-entropy losses for tie outcomes.

  • 'cel_all': list of cross-entropy losses over all outcomes.
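
Each list in the returned dictionary follows the order of the input models, so per-model values can be read positionally. A minimal sketch (hypothetical usage, with metrics as returned by this function):

>>> # Each list follows the order of the input models
>>> for name, nll in zip(metrics['name'], metrics['nll']):
...     print(name, nll)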

Raises:
RuntimeError

If train is False but at least one of the models is not pre-trained.
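
If the models are already fitted, they can be evaluated without re-training. A minimal sketch of that workflow, assuming each model exposes a train() method as in the package's other examples:

>>> # Fit each model once, then evaluate without re-training
>>> for model in models:
...     model.train()
>>> metrics = lb.evaluate.model_selection(models, train=False)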

Examples

>>> import leaderbot as lb
>>> from leaderbot.models import BradleyTerry as BT
>>> from leaderbot.models import RaoKupper as RK
>>> from leaderbot.models import Davidson as DV

>>> # Obtain data
>>> data = lb.data.load()

>>> # Create a list of models to compare
>>> models = [
...    BT(data, k_cov=None),
...    BT(data, k_cov=0),
...    BT(data, k_cov=1),
...    RK(data, k_cov=None, k_tie=0),
...    RK(data, k_cov=0, k_tie=0),
...    RK(data, k_cov=1, k_tie=1),
...    DV(data, k_cov=None, k_tie=0),
...    DV(data, k_cov=0, k_tie=0),
...    DV(data, k_cov=0, k_tie=1)
... ]

>>> # Evaluate models
>>> metrics = lb.evaluate.model_selection(models, train=True,
...                                       report=True)

The above code outputs the following table:

+----+--------------+---------+--------+--------------------------------+---------+---------+
|    |              |         |        |               CEL              |         |         |
| id | model        | # param |    NLL |    all     win    loss     tie |     AIC |     BIC |
+----+--------------+---------+--------+--------------------------------+---------+---------+
|  1 | BradleyTerry |     129 | 0.6554 | 0.6553  0.3177  0.3376     inf |   256.7 |  1049.7 |
|  2 | BradleyTerry |     258 | 0.6552 | 0.6551  0.3180  0.3371     inf |   514.7 |  2100.8 |
|  3 | BradleyTerry |     387 | 0.6551 | 0.6550  0.3178  0.3372     inf |   772.7 |  3151.8 |
|  4 | RaoKupper    |     130 | 1.0095 | 1.0095  0.3405  0.3462  0.3227 |   258.0 |  1057.2 |
|  5 | RaoKupper    |     259 | 1.0092 | 1.0092  0.3408  0.3457  0.3228 |   516.0 |  2108.2 |
|  6 | RaoKupper    |     516 | 1.0102 | 1.0102  0.3403  0.3453  0.3245 |  1030.0 |  4202.1 |
|  7 | Davidson     |     130 | 1.0100 | 1.0100  0.3409  0.3461  0.3231 |   258.0 |  1057.2 |
|  8 | Davidson     |     259 | 1.0098 | 1.0098  0.3411  0.3455  0.3231 |   516.0 |  2108.2 |
|  9 | Davidson     |     387 | 1.0075 | 1.0075  0.3416  0.3461  0.3197 |   772.0 |  3151.1 |
+----+--------------+---------+--------+--------------------------------+---------+---------+
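
The inf entries in the tie column reflect that the BradleyTerry model does not account for tie outcomes (see the tie parameter above). To select a model programmatically rather than by reading the table, the returned lists can be ranked directly. A minimal sketch using the keys described above:

>>> # Rank models by AIC (smaller is better)
>>> order = sorted(range(len(metrics['aic'])),
...                key=lambda i: metrics['aic'][i])
>>> best = order[0]
>>> print(metrics['name'][best], metrics['aic'][best])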