leaderbot.evaluate.goodness_of_fit

leaderbot.evaluate.goodness_of_fit(models, train=False, tie=False, density=False, metric='MAE', report=True)

Evaluate metrics for goodness of fit.

Parameters:
models : list[leaderbot.models.BaseModel]

A single model or a list of models to be evaluated.

Note

All models should be created using the same dataset to allow a proper comparison.

train : bool, default=False

If True, the models will be trained. If False, it is assumed that the models are pre-trained.

tie : bool, default=False

If False, ties in the data are not counted toward model evaluation. This option is only effective on the leaderbot.models.BradleyTerry model and has no effect on the other models.

density : bool, default=False

If False, the frequency (count) of events is evaluated. If True, the probability density of the events is evaluated.

Note

When density is set to True, the probability density values are multiplied by 100.0, and the error values should be interpreted in percent.

metric : {'MAE', 'MAPE', 'SMAPE', 'RMSE'}, default='MAE'

The metric of comparison (see the illustrative sketch after the parameter descriptions):

  • 'MAE': Mean absolute error.

  • 'MAPE': Mean absolute percentage error.

  • 'SMAPE': Symmetric mean absolute percentage error.

  • 'RMSE': Root mean square error.

report : bool, default=True

If True, a table of the analysis is printed.
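
For reference, the four comparison metrics follow their standard definitions. The sketch below is illustrative only and is not leaderbot's internal implementation; error_metric is a hypothetical helper:

>>> import numpy as np

>>> def error_metric(pred, obs, metric='MAE'):
...     # Hypothetical helper illustrating the metric definitions;
...     # not part of leaderbot's API.
...     pred = np.asarray(pred, dtype=float)
...     obs = np.asarray(obs, dtype=float)
...     if metric == 'MAE':
...         return np.mean(np.abs(pred - obs))
...     elif metric == 'MAPE':
...         return 100.0 * np.mean(np.abs((pred - obs) / obs))
...     elif metric == 'SMAPE':
...         return 100.0 * np.mean(2.0 * np.abs(pred - obs)
...                                / (np.abs(pred) + np.abs(obs)))
...     elif metric == 'RMSE':
...         return np.sqrt(np.mean((pred - obs) ** 2))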

Returns:
metrics : dict

A dictionary containing the following keys and values (see the usage sketch after this list):

  • 'name': list of names of the models.

  • 'kld': list of Kullback-Leibler divergences of the models.

  • 'jsd': list of Jensen-Shannon divergences of the models.

  • 'err_win': list of errors for win predictions.

  • 'err_loss': list of errors for loss predictions.

  • 'err_tie': list of errors for tie predictions.

  • 'err_all': list of errors for overall predictions.
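
The returned dictionary can be consumed directly; for instance, to rank models by their Jensen-Shannon divergence (an illustrative sketch using the keys listed above):

>>> metrics = lb.evaluate.goodness_of_fit(models)
>>> for name, jsd in sorted(zip(metrics['name'], metrics['jsd']),
...                         key=lambda pair: pair[1]):
...     print(f'{name}: {jsd:.2f}')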

Raises:
RuntimeError

If train is False but at least one of the models is not pre-trained.
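
For instance, evaluating a freshly created (untrained) model while passing train=False is expected to raise this exception (an illustrative sketch):

>>> import leaderbot as lb
>>> from leaderbot.models import BradleyTerry as BT

>>> # Model is created but never trained
>>> untrained = BT(lb.data.load(), k_cov=None)
>>> lb.evaluate.goodness_of_fit([untrained], train=False)   # raises RuntimeError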

Examples

>>> import leaderbot as lb
>>> from leaderbot.models import BradleyTerry as BT
>>> from leaderbot.models import RaoKupper as RK
>>> from leaderbot.models import Davidson as DV

>>> # Obtain data
>>> data = lb.data.load()

>>> # Create a list of models to compare
>>> models = [
...    BT(data, k_cov=None),
...    BT(data, k_cov=0),
...    BT(data, k_cov=1),
...    RK(data, k_cov=None, k_tie=0),
...    RK(data, k_cov=0, k_tie=0),
...    RK(data, k_cov=1, k_tie=1),
...    DV(data, k_cov=None, k_tie=0),
...    DV(data, k_cov=0, k_tie=0),
...    DV(data, k_cov=0, k_tie=1)
... ]

>>> # Evaluate models
>>> metrics = lb.evaluate.goodness_of_fit(models, train=True,
...                                       report=True)

The above code outputs the following table:

+----+--------------+----------------------------+------+------+
|    |              |             MAE            |      |      |
| id | model        |   win   loss    tie    all | KLD% | JSD% |
+----+--------------+----------------------------+------+------+
|  1 | BradleyTerry |  18.5   18.5  -----   18.5 | 1.49 | 0.44 |
|  2 | BradleyTerry |  15.3   15.3  -----   15.3 | 1.42 | 0.42 |
|  3 | BradleyTerry |  12.9   12.9  -----   12.9 | 1.40 | 0.42 |
|  4 | RaoKupper    |  27.5   31.1   45.4   34.7 | 3.32 | 0.92 |
|  5 | RaoKupper    |  26.2   29.6   45.7   33.8 | 3.23 | 0.90 |
|  6 | RaoKupper    |  25.1   27.8   42.8   31.9 | 3.28 | 0.87 |
|  7 | Davidson     |  28.6   32.2   49.0   36.6 | 3.41 | 0.94 |
|  8 | Davidson     |  27.5   30.8   49.3   35.9 | 3.32 | 0.92 |
|  9 | Davidson     |  24.1   25.0   35.7   28.2 | 2.93 | 0.81 |
+----+--------------+----------------------------+------+------+
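
To compare probability densities rather than raw counts (errors are then reported in percent, per the note on the density parameter), the same evaluation can be repeated with density=True (an illustrative variation of the call above):

>>> metrics = lb.evaluate.goodness_of_fit(models, train=True,
...                                       density=True, report=True)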