leaderbot.evaluate.goodness_of_fit#
- leaderbot.evaluate.goodness_of_fit(models, train=False, tie=False, density=False, metric='MAE', report=True)#
Evaluate metrics for goodness of fit.
- Parameters:
- models : list[leaderbot.models.BaseModel]
A single model or a list of models to be evaluated.
Note
All models should be created using the same dataset to make a proper comparison.
- train : bool, default=False
If True, the models will be trained. If False, it is assumed that the models are pre-trained.
- tie : bool, default=False
If False, ties in the data are not counted toward model evaluation. This option is only effective on the
leaderbot.models.BradleyTerry
model, and has no effect on the other models.
- density : bool, default=False
If False, the frequency (count) of events is evaluated. If True, the probability density of the events is evaluated.
Note
When
density
is set to True, the probability density values are multiplied by 100.0, and the resulting errors should be interpreted as percentages. A usage sketch combining this option with metric is given after this parameter list.
- metric : {'MAE', 'MAPE', 'SMAPE', 'RMSE'}, default='MAE'
The metric of comparison:
'MAE': Mean absolute error.
'MAPE': Mean absolute percentage error.
'SMAPE': Symmetric mean absolute percentage error.
'RMSE': Root mean square error.
The metric of comparison:
'MAE'
: Mean absolute error.'MAPE'
: Mean absolute percentage error.'SMAPE'
: Symmetric mean absolute percentage error.'RMSE'
: Root mean square error.
- report : bool, default=False
If True, a table of the analysis is printed.
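A brief usage sketch for the density and metric options above (illustrative only; models is assumed to be a list of models such as the one constructed in the Examples section below):
>>> # Illustrative sketch: evaluate probability densities (reported in
>>> # percent) using the root mean square error metric.
>>> metrics = lb.evaluate.goodness_of_fit(models, train=True,
...                                       density=True, metric='RMSE',
...                                       report=True)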
- Returns:
- metrics : dict
A dictionary containing the following keys and values:
'name': list of names of the models.
'kld': list of Kullback-Leibler divergences of the models.
'jsd': list of Jensen-Shannon divergences of the models.
'err_win': list of errors for win predictions.
'err_loss': list of errors for loss predictions.
'err_tie': list of errors for tie predictions.
'err_all': list of errors for overall predictions.
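Since the returned object is a plain dictionary, its entries can be inspected directly. A minimal sketch, assuming metrics was obtained from a call like the one in the Examples section below:
>>> # Sketch: print the overall prediction error of each model,
>>> # using the keys documented above.
>>> for name, err in zip(metrics['name'], metrics['err_all']):
...     print(f'{name}: overall error = {err}')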
- Raises:
- RuntimeError
If
train
is False but at least one of the models is not pre-trained.
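If the models are trained beforehand, train=False can be used without raising this error. A minimal sketch, assuming each model exposes a train() method:
>>> # Sketch: train the models first, then evaluate with train=False.
>>> for model in models:
...     model.train()
>>> metrics = lb.evaluate.goodness_of_fit(models, train=False)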
Examples
>>> import leaderbot as lb
>>> from leaderbot.models import BradleyTerry as BT
>>> from leaderbot.models import RaoKupper as RK
>>> from leaderbot.models import Davidson as DV

>>> # Obtain data
>>> data = lb.data.load()

>>> # Create a list of models to compare
>>> models = [
...     BT(data, k_cov=None),
...     BT(data, k_cov=0),
...     BT(data, k_cov=1),
...     RK(data, k_cov=None, k_tie=0),
...     RK(data, k_cov=0, k_tie=0),
...     RK(data, k_cov=1, k_tie=1),
...     DV(data, k_cov=None, k_tie=0),
...     DV(data, k_cov=0, k_tie=0),
...     DV(data, k_cov=0, k_tie=1)
... ]

>>> # Evaluate models
>>> metrics = lb.evaluate.goodness_of_fit(models, train=True,
...                                       report=True)
The above code outputs the following table:
+----+--------------+----------------------------+------+------+
|    |              |            MAE             |      |      |
| id | model        |  win   loss    tie    all  | KLD% | JSD% |
+----+--------------+----------------------------+------+------+
|  1 | BradleyTerry | 18.5   18.5  -----   18.5  | 1.49 | 0.44 |
|  2 | BradleyTerry | 15.3   15.3  -----   15.3  | 1.42 | 0.42 |
|  3 | BradleyTerry | 12.9   12.9  -----   12.9  | 1.40 | 0.42 |
|  4 | RaoKupper    | 27.5   31.1   45.4   34.7  | 3.32 | 0.92 |
|  5 | RaoKupper    | 26.2   29.6   45.7   33.8  | 3.23 | 0.90 |
|  6 | RaoKupper    | 25.1   27.8   42.8   31.9  | 3.28 | 0.87 |
|  7 | Davidson     | 28.6   32.2   49.0   36.6  | 3.41 | 0.94 |
|  8 | Davidson     | 27.5   30.8   49.3   35.9  | 3.32 | 0.92 |
|  9 | Davidson     | 24.1   25.0   35.7   28.2  | 2.93 | 0.81 |
+----+--------------+----------------------------+------+------+
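The returned dictionary can also be used to select the best-fitting model programmatically. A minimal sketch based on the 'jsd' key documented above, assuming numpy is available:
>>> # Sketch: pick the model with the smallest Jensen-Shannon divergence.
>>> import numpy
>>> best = int(numpy.argmin(metrics['jsd']))
>>> print(metrics['name'][best])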