leaderbot.models.BradleyTerry

class leaderbot.models.BradleyTerry(data, k_cov=0)

Generalized Bradley-Terry model.

Parameters:

data : dict

A dictionary of data that is provided by leaderbot.data.load().

k_cov : int, default=0

Determines the structure of covariance in the model based on the following values:

None: no covariance is used in the model, which recovers the original Bradley-Terry model.
0: the covariance is assumed to be a diagonal matrix.
positive integer: the covariance is assumed to be a diagonal plus low-rank matrix, where k_cov is the rank of the low-rank term.

See Notes below for further details.

See also

RaoKupper
Davidson

Notes

This class implements a generalization of the Bradley-Terry model based on [1], which incorporates covariance into the model.

Covariance Model:

The model uses a covariance matrix with a diagonal plus low-rank structure of the form

\[\mathbf{\Sigma} = \mathbf{D} + \mathbf{\Lambda} \mathbf{\Lambda}^{\intercal},\]

where

\(\mathbf{\Sigma}\) is an \(m \times m\) symmetric positive semi-definite covariance matrix, where \(m\) is the number of agents (competitors).
\(\mathbf{D}\) is an \(m \times m\) diagonal matrix with non-negative diagonal entries.
\(\mathbf{\Lambda}\) is a full-rank \(m \times k_{\mathrm{cov}}\) matrix, where \(k_{\mathrm{cov}}\) is given by the input parameter k_cov.

If k_cov=None, no covariance matrix is used, which recovers the original Bradley-Terry model [2]. If k_cov=0, the low-rank term is dropped and the covariance reduces to the diagonal matrix \(\mathbf{D}\).
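
As a minimal sketch of this structure (not leaderbot's internal code), the following NumPy snippet builds a diagonal plus low-rank matrix from made-up values of \(m\), \(k_{\mathrm{cov}}\), \(\mathbf{D}\), and \(\mathbf{\Lambda}\), and checks that the result is symmetric positive semi-definite. The helper name build_covariance and all numerical values are assumptions made for illustration.

```python
import numpy as np


def build_covariance(d, lam=None):
    """Return Sigma = D + Lambda Lambda^T (hypothetical helper).

    d   : 1D array of m non-negative diagonal entries of D.
    lam : (m, k_cov) array Lambda, or None for the diagonal case (k_cov=0).
    """
    sigma = np.diag(np.asarray(d, dtype=float))
    if lam is not None:
        lam = np.asarray(lam, dtype=float)
        sigma = sigma + lam @ lam.T
    return sigma


# Illustrative values only: m = 4 agents, k_cov = 2 factors.
rng = np.random.default_rng(0)
m, k_cov = 4, 2
d = rng.uniform(0.1, 1.0, size=m)
lam = rng.standard_normal((m, k_cov))

sigma = build_covariance(d, lam)   # diagonal plus low-rank (k_cov > 0)
sigma_diag = build_covariance(d)   # diagonal only (k_cov = 0)

# Sigma should be symmetric with non-negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(sigma)
print(sigma.shape, np.allclose(sigma, sigma.T), eigenvalues.min() >= 0)
```

The low-rank term adds only \(m \times k_{\mathrm{cov}}\) free entries on top of the \(m\) diagonal entries, rather than the \(m(m+1)/2\) entries of an unstructured symmetric matrix.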

Tie Model:

The Bradley-Terry model does not account for tie outcomes in the data. To model ties, use the leaderbot.models.RaoKupper or leaderbot.models.Davidson models instead.
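
To illustrate why ties have no place in this model, the sketch below evaluates the classical Bradley-Terry win probability of [2] for two hypothetical scores. It uses the textbook formula under the exponential score parameterization and is not leaderbot's implementation; the function name bt_win_probability and the score values are assumptions.

```python
import math


def bt_win_probability(score_i, score_j):
    """Classical Bradley-Terry probability that agent i beats agent j."""
    # Equivalent to exp(score_i) / (exp(score_i) + exp(score_j)).
    return 1.0 / (1.0 + math.exp(score_j - score_i))


# Hypothetical scores for two agents.
p_win = bt_win_probability(1.2, 0.4)
p_loss = bt_win_probability(0.4, 1.2)

# Win and loss probabilities sum to one, so no probability is left for a tie.
print(round(p_win, 4), round(p_loss, 4), round(p_win + p_loss, 4))
```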

References

[1]

Siavash Ameli, Siyuan Zhuang, Ion Stoica, and Michael W. Mahoney. A Statistical Framework for Ranking LLM-Based Chatbots. The Thirteenth International Conference on Learning Representations, 2025.

[2]

Ralph A. Bradley and Milton E. Terry. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika, 39 (3/4), 324-345, 1952.

Examples

>>> from leaderbot.data import load
>>> from leaderbot.models import BradleyTerry

>>> # Create a model
>>> data = load()
>>> model = BradleyTerry(data)

>>> # Train the model
>>> model.train()

>>> # Make inference
>>> prob = model.infer()

Attributes:

x (np.ndarray): A 2D array of integers with the shape (n_pairs, 2) where each row consists of indices [i, j] representing a match between a pair of agents with the indices i and j.
y (np.ndarray): A 2D array of integers with the shape (n_pairs, 3) where each row consists of three counts [n_win, n_loss, n_ties] representing the frequencies of win, loss, and ties between agents i and j given by the corresponding row of the input array x.
agents (list): A list of length n_agents containing the names of the agents (competitors).
n_agents (int): Number of agents (competitors).
param (np.array, default=None): The model parameters. This array is set once the model is trained.
n_param (int): Number of parameters.
k_cov (int): Number of factors for matrix factorization.
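
The layout of x and y can be illustrated with a small hand-made example. The agent names and counts below are hypothetical and are not taken from the data returned by leaderbot.data.load(); the snippet only mirrors the shapes described above.

```python
import numpy as np

# Hypothetical agents and matches (illustration only).
agents = ['agent_a', 'agent_b', 'agent_c']
n_agents = len(agents)

# Each row [i, j] is a pair of agent indices that played each other.
x = np.array([[0, 1],
              [0, 2],
              [1, 2]])

# Each row [n_win, n_loss, n_ties] counts the outcomes of i against j.
y = np.array([[10, 5, 2],
              [7, 7, 1],
              [3, 9, 4]])

n_pairs = x.shape[0]
assert x.shape == (n_pairs, 2) and y.shape == (n_pairs, 3)

# Empirical win, loss, and tie frequencies for each pair.
freq = y / y.sum(axis=1, keepdims=True)
for (i, j), f in zip(x, freq):
    print(agents[i], 'vs', agents[j], np.round(f, 3))
```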

Methods

`loss`([w, return_jac, constraint])	Total loss for all data instances.
`train`([init_param, method, max_iter, tol])	Tune model parameters with maximum likelihood estimation method.
`infer`([x])	Infer the probabilities of win, loss, and tie outcomes.
`predict`([x])	Predict outcome between competitors.
`fisher`([w, epsilon, order])	Observed Fisher information matrix.
`rank`()	Rank competitors based on their scores.
`leaderboard`([max_rank])	Print leaderboard of the agent matches.
`marginal_outcomes`([max_rank, bg_color, ...])	Plot marginal probabilities and frequencies of win, loss, and tie.
`map_distance`([ax, cmap, max_rank, method, ...])	Visualize distance between agents using manifold learning projection.
`cluster`([ax, max_rank, tier_label, method, ...])	Cluster competitors to performance tiers.
`scores`()	Get scores.
`plot_scores`([max_rank, horizontal, ...])	Plot competitors' scores by rank.
`match_matrix`([max_rank, density, source, ...])	Plot match matrices of win and tie counts of mutual matches.