How the Predictions Work
A plain-language guide to the statistical models behind the win probabilities, ratings, and tournament simulations.
1. Dixon-Coles Bivariate Poisson Model
The core model is the Dixon-Coles (1997) bivariate Poisson model, originally designed for soccer score prediction. It assumes each team's goals follow a Poisson distribution, with rates determined by the two teams' attack and defense strengths. In the standard Dixon-Coles parameterization, the scoring rates are:
λhome = αhome × βaway × γ
λaway = αaway × βhome
Where αteam is the team's attack strength, βteam is the team's defense strength (how many goals they allow relative to average), and γ is a home-advantage multiplier (set to 1.0 for neutral-ice games in this league).
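A small correction term τ (tau) adjusts the model for low-scoring outcomes (0–0, 1–0, 0–1, 1–1), whose frequencies independent Poisson distributions do not reproduce well. With x the home score and y the away score, the standard Dixon-Coles correction is:
τ(0, 0) = 1 − λhome × λaway × ρ
τ(0, 1) = 1 + λhome × ρ
τ(1, 0) = 1 + λaway × ρ
τ(1, 1) = 1 − ρ
τ(x, y) = 1 for every other score
The parameter ρ (rho) is fit from the data alongside all team parameters. It captures the dependence between the two teams' scores at low goal totals: a negative ρ shifts probability toward 0–0 and 1–1 and away from 1–0 and 0–1, and a positive ρ does the opposite. The full probability of a final score (x, y) is then:
P(x, y) = τ(x, y) × Poisson(x; λhome) × Poisson(y; λaway)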
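As a rough illustration of how a single scoreline probability could be computed, here is a minimal Python sketch using the standard Dixon-Coles form of τ. The function and argument names are illustrative assumptions, not the site's actual code.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def tau(x, y, lam_home, lam_away, rho):
    """Low-score correction; equals 1 outside the four low-score cells."""
    if x == 0 and y == 0:
        return 1.0 - lam_home * lam_away * rho
    if x == 0 and y == 1:
        return 1.0 + lam_home * rho
    if x == 1 and y == 0:
        return 1.0 + lam_away * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_probability(x, y, attack_home, defense_home,
                      attack_away, defense_away, gamma=1.0, rho=0.0):
    """P(home scores x, away scores y); gamma=1.0 models neutral ice."""
    lam_home = attack_home * defense_away * gamma
    lam_away = attack_away * defense_home
    return (tau(x, y, lam_home, lam_away, rho)
            * poisson_pmf(x, lam_home)
            * poisson_pmf(y, lam_away))
```

The four corrections cancel exactly, so the probabilities still sum to 1 over all scores for any ρ that keeps every τ value positive.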
2. Attack & Defense Ratings
The model is fit by maximum likelihood — it finds the values of every team's attack (α) and defense (β) that make the observed scores most probable. Parameters are estimated on a log scale and then exponentiated.
Because the model has a scale degeneracy (multiplying all attacks by a constant and dividing all defenses by the same constant leaves the likelihood unchanged), we apply a penalty that pins the geometric mean of all attack values near 1.0. Equivalently, it keeps the mean of the log-attack parameters near 0.
For display, both attack and defense are explicitly divided by their respective geometric means so that 1.00x = league average for each metric.
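The scale degeneracy and the display normalization can be checked numerically. The sketch below is an assumption-laden illustration: it omits the τ correction for brevity, and the quadratic penalty and its weight are hypothetical, since the page does not give the exact penalty used. All names and data are made up for the example.

```python
import math


def log_likelihood(games, attack, defense, gamma=1.0):
    """Independent-Poisson log-likelihood (tau correction omitted).

    games: list of (home, away, home_goals, away_goals) tuples.
    attack, defense: dicts mapping team -> positive multiplier.
    """
    total = 0.0
    for home, away, hg, ag in games:
        lam_home = attack[home] * defense[away] * gamma
        lam_away = attack[away] * defense[home]
        total += hg * math.log(lam_home) - lam_home - math.lgamma(hg + 1)
        total += ag * math.log(lam_away) - lam_away - math.lgamma(ag + 1)
    return total


def geometric_mean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scale_penalty(attack, weight=10.0):
    """Hypothetical penalty: zero when the attacks' geometric mean is 1."""
    return weight * math.log(geometric_mean(attack.values())) ** 2


def penalized_objective(games, attack, defense, gamma=1.0, weight=10.0):
    """Quantity a numerical optimizer would minimize."""
    return (-log_likelihood(games, attack, defense, gamma)
            + scale_penalty(attack, weight))


def normalize_for_display(ratings):
    """Divide by the geometric mean so that 1.00x means league average."""
    g = geometric_mean(ratings.values())
    return {team: value / g for team, value in ratings.items()}


# Made-up data to demonstrate the degeneracy.
games = [("A", "B", 3, 2), ("B", "C", 1, 4), ("C", "A", 2, 2)]
attack = {"A": 1.8, "B": 1.2, "C": 2.1}
defense = {"A": 1.4, "B": 1.9, "C": 1.1}

c = 3.0
scaled_attack = {t: a * c for t, a in attack.items()}
scaled_defense = {t: d / c for t, d in defense.items()}
```

Here `log_likelihood` returns the same value for the original and rescaled parameters, while `scale_penalty` differs between them. That difference is what lets an optimizer settle on a single solution.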
3. Win Probability & Score Matrix
Once we have λhome and λaway, we compute a full score probability matrix — every possible final score up to 15–15. Each cell (i, j) holds the probability that the home team scores exactly i goals and the away team scores exactly j.
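Win probabilities are summed directly from the matrix: the home win probability is the sum of all cells with i > j, the away win probability is the sum of all cells with i < j, and the diagonal (i = j) gives the probability of a tie after regulation.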
The score matrix heatmap shown on each game page uses this same matrix. Darker cells indicate higher-probability scorelines. Note that the heatmap is drawn with team 1 (visitor) on the rows and team 2 (home) on the columns, so it is transposed relative to the (i, j) = (home, away) indexing described above.
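A minimal sketch of the score matrix and the outcome sums, assuming the standard Dixon-Coles form of τ and no renormalization after truncating at 15 goals (the truncated tail is negligible for typical hockey scoring rates). Names and the example rates are illustrative.

```python
import math

MAX_GOALS = 15  # the matrix covers 0-0 through 15-15


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def tau(x, y, lam_home, lam_away, rho):
    """Standard Dixon-Coles low-score correction."""
    if x == 0 and y == 0:
        return 1.0 - lam_home * lam_away * rho
    if x == 0 and y == 1:
        return 1.0 + lam_home * rho
    if x == 1 and y == 0:
        return 1.0 + lam_away * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam_home, lam_away, rho=0.0):
    """matrix[i][j] = P(home scores i, away scores j)."""
    return [[tau(i, j, lam_home, lam_away, rho)
             * poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
             for j in range(MAX_GOALS + 1)]
            for i in range(MAX_GOALS + 1)]


def outcome_probabilities(matrix):
    """Sum the cells below, on, and above the diagonal."""
    n = len(matrix)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "home_win": sum(matrix[i][j] for i, j in cells if i > j),
        "regulation_tie": sum(matrix[i][j] for i, j in cells if i == j),
        "away_win": sum(matrix[i][j] for i, j in cells if i < j),
    }


# Made-up rates: home expects 3.0 goals, away 2.5.
probs = outcome_probabilities(score_matrix(3.0, 2.5, rho=-0.05))
```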
4. Pythagorean Fallback (small samples)
When fewer than 10 games have been played in the season, Dixon-Coles doesn't have enough data to fit reliably. In that case the model falls back to a Pythagorean win expectancy formula adapted for hockey:
Win% = GF^k / (GF^k + GA^k)
Where GF is goals for, GA is goals against, and k ≈ 2.0 (the exponent tuned for hockey). This gives a quick estimate based on goal ratios when team-specific ratings can't yet be trusted.
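The fallback is simple enough to express in a few lines. This is a sketch; the function name and the handling of a team with no goals for or against are assumptions.

```python
def pythagorean_win_expectancy(goals_for, goals_against, k=2.0):
    """Expected win fraction from goal totals, with exponent k."""
    if goals_for == 0 and goals_against == 0:
        # Assumption: with no goals recorded, treat the team as a coin flip.
        return 0.5
    gf_k = goals_for ** k
    ga_k = goals_against ** k
    return gf_k / (gf_k + ga_k)
```

For example, a team with 30 goals for and 20 against gets 900 / (900 + 400) ≈ 0.692.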
Predictions made with fewer than 20 games in the training sample are flagged with a ⚠️ warning on the team page game log.
5. Rolling Pre-game Predictions
To evaluate how well the model predicted each game, we use a strict rolling holdout:
- Sort all games by date.
- For each game Gn, the training set is only the games played before Gn's date.
- Fit the model on the training set and predict Gn.
- Record the prediction. The result of Gn itself is never part of the training set.
This ensures zero data leakage: no game's outcome is ever used to predict itself. The win probabilities shown in the game log are the probabilities computed before the game was played.
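The rolling holdout steps above can be sketched as a short loop. The `fit` and `predict` callables stand in for whatever model is used; the toy home-win-rate model and the sample games are made up for illustration.

```python
from datetime import date


def rolling_predictions(games, fit, predict):
    """Predict each game using only games played strictly before its date."""
    ordered = sorted(games, key=lambda g: g["date"])
    records = []
    for game in ordered:
        # Same-day games are excluded too, since they are not "before".
        training = [g for g in ordered if g["date"] < game["date"]]
        model = fit(training)
        records.append({
            "game": game,
            "n_train": len(training),
            "prediction": predict(model, game),
        })
    return records


def fit_home_rate(training):
    """Toy model: historical home win rate, 0.5 when there is no data."""
    if not training:
        return 0.5
    return sum(g["home_win"] for g in training) / len(training)


def predict_home_rate(model, game):
    return model


games = [
    {"date": date(2024, 1, 5), "home": "A", "away": "B", "home_win": True},
    {"date": date(2024, 1, 6), "home": "C", "away": "A", "home_win": False},
    {"date": date(2024, 1, 6), "home": "B", "away": "C", "home_win": True},
    {"date": date(2024, 1, 8), "home": "A", "away": "C", "home_win": True},
]
records = rolling_predictions(games, fit_home_rate, predict_home_rate)
```

Refitting for every game is slow but keeps the no-leakage guarantee easy to verify: changing a game's result never changes that game's own prediction.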
6. Surprisal (Upset Index)
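The Upset Index page ranks games by how surprising their outcome was, given the pre-game prediction. We use surprisal (information content) from information theory:
surprisal = −log2(p)
where p is the pre-game probability the model assigned to the outcome that actually occurred.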
A game the model was 99% sure about yields surprisal ≈ 0.014 bits if it went as expected, or ≈ 6.6 bits if the underdog won. Higher surprisal = bigger upset. The unit (bits) has a natural interpretation: each additional bit means the outcome had half the pre-game probability. For reference:
- about 1.6 bits (~33% pre-game odds)
- 2 bits (~25% pre-game odds)
- more than 3 bits (below 12.5% pre-game odds)
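A small sketch of the surprisal calculation, taking the pre-game home win probability and the actual result as inputs. The function names are illustrative.

```python
import math


def surprisal_bits(p_outcome):
    """Information content, in bits, of an outcome with probability p_outcome."""
    if not 0.0 < p_outcome <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p_outcome)


def game_surprisal(p_home_win, home_won):
    """Surprisal of the observed result given the pre-game home win probability."""
    p_observed = p_home_win if home_won else 1.0 - p_home_win
    return surprisal_bits(p_observed)
```

With a 99% home favorite, `game_surprisal(0.99, True)` is about 0.0145 bits and `game_surprisal(0.99, False)` is about 6.64 bits, matching the figures quoted above.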
7. Monte Carlo Tournament Simulation
The tournament probabilities (championship %, finals %, semifinal %) are estimated by running 100,000 simulations of the remaining tournament bracket.
In each simulation:
- Completed games are locked in. Their actual results are used — no re-simulation of games already played.
- Remaining games are simulated by drawing a random score from the Dixon-Coles score probability matrix for that matchup.
- Overtime is handled: tied games in regulation resolve to a random winner (coin flip) representing OT/SO. OT wins count differently in standings tiebreakers (regulation wins are prioritized).
- Pool standings are computed after each simulation run. Tiebreakers: points → regulation wins → head-to-head → goal differential → random draw.
- The top teams from each pool advance to the crossover semifinals and then finals.
Championship probability = fraction of the 100,000 simulations in which a team wins the final. Finals probability = fraction in which the team reaches the final. Semifinal probability = fraction in which the team advances out of pool play.
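The simulation loop can be sketched in reduced form. This illustration covers only the semifinals and the final, omits pool standings and tiebreakers, and uses independent Poisson scores without the τ correction. The team ratings, function names, and the small simulation count are assumptions chosen to keep the example short.

```python
import itertools
import math
import random

MAX_GOALS = 15


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def make_score_sampler(lam_a, lam_b):
    """Return a function that draws (goals_a, goals_b) from the score matrix."""
    cells = [(i, j) for i in range(MAX_GOALS + 1) for j in range(MAX_GOALS + 1)]
    weights = [poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b) for i, j in cells]
    cum_weights = list(itertools.accumulate(weights))
    return lambda rng: rng.choices(cells, cum_weights=cum_weights, k=1)[0]


def simulate_championship(semifinals, attack, defense,
                          completed=None, n_sims=2000, seed=0):
    """Estimate title probabilities for a four-team bracket on neutral ice."""
    rng = random.Random(seed)
    completed = completed or {}
    samplers = {}

    def play(a, b):
        if (a, b) in completed:
            return completed[(a, b)]  # locked-in result, never re-simulated
        if (a, b) not in samplers:
            samplers[(a, b)] = make_score_sampler(attack[a] * defense[b],
                                                  attack[b] * defense[a])
        goals_a, goals_b = samplers[(a, b)](rng)
        if goals_a == goals_b:
            return rng.choice([a, b])  # OT/SO modeled as a coin flip
        return a if goals_a > goals_b else b

    titles = {team: 0 for pair in semifinals for team in pair}
    for _ in range(n_sims):
        finalist_1 = play(*semifinals[0])
        finalist_2 = play(*semifinals[1])
        titles[play(finalist_1, finalist_2)] += 1
    return {team: wins / n_sims for team, wins in titles.items()}


# Made-up ratings for illustration only.
attack = {"A": 4.0, "B": 3.0, "C": 3.0, "D": 2.5}
defense = {"A": 0.7, "B": 1.0, "C": 1.0, "D": 1.2}
semifinals = [("A", "D"), ("B", "C")]
title_odds = simulate_championship(semifinals, attack, defense)
```

Passing `completed={("A", "D"): "D"}` locks in that semifinal, so team A's title probability drops to exactly zero in every run.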
8. Home Ice Advantage
The Home Ice Advantage stat shown on each team page is a simple empirical measure computed directly from the team's game history by comparing its results at home with its results away.
A positive value means the team scores more (or allows fewer) goals when playing at home compared to when playing away. This is a raw statistical observation, not a model parameter — it can be noisy for teams with few home or away games.
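The exact formula is not reproduced here, so the following is only one plausible definition, stated as an assumption: the team's average goal differential at home minus its average goal differential away. The function name and sample data are hypothetical.

```python
def home_ice_advantage(games, team):
    """Average home goal differential minus average away goal differential.

    games: list of (home, away, home_goals, away_goals) tuples.
    Returns None when the team lacks either home or away games.
    """
    home_diffs = [hg - ag for home, away, hg, ag in games if home == team]
    away_diffs = [ag - hg for home, away, hg, ag in games if away == team]
    if not home_diffs or not away_diffs:
        return None
    return (sum(home_diffs) / len(home_diffs)
            - sum(away_diffs) / len(away_diffs))


# Made-up results for illustration.
games = [("A", "B", 4, 2), ("A", "C", 3, 3), ("B", "A", 3, 2), ("C", "A", 1, 2)]
advantage_a = home_ice_advantage(games, "A")
```

Under this definition team A averages +1.0 at home and 0.0 away, giving a home ice advantage of 1.0 goals per game.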
The Dixon-Coles model itself uses a shared home-advantage parameter γ (estimated globally across all teams), rather than per-team home ice. The per-team stat shown here is a descriptive supplement.