How We Score

A plain-language guide to the statistical models behind the win probabilities, ratings, and tournament simulations.

1. Dixon-Coles Bivariate Poisson Model

The core model is the Dixon-Coles (1997) bivariate Poisson model, originally designed for soccer score prediction. It assumes each team's goals follow a Poisson distribution, with rates determined by the two teams' attack and defense strengths.

λ_home = α_home × β_away × γ

λ_away = α_away × β_home

Where α_team is the team's attack strength, β_team is the team's defense strength (how many goals they allow relative to average), and γ is a home-advantage multiplier (set to 1.0 for neutral-ice games in this league).

A small correction term τ (tau) adjusts the probabilities of the four lowest-scoring outcomes (0–0, 1–0, 0–1, 1–1), which in practice occur at different rates than independent Poisson goals would predict:

τ(0,0) = 1 − λ_home × λ_away × ρ

τ(1,0) = 1 + λ_away × ρ

τ(0,1) = 1 + λ_home × ρ

τ(1,1) = 1 − ρ

τ(x,y) = 1.0 for all other scores

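The parameter ρ (rho) is fit from the data alongside all team parameters. It captures the dependence between the two teams' scores at low goal totals: with these formulas, a negative ρ raises the probability of 0–0 and 1–1 and lowers the probability of 1–0 and 0–1, while ρ = 0 recovers independent Poisson scores. The full probability of a final score (x, y) is then: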

P(home = x, away = y) = Poisson(x; λ_home) × Poisson(y; λ_away) × τ(x, y)
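As a rough illustration, the formulas in this section can be written as a few small Python functions. The function and argument names are hypothetical and are not taken from the site's code; the sketch only mirrors the equations above.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_goals(attack_home, defense_home, attack_away, defense_away, gamma=1.0):
    """Goal rates for both teams; gamma=1.0 corresponds to neutral ice."""
    lam_home = attack_home * defense_away * gamma
    lam_away = attack_away * defense_home
    return lam_home, lam_away

def tau(x, y, lam_home, lam_away, rho):
    """Low-score correction for the four special scorelines, 1.0 otherwise."""
    if x == 0 and y == 0:
        return 1.0 - lam_home * lam_away * rho
    if x == 1 and y == 0:
        return 1.0 + lam_away * rho
    if x == 0 and y == 1:
        return 1.0 + lam_home * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def score_probability(x, y, lam_home, lam_away, rho):
    """P(home = x, away = y) as the product of two Poisson terms and tau."""
    return (poisson_pmf(x, lam_home)
            * poisson_pmf(y, lam_away)
            * tau(x, y, lam_home, lam_away, rho))
```

The four corrections cancel each other exactly, so the probabilities over all scorelines still sum to 1 for any ρ.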

2. Attack & Defense Ratings

The model is fit by maximum likelihood — it finds the values of every team's attack (α) and defense (β) that make the observed scores most probable. Parameters are estimated on a log scale and then exponentiated.

Because the model has a scale degeneracy (multiplying all attacks by a constant and dividing all defenses by the same constant leaves the likelihood unchanged), we apply a penalty that pins the geometric mean of all attack values near 1.0:

penalty = 100 × (Σ log α_i)²

For display, both attack and defense are explicitly divided by their respective geometric means so that 1.00x = league average for each metric.
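The penalty and the display normalization can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes ratings are held in plain dictionaries keyed by team name, which may not match how the real fit stores them.

```python
import math

def scale_penalty(attacks, weight=100.0):
    """penalty = weight * (sum of log attack)^2.

    The penalty is zero exactly when the geometric mean of the attacks is 1.
    """
    return weight * sum(math.log(a) for a in attacks) ** 2

def geometric_mean(values):
    """Geometric mean computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_for_display(ratings):
    """Divide each rating by the geometric mean so that 1.00x = league average."""
    gm = geometric_mean(list(ratings.values()))
    return {team: value / gm for team, value in ratings.items()}
```

For example, attacks of 2.0 and 0.5 have a geometric mean of 1.0 and incur no penalty, while display ratings of 2.0 and 8.0 would be shown as 0.50x and 2.00x.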

Attack Rating Thresholds

> 1.50x — Elite scorer

> 1.10x — Strong scorer

0.90–1.10x — Average scorer

< 0.90x — Below average scorer

Defense Rating Thresholds

< 0.70x — Elite defense

< 0.90x — Strong defense

0.90–1.10x — Average defense

> 1.10x — Leaky defense

3. Win Probability & Score Matrix

Once we have λ_home and λ_away, we compute a full score probability matrix covering every final score from 0–0 up to 15–15. Each cell (i, j) holds the probability that team 1 scores exactly i goals and team 2 scores exactly j.

score_matrix[i, j] = P(team 1 = i, team 2 = j)

Win probabilities are summed directly from the matrix:

P(team 1 wins) = Σ P(i > j) over all cells where i > j

P(draw) = Σ P(i = j) (diagonal)

P(team 2 wins) = Σ P(i < j) over all cells where i < j

The score matrix heatmap shown on each game page uses this same matrix. Darker cells indicate higher-probability scorelines. The matrix is oriented with team 1 (visitor) on the rows and team 2 (home) on the columns.
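A minimal sketch of the matrix and the three sums, using nested lists and only the standard library. Rows index team 1's goals and columns index team 2's goals, matching the sums above. The function names are hypothetical, and the τ helper repeats the low-score correction from section 1 so the block is self-contained.

```python
import math

MAX_GOALS = 15  # the matrix covers 0..15 goals for each team

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def tau(i, j, lam_1, lam_2, rho):
    """Low-score correction; i is the row team's goals, j the column team's."""
    if (i, j) == (0, 0):
        return 1.0 - lam_1 * lam_2 * rho
    if (i, j) == (1, 0):
        return 1.0 + lam_2 * rho
    if (i, j) == (0, 1):
        return 1.0 + lam_1 * rho
    if (i, j) == (1, 1):
        return 1.0 - rho
    return 1.0

def build_score_matrix(lam_1, lam_2, rho=0.0, max_goals=MAX_GOALS):
    """matrix[i][j] = P(team 1 scores i, team 2 scores j)."""
    return [
        [poisson_pmf(i, lam_1) * poisson_pmf(j, lam_2) * tau(i, j, lam_1, lam_2, rho)
         for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]

def outcome_probabilities(matrix):
    """Return (P(team 1 wins), P(draw), P(team 2 wins)) by summing cells."""
    p1 = draw = p2 = 0.0
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            if i > j:
                p1 += p
            elif i == j:
                draw += p
            else:
                p2 += p
    return p1, draw, p2
```

Because the matrix stops at 15 goals, the three probabilities sum to very slightly less than 1; for typical hockey scoring rates the missing tail is negligible.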

4. Pythagorean Fallback (small samples)

When fewer than 10 games have been played in the season, Dixon-Coles doesn't have enough data to fit reliably. In that case the model falls back to a Pythagorean win expectancy formula adapted for hockey:

P(team wins) = GF^k / (GF^k + GA^k)

Where GF is goals for, GA is goals against, and k ≈ 2.0 (the exponent tuned for hockey). This gives a quick estimate based on goal ratios when team-specific ratings can't yet be trusted.
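The fallback formula is short enough to show directly. In this sketch the function names and the handling of a team with no goals for or against are assumptions, not details taken from the site.

```python
MIN_GAMES_FOR_DIXON_COLES = 10  # threshold stated in the text above

def use_pythagorean_fallback(games_played):
    """True when the season has too few games for a Dixon-Coles fit."""
    return games_played < MIN_GAMES_FOR_DIXON_COLES

def pythagorean_win_expectancy(goals_for, goals_against, k=2.0):
    """GF^k / (GF^k + GA^k), with k = 2.0 as the default exponent."""
    gf_k = goals_for ** k
    ga_k = goals_against ** k
    if gf_k + ga_k == 0:
        # Assumption: with no goals recorded at all, treat the team as a coin flip.
        return 0.5
    return gf_k / (gf_k + ga_k)
```

A team with 30 goals for and 20 against gets 900 / (900 + 400), or roughly 0.69.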

Predictions made with fewer than 20 games in the training sample are flagged with a ⚠️ warning on the team page game log.

5. Rolling Pre-game Predictions

To evaluate how well the model predicted each game, we use a strict rolling holdout:

1. Sort all games by date.
2. For each game G_n, the training set is only the games played before G_n's date.
3. Fit the model on the training set and predict G_n.
4. Record the prediction. The result of G_n itself is never part of the training set.

This ensures zero data leakage: no game's outcome is ever used to predict itself. The win probabilities shown in the game log are the probabilities computed before the game was played.
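The rolling procedure can be sketched generically. Here `fit` and `predict` are placeholder callables standing in for the real model, and each game is assumed to be a dictionary with a sortable `date` field; both are assumptions made for illustration.

```python
def rolling_predictions(games, fit, predict):
    """Predict every game using only games played strictly before its date.

    fit(training_games) returns a model; predict(model, game) returns a prediction.
    """
    ordered = sorted(games, key=lambda g: g["date"])
    records = []
    for game in ordered:
        # Games on the same date are excluded too, so nothing from G_n's own
        # day can leak into its training set.
        training = [g for g in ordered if g["date"] < game["date"]]
        model = fit(training)
        records.append({
            "game": game,
            "n_train": len(training),
            "prediction": predict(model, game),
        })
    return records
```

Keeping `n_train` alongside each prediction makes it easy to flag predictions made from a small training sample.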

6. Surprisal (Upset Index)

The Upset Index page ranks games by how surprising their outcome was, given the pre-game prediction. We use surprisal (information content) from information theory:

Surprisal = −log₂( P(actual outcome) )

A game the model was 99% sure about yields surprisal ≈ 0.014 bits if it went as expected, or ≈ 6.6 bits if the underdog won. Higher surprisal = bigger upset. The unit (bits) has a natural interpretation: each additional bit means the outcome that occurred had been given half as much pre-game probability.
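The calculation is a one-liner; the sketch below uses a hypothetical function name and adds a guard against probabilities outside (0, 1].

```python
import math

def surprisal_bits(p_actual):
    """Surprisal, in bits, of an outcome that was given probability p_actual."""
    if not 0.0 < p_actual <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p_actual)
```

With this function, a pre-game probability of 0.5 gives 1 bit, 0.25 gives 2 bits, and 0.125 gives 3 bits.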

~1 bit

Mild upset
(~50% pre-game odds)

~2 bits

Major upset
(~25% pre-game odds)

3+ bits

Shocking upset
(<12.5% pre-game odds)

7. Monte Carlo Tournament Simulation

The tournament probabilities (championship %, finals %, semifinal %) are estimated by running 100,000 simulations of the remaining tournament bracket.

In each simulation:

Completed games are locked in. Their actual results are used — no re-simulation of games already played.
Remaining games are simulated by drawing a random score from the Dixon-Coles score probability matrix for that matchup.
Overtime is handled: games tied after regulation are resolved by a coin flip that stands in for overtime or a shootout (OT/SO). OT wins count differently in standings tiebreakers, which prioritize regulation wins.
Pool standings are computed after each simulation run. Tiebreakers: points → regulation wins → head-to-head → goal differential → random draw.
The top teams from each pool advance to the crossover semifinals and then finals.

Championship probability = fraction of 100k simulations where a team wins the final. Semifinal probability = fraction where the team advances out of pool play.
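The per-game step of the simulation can be sketched as below. This covers only drawing a score and resolving ties; the bracket, pool standings, and tiebreakers are omitted. The function names are hypothetical, and the matrix is assumed to be a nested list with team 1 on the rows.

```python
import random

def sample_score(matrix, rng):
    """Draw a scoreline (i, j) with probability proportional to matrix[i][j]."""
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    weights = [matrix[i][j] for i, j in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

def simulate_game(matrix, rng):
    """Simulate one game; a regulation tie is settled by a 50/50 coin flip."""
    i, j = sample_score(matrix, rng)
    if i != j:
        return {"score": (i, j), "winner": 1 if i > j else 2, "regulation": True}
    winner = 1 if rng.random() < 0.5 else 2
    return {"score": (i, j), "winner": winner, "regulation": False}

def win_fraction(matrix, n_sims=100_000, seed=None):
    """Fraction of simulated games won by team 1 (rows)."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_sims) if simulate_game(matrix, rng)["winner"] == 1)
    return wins / n_sims
```

The `regulation` flag is kept so that a standings routine could separate regulation wins from OT/SO wins when applying tiebreakers. A production version would likely precompute the cell list and weights once per matchup rather than on every draw.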

8. Home Ice Advantage

The Home Ice Advantage stat shown on each team page is a simple empirical measure computed directly from the team's game history:

Home avg GD = mean(GF − GA) in all home games

Away avg GD = mean(GF − GA) in all away games

Home Ice Advantage = Home avg GD − Away avg GD

A positive value means the team scores more (or allows fewer) goals when playing at home compared to when playing away. This is a raw statistical observation, not a model parameter — it can be noisy for teams with few home or away games.

The Dixon-Coles model itself uses a shared home-advantage parameter γ (estimated globally across all teams), rather than per-team home ice. The per-team stat shown here is a descriptive supplement.
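The per-team stat is a difference of two averages. In this sketch each game is assumed to be a dictionary from the team's own perspective with `is_home`, `gf`, and `ga` fields; those field names and the `None` return for missing data are assumptions.

```python
def home_ice_advantage(games):
    """Home avg goal differential minus away avg goal differential.

    Returns None when the team has no home games or no away games.
    """
    home = [g["gf"] - g["ga"] for g in games if g["is_home"]]
    away = [g["gf"] - g["ga"] for g in games if not g["is_home"]]
    if not home or not away:
        return None
    return sum(home) / len(home) - sum(away) / len(away)
```

A team that averages +1.0 at home and 0.0 away would show a Home Ice Advantage of +1.0.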

9. Elimination Games (Draw Redistribution)

Tournament bracket games (semifinals, finals, consolation) cannot end in a draw — tied games go to overtime. The model adjusts probabilities accordingly by redistributing the draw mass to each team proportionally:

P_elim(A wins) = P(A wins) / (P(A wins) + P(B wins))

P_elim(B wins) = P(B wins) / (P(A wins) + P(B wins))

For the score matrix, the diagonal (all tied scores) is zeroed out, and the remaining cells are scaled by 1 / (1 − draw_mass) so they sum to 1.0 again.
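Both adjustments can be sketched in a few lines. The function names are hypothetical, and the matrix version assumes the input already sums to 1.0, which is what makes the 1 / (1 − draw_mass) scaling restore a total of 1.0.

```python
def redistribute_draw(p_a_wins, p_b_wins):
    """Renormalize the two win probabilities so they sum to 1 without a draw."""
    total = p_a_wins + p_b_wins
    return p_a_wins / total, p_b_wins / total

def zero_diagonal_and_rescale(matrix):
    """Zero out tied scorelines and scale the rest by 1 / (1 - draw_mass)."""
    size = min(len(matrix), len(matrix[0]))
    draw_mass = sum(matrix[i][i] for i in range(size))
    scale = 1.0 / (1.0 - draw_mass)
    return [
        [0.0 if i == j else cell * scale for j, cell in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
```

The two views agree: summing the cells with i > j in the rescaled matrix gives the same value as the renormalized win probability for the row team.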

In the Monte Carlo simulation, regulation ties in bracket games are resolved by a coin flip (50/50) representing overtime / shootout. The winner is awarded the win regardless of their regulation-time strength advantage.

10. Known Limitations

Every model makes simplifying assumptions. Here are the main ones to keep in mind when interpreting predictions:

1. No head-to-head adjustment. Ratings are global: a team's attack and defense strengths are estimated across all opponents. If team A consistently dominates team B specifically, the model won't capture that matchup effect.
2. Equal overtime probability. When a bracket game ties in regulation, the simulation resolves it as a 50/50 coin flip. The stronger team gets no overtime edge.
3. No recency weighting. All regular season games are weighted equally when fitting the model. A team on a hot streak or in a slump is still rated on its full-season performance.
4. Poisson independence. Goals are modeled as independent events (aside from the low-score correction). Momentum swings, score effects, line matchups, and pulling the goalie are not captured.
5. Small sample sizes. Youth divisions often have 20–40 games total. Ratings can shift significantly as new games are played, especially early in the season. Predictions flagged with ⚠️ were made with fewer than 20 games in the training set.

Reference: Dixon, M.J. & Coles, S.G. (1997). "Modelling association football scores and inefficiencies in the football betting market." Journal of the Royal Statistical Society: Series C, 46(2), 265–280.
