In 2007, I added my name to a list.
Kenneth Massey maintains a composite of computer ranking systems for college football. When he started in 1995, there were six or seven systems. By the time I joined, there were dozens. By 2012, over 100. The list includes independent methodologies built by people working across math, engineering, and hobbyist spaces. Entry requires a defined system and a public link.
I used to run a college football computer ranking called Omnivore Rankings. Listed on Massey's composite from around 2007-2014. Crowned mid-major national champions nobody asked for. The entire thing was retrodictive — designed to look backward and find patterns, not predict winners. Built to answer "who actually had the best season" rather than "who would win on a neutral field."
I got obsessed with it the way people get obsessed with Wordle or Rubik's cubes. The problem itself was interesting, because college football is a broken dataset: 130+ teams, 12-13 games each, zero common opponents between conferences, completely unbalanced schedules. The sample size is a joke. The schedule disparity is absurd. And yet every fall, someone has to decide who's #1. These days a playoff has been grafted onto the existing bowl system, but for decades the sport refused one because people liked the tradition of the old setup.
The whole problem is catnip for math nerds, which I didn't discover until I made my own rankings. To this day, there are still 70+ computer ranking systems on Massey's composite. Not because anyone thinks their formula will definitively solve college football. Because the problem is fun. It's a constraint satisfaction puzzle wrapped in a sports argument wrapped in incomplete data.
Here's what makes it interesting:
The BCS used to ban margin of victory from computer rankings. So you'd get systems like Colley Matrix (pure win/loss, no MOV, completely bias-free) sitting next to systems that weighted recent games heavily or factored in opponent's opponent strength. Same game results, wildly different outputs. When they disagreed, that was the story. UCF claiming a 2017 national championship because Colley Matrix had them #1? That's the variance talking. That's the math problem eating itself.
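Colley's method is published, so it's easy to show what a margin-free system actually computes. Here's a minimal sketch of the textbook version in Python; the schedule is made up, and this isn't any particular system from the composite:

```python
# Minimal Colley Matrix sketch: pure win/loss, no margin of victory.
# Ratings come from solving the linear system C r = b, where
#   C[i][i] = 2 + games_played_i,  C[i][j] = -(games between i and j),
#   b[i]    = 1 + (wins_i - losses_i) / 2.
import numpy as np

# Hypothetical results: (winner, loser). Not real data.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
teams = sorted({t for g in games for t in g})
idx = {t: i for i, t in enumerate(teams)}

n = len(teams)
C = 2.0 * np.eye(n)               # start with the +2 on every diagonal
b = np.ones(n)                    # start with the +1 in every b[i]

for winner, loser in games:
    w, l = idx[winner], idx[loser]
    C[w, w] += 1                  # each game adds 1 to both teams' diagonals
    C[l, l] += 1
    C[w, l] -= 1                  # and subtracts 1 from the off-diagonal pairing
    C[l, w] -= 1
    b[w] += 0.5                   # +1/2 for a win, -1/2 for a loss
    b[l] -= 0.5

ratings = np.linalg.solve(C, b)
for team in sorted(teams, key=lambda t: -ratings[idx[t]]):
    print(f"{team}: {ratings[idx[t]]:.3f}")
```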
Most ranking systems are trying to solve one of two problems:
1. Retrodictive — who had the best season given the games they played
2. Predictive — who would win if these two teams played tomorrow
Omnivore was retrodictive. It didn't care about predictive power. It cared about resume. Quality wins mattered more than blowouts. Strength of schedule wasn't opponent win percentage (which punishes you for playing good teams in strong conferences) — it was opponent rating. A 10-2 team that played a murderer's row could outrank a 12-0 team that played nobody.
That distinction — retrodictive vs predictive — is where most of the ranking disagreements come from. Elo is predictive. Colley is retrodictive.
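To make the distinction concrete, here's a hedged sketch in Python. The Elo update is the standard formula; the resume score is not the Omnivore formula, just an illustration of crediting wins by opponent rating instead of opponent win percentage:

```python
# Predictive: standard Elo update after a single game.
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Retrodictive: a toy resume score. Credit for each win scales with the
# opponent's rating, so strength of schedule is opponent quality, not
# opponent win percentage. Illustrative only, not the Omnivore formula;
# opponent ratings are on a made-up 0-to-1 scale.
def resume_score(results: list[tuple[float, bool]]) -> float:
    # results: (opponent_rating, won) for each game played
    return sum(opp if won else -(1.0 - opp) for opp, won in results) / len(results)

# A 10-2 team against strong opponents can outscore a 12-0 team against weak ones.
tough_schedule = [(0.8, True)] * 10 + [(0.9, False)] * 2
easy_schedule = [(0.3, True)] * 12
print(resume_score(tough_schedule), resume_score(easy_schedule))
```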
Split national champions were the perfect embodiment of this. Before the BCS and then the College Football Playoff, there was nothing to settle it: the AP Poll (writers) and the Coaches Poll crowned champions independently. Sometimes they agreed. Sometimes they didn't. 1997: Michigan wins the AP, Nebraska wins the Coaches. Both claim the title. The math nerds built computer rankings trying to solve this problem, and instead they just added more voices to the argument. Go back further, into the poll-only era, and there were even more claimed national champions. The chaos of today almost makes you yearn for the chaotic simplicity of those days.
And that brings me to Viperball.
I'm building a composite computer ranking system for college Viperball. Like Massey's composite, it runs 20-30 different systems against the same game data from my simulation, each producing different results, so the variance is right there on the page. Consensus rankings with standard deviation. Same idea as the Massey comparison, but built entirely for Viperball from in-game data.
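Mechanically, the consensus part is simple: take each system's rank for a team, average them, and keep the spread. A sketch of what I mean, with placeholder system names and ranks rather than anything from the actual simulation:

```python
import statistics

# Each system's output: team -> rank. Placeholder numbers, not real output.
system_rankings = {
    "colley_style": {"Nebraska": 3, "Air Force": 9,  "UCF": 1},
    "elo_style":    {"Nebraska": 7, "Air Force": 2,  "UCF": 4},
    "resume_style": {"Nebraska": 2, "Air Force": 11, "UCF": 3},
}

teams = {t for ranks in system_rankings.values() for t in ranks}
consensus = []
for team in teams:
    ranks = [ranks_by_team[team] for ranks_by_team in system_rankings.values()]
    consensus.append((statistics.mean(ranks), statistics.stdev(ranks), team))

# Sort by average rank; a big stdev means the systems disagree about that team.
for mean_rank, spread, team in sorted(consensus):
    print(f"{team}: avg rank {mean_rank:.1f}, stdev {spread:.1f}")
```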
Because Viperball is a better math problem than college football. The sport obviously has football bones, but the game mechanics make it more variable and inherently unfair. And Viperball already has a robust playoff built into the game, so it doesn't even need rankings to crown a champion; I just thought it'd be fun to take this math problem and embed it in the game anyway.
The sample size issue is the same — 198 teams, 11-13 games each, regional conferences with limited cross-conference play. The schedule imbalance is real. The "who's actually good" question has the same structural challenges.
But Viperball adds new terrain that doesn't exist in football: Delta Yards.
Delta Yards is a rubberband mechanic. Leading team pays a cost. Trailing team gets a bonus. The system compresses scoring. Blowouts are harder to achieve. Comebacks are structurally enabled.
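To show what I mean by rubberband, here's a guessed-at shape for the adjustment. The real Delta Yards formula lives in the simulation and may look nothing like this; the linear rate below is invented purely for illustration:

```python
# Hypothetical shape of a rubberband adjustment. The actual Delta Yards
# formula in the simulation may look nothing like this; the rate is made up.
def delta_yards_adjustment(score_diff: int, rate: float = 0.5) -> float:
    """Positive score_diff means this team is leading.
    Leaders pay a yardage cost per point of lead; trailers get a bonus."""
    return -rate * score_diff  # leading -> negative (cost), trailing -> positive (bonus)

for diff in (-14, -7, 0, 7, 14):
    print(f"score diff {diff:+d}: adjustment {delta_yards_adjustment(diff):+.1f} yards")
```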
So now the ranking problem gets weirder:
Does a team that wins close games with good lead management deserve credit for discipline, or are they just lucky? Is a 10-2 team that never trailed by more than one score better or worse than a 10-2 team that blew teams out early but let them creep back?
Does a blowout against a weak opponent tell you more or less than a tight game against a ranked team when the scoring system actively compresses margins? If Team A beats a cupcake 45-18 and Team B beats a top-10 team 31-27, which result is more impressive when Delta Yards made the 45-18 margin harder to achieve than it would be in normal football?
Is Delta Yards efficiency a skill or a proxy for schedule strength? Teams that face tougher opponents spend more time trailing. Teams that spend more time trailing get Delta bonuses. Are they gaming a broken system, or are they legitimately better at managing adversity?
The college football ranking problem is fun because it's hard to compare teams with wildly different schedules. The Viperball ranking problem is fun because it's hard to compare teams with wildly different schedules and you have to account for a scoring system that treats leads and deficits asymmetrically. Teams are surely getting ripped off by a format that hands their opponent a bonus the moment they take a lead. The teams that do well in this sport earn those victories, but teams that can't finish, or that let leads slip late, are surely getting penalized by a system that doesn't necessarily favor an underdog who builds a big lead against a tougher opponent.
If the math nerds knew this problem existed, they'd be building ranking systems for it. Viperball has all of the mechanics of college football, with a wackier scoring system and the randomness of baseball on some level.
Soren Sorensen, the Danish physics professor who's been ranking college football teams since the '90s, said it best: "I love the math of it, but I also enjoy that all they are doing is not controlled completely by some math equations."
The problem is interesting. The variance will be interesting. When 20 different ranking systems disagree wildly on whether Nebraska or Air Force deserves a playoff spot, that's the whole point.
I'll be curious what these computer rankings teach me about the sport. I've already learned so much over the past few weeks just from simulating games.
One thing about inventing a sport: once you've made it and put it into the wild, you don't stop being surprised by it or learning new things about it.
P.S. The New York Times wrote about computer rankings years ago, if you want to read about them at their peak.