When it comes time to fill out your bracket for the 2009 tourney, you could mull over a mountain of statistics, everything from seeding, conference affiliation and coaching experience to pre-tourney momentum, offensive output, margin of victory and much, much more. In the tourney database I've been building since 1990, I track about 75 separate attributes. With so much data available, it's easy for me to lose sight of which stats really matter in determining the teams to advance in my bracket.
That's why I developed PASE. PASE compares the total number of wins that teams with given attributes attain to the number their seeding indicates that they should've achieved. PASE is calculated by tallying the positive or negative differences between actual and expected wins at each seed position. The total of these differences is divided by the number of appearances to arrive at an average number of games that teams either over- or underperform per tourney. In short, PASE provides a way to measure the relative impact of team attributes on tourney performance.
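For readers who like to see the arithmetic, here is a minimal sketch of the PASE calculation as described above. The function name, the data layout and the expected-wins figures are all assumptions made for illustration; they are not values from the tourney database.

```python
from typing import Iterable, Mapping, Tuple


def pase(appearances: Iterable[Tuple[int, int]],
         expected_wins: Mapping[int, float]) -> float:
    """Average wins above (+) or below (-) seed expectation per appearance.

    appearances: (seed, actual_wins) pairs for every team sharing an attribute.
    expected_wins: the number of wins each seed "should" achieve, supplied
    by the caller.
    """
    rows = list(appearances)
    if not rows:
        raise ValueError("need at least one tourney appearance")
    # Tally the positive or negative difference for each appearance...
    total_diff = sum(wins - expected_wins[seed] for seed, wins in rows)
    # ...then divide by the number of appearances.
    return total_diff / len(rows)


# Hypothetical expected wins and results, for illustration only.
expected = {1: 3.4, 2: 2.4, 3: 1.8}
sample = [(1, 6), (1, 2), (2, 1), (3, 3)]
print(round(pase(sample, expected), 3))
```

A positive result means the group won more games per tourney than its seeds projected; a negative result means it fell short.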
PASE is a useful tool for analyzing the key indicators of tourney advancement, but it's only really effective if applied to the right statistics. While I've been working for years on ways to become a better bracketeer, statistical gurus like Ken Pomeroy and John Gasaway have been working on methods to get a more accurate reflection of the strengths and weaknesses of basketball teams in actual game play. Their primary tool is tempo-free, or possession-based, statistics. As Basketball Prospectus readers know, Ken and John contend that raw numbers like points scored and allowed are only meaningful in the context of the number of times a team possesses the ball or defends against a possession. In other words, the most accurate way to gauge a team's offensive or defensive ability is to analyze its efficiency in scoring or preventing scores. Which team is better offensively: a grind-it-out team that has 60 possessions in a game and scores 66 points, or a greyhound squad that has 80 possessions and gets 80 points? Sure, the greyhounds score more points, but they get an average of only one point per possession. Meanwhile, the grinders score an average of 1.1 points.
Tempo-free statisticians have devised basic formulas to calculate four key numbers:
- The number of possessions a given team has per game (called "tempo")
- The number of points a given team scores per 100 possessions (called "offensive efficiency")
- The number of points a given team allows per 100 possessions (called "defensive efficiency")
- The predicted winning percentage of a team based on its offensive and defensive efficiency (called "Pythagorean winning percentage")
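As a rough illustration of the four numbers listed above, the sketch below shows one way they might be computed. It is not Ken's or John's actual method: the possession estimate is a commonly cited box-score approximation, the 0.475 free-throw weight and the Pythagorean exponent are assumptions, and none of the opponent or home-court adjustments are included.

```python
def estimate_possessions(fga, orb, turnovers, fta, ft_weight=0.475):
    """Box-score estimate of possessions (assumed formula and weight)."""
    return fga - orb + turnovers + ft_weight * fta


def efficiency(points, possessions):
    """Points per 100 possessions; use points scored or points allowed."""
    return 100 * points / possessions


def pythag(off_eff, def_eff, exponent):
    """Predicted winning percentage from offensive and defensive efficiency.

    The exponent is left to the caller because its value is an assumption.
    """
    o = off_eff ** exponent
    d = def_eff ** exponent
    return o / (o + d)


# The grinders (66 points on 60 possessions) vs. the greyhounds (80 on 80).
print(efficiency(66, 60))
print(efficiency(80, 80))
# Hypothetical efficiencies and exponent, for illustration only.
print(round(pythag(110.0, 100.0, exponent=10.0), 4))
```

Dividing by possessions is what makes the numbers "tempo-free": the grinders come out ahead of the greyhounds even though they score fewer raw points.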
I could go into great detail on how these numbers are calculated, but it would take a lot of explaining. Besides, Ken and John have already done it, and much better than I ever could. They've even adjusted the stats for the quality of opponent and home court advantage. What I was curious about--and what anyone interested in building a better bracket would no doubt like to know--is whether these tempo-free stats can actually help predict tourney overachievers.
Fortunately, that's just the kind of question that PASE can help answer. Ken was generous enough to share with me the pre-tourney data from 2004 to 2008 so that I could conduct this analysis. Five years isn't exactly a huge sample size, but it's worth finding out as early as possible whether there's any connection between tempo-free stats and tourney success so that we can track it into the future. With that, let's take a look at the four possession-based stats and use PASE to analyze their predictive value.
Possession-based stats remove the bias of playing tempo from the assessment of a team's offensive or defensive effectiveness. Eliminating the influence of tempo assumes, however, that there isn't any inherent value in playing the game faster or slower that supersedes efficiency. It's a reasonable assumption, but is it true, particularly in the pressure cooker of March Madness? Does an up-tempo style of play lead to tourney overachievement, or are more deliberate teams more likely to exceed seed expectations?
We evaluated the tempo statistics and tourney results of the 120 teams seeded one through six since 2004 (restricting our analysis to the teams expected to have the most success), and here's what we found:
Surprisingly, the 35 one- through six-seeded teams ranked among the 16 fastest-playing squads in their year are actually the biggest underachievers. Based on seed projections, they should've won about 75 games; in fact, they won 69 games, yielding an underperforming PASE of -.173. Conversely, teams ranked among the 16 slowest-paced squads for their year are the highest overachievers. Their win total of 53 games is 5.5 more than seed projections. That works out to a +.190 PASE.
Here's another way to assess the value of playing tempo in tourney performance. Among teams seeded one through six since 2004, the median number of possessions they've averaged per game is 67.35 (or one possession every 17.8 seconds, for the statistically curious). If you run the PASE numbers on the 60 teams with an above-median number of possessions, you find that they're -.068 PASE underachievers who exceed expectations just 33 percent of the time. Meanwhile, the 60 teams that hold the ball longer than the median amount of time are +.068 overachievers, beating seed projections at a 43 percent clip.
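The median comparison can be sketched in a few lines. Everything here is an assumption for illustration: the dictionary layout, the tie rule (teams exactly at the median go with the lower group) and the sample numbers, which are not real tourney data.

```python
from statistics import median


def median_split_pase(teams, key, expected_wins):
    """Split teams at the median of `key` and return (cutoff, PASE above, PASE below)."""
    cutoff = median(t[key] for t in teams)
    above = [t for t in teams if t[key] > cutoff]
    below = [t for t in teams if t[key] <= cutoff]  # ties go low (assumption)

    def group_pase(group):
        if not group:
            return None
        diff = sum(t["wins"] - expected_wins[t["seed"]] for t in group)
        return diff / len(group)

    return cutoff, group_pase(above), group_pase(below)


# Hypothetical teams and expected wins, for illustration only.
expected = {1: 3.0, 2: 2.0}
teams = [
    {"tempo": 70.0, "seed": 1, "wins": 2},
    {"tempo": 68.0, "seed": 2, "wins": 1},
    {"tempo": 64.0, "seed": 1, "wins": 5},
    {"tempo": 62.0, "seed": 2, "wins": 3},
]
print(median_split_pase(teams, "tempo", expected))
```

The same helper works for any attribute: pass a different `key` to split on offensive efficiency, defensive efficiency or Pythag instead of tempo.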
So, it would seem that a slower playing tempo correlates to higher tourney overachievement...but here's an interesting counterpoint to that conclusion: four of the five most recent tourney champs have been among the 60 faster-paced squads. Only Florida in 2006 played at a slower tempo than the median rate. Bottom line: while a more deliberate style of play has led to slight overachievement in the tourney, the teams that wind up cutting down the nets still play at a faster pace than usual.
In the 24 years of the 64-team tourney era, offensive firepower has been a key indicator of tourney achievement. Teams averaging more than 77 points a game are +.077 overachievers, account for 72 of the 96 Final Four slots, and have won 21 of 24 tourneys. Given these numbers, it would only stand to reason that offensive efficiency would be a strong indicator of overperformance as well. Wouldn't it?
The answer is yes--but not as strong a "yes" as you might think. In fact, one could argue that playing tempo has a stronger correlation to overachievement than offensive efficiency does. The median number of points scored by the 120 2004-2008 teams seeded one through six is 117.3 per 100 possessions. Teams falling below median offensive efficiency have a -.047 PASE, with just five Final Four appearances and one championship. Teams that score at an above-median efficiency rate have a +.047 PASE, with 14 Final Four trips and four championships. One thing is clear: there's a bigger PASE gulf--and therefore a stronger correlation to tourney performance--for teams playing below and above median tempo (-.068 plus +.068 comes to a .136 discrepancy) than there is for teams scoring above and below the median rate of offensive efficiency (just a .094 discrepancy).
That said, 14 of the 19 Final Four contenders seeded one through six (George Mason occupied the other semi-final slot and isn't in the analysis because they were an 11 seed) have been more offensively efficient than the median rate. Only nine of 18 Final Four contenders have played at a more deliberate pace than the median higher-seeded squad. Given the choice, I would rather base my bracket picks--particularly my deep advancers--on offensive efficiency than playing tempo.
Here's another indication that offensive efficiency is relevant in assessing a team's tourney performance: one- through six-seeded teams ranked among the top 16 most offensively efficient squads in their year account for all five tourney champs and 14 Final Four slots. Sure, their +.036 PASE is nothing to write home about, but there's no denying that teams reaching the late rounds of the dance must be productive with the basketball.
That's worth remembering, but the much larger point about offensive efficiency is that it's not as strong an indicator of overachievement as raw scoring output would lead you to believe. In fact, the correlation between offensive efficiency and tourney performance is much softer than it is between defensive efficiency and PASE achievement results. When the 120 higher-seeded teams are evaluated through what I call "seed status" analysis, the differences in the value of offensive and defensive efficiency come more clearly into view.
Here's how this type of analysis works: if offensive efficiency perfectly reflected the relative strengths of the top 24 teams in each tourney, then they ought to be seeded according to where they ranked in points per 100 possessions. If a team was ranked ninth in offensive efficiency, it should be given a No. 3 seed (since teams ranked first through eighth would take the top two seeds). If this team was "over-seeded," it would've been given a one or two seed. If it was under-seeded, the team would've been demoted to a four seed or lower.
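The rank-to-seed logic described above can be sketched as follows. The function names and the "right-seeded" label are assumptions for illustration; the only rule taken from the text is that four teams share each seed line, so ranks one through four map to a one seed, five through eight to a two seed, and so on.

```python
def deserved_seed(rank, teams_per_seed=4):
    """Seed a team 'deserves' if seeding followed its efficiency rank exactly."""
    if rank < 1:
        raise ValueError("rank must be 1 or higher")
    return (rank - 1) // teams_per_seed + 1


def seed_status(actual_seed, rank):
    """Classify a team by comparing its actual seed to its deserved seed."""
    deserved = deserved_seed(rank)
    if actual_seed < deserved:
        return "over-seeded"    # given a loftier seed than its rank warrants
    if actual_seed > deserved:
        return "under-seeded"   # dropped below where its rank would place it
    return "right-seeded"


# The ninth-ranked team from the example deserves a No. 3 seed.
print(deserved_seed(9))
print(seed_status(2, 9))
print(seed_status(4, 9))
```

Grouping teams by the label that `seed_status` returns and running PASE on each group gives the three-way comparison used in this analysis.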
If offensive efficiency had any validity as a performance indicator, the teams that were seeded exactly where their points per 100 possessions ranking dictated could be expected to perform to seed expectations. Those teams elevated above where their offensive efficiency placed them, however, would underachieve against expectations (since they didn't deserve their loftier seed). And the teams that were dropped below where their offensive efficiency would've placed them could be expected to overachieve. So what's happened over the last five years? Take a peek:
Teams seeded exactly where they should've been according to offensive efficiency didn't just meet expectations; they exceeded them at a +.131 PASE clip. That's not terribly surprising, but this is: the teams that were seeded above what they deserved based on offensive efficiency nearly met expectations (-.003 PASE) when they should've been the biggest underachievers. Conversely, those teams with greater offensive productivity than the seed position to which they were relegated were the largest underachieving group (-.095 PASE) when they should've been the biggest overperformers.
This discrepancy between how over- and under-seeded offensively efficient teams should and actually do perform is one more sign that "points per 100 possessions," while a mild indicator of overachievement, isn't exactly a "must-track" stat. It's certainly less meaningful than defensive efficiency, as we're about to see.
For years, I argued against the old sports adage that "defense wins championships" in the NCAA tournament. The raw points-scored and points-allowed numbers seemed to indicate that offensive firepower was a much more reliable indicator of a deep tourney run than defensive stinginess. When I ran the possession-based defensive efficiency numbers through PASE analysis, however, I discovered that defense actually does matter.
In fact, by nearly every measure, defensive efficiency is a decisively stronger indicator of overachievement than offensive efficiency is. Let's start by examining above- and below-median defensively efficient squads. Since 2004, the median number of points a one- through six-seeded team has allowed per 100 possessions is 88.8. The 60 higher-seeded teams that forced opponents to score less efficiently than that have a solid PASE of +.131 and account for 16 of 19 Final Four slots and four of five championships. Compare that to the 60 higher-seeded squads that allow opponents to score with more-than-median efficiency (an underachieving -.131 PASE, just three of 19 Final Four slots and one of five championships). Now, consider above-median offensively efficient higher seeds: they muster a +.047 PASE, 14 of 19 semi-final slots and four of five tourney crowns.
The supremacy of defense over offense is also evident in the PASE values of the higher-seeded squads when divided into four defensive efficiency ranking categories. Check out these numbers:
Teams ranked among the top 16 most defensively efficient squads in their particular years have a +.092 PASE, 16 Final Four appearances and four championships. The next tier of teams--those ranked between 17 and 32 for their year--has a similar PASE (+.094), with three semi-final trips and one tourney crown. The two most defensively inefficient tiers--those ranked 33 to 48 and 49 to 64--have sizeable underachieving PASE records of -.414 and -.413, respectively. Not only that, but no team in the last five years has reached the Final Four that was among the bottom half of their tourney field in defensive efficiency.
If that doesn't convince you that defensive efficiency is critical to a deep, seed-defying run, then check out the seed-status analysis. Remember the logic here: If defensive efficiency had any validity as a performance indicator, the teams seeded above where their defensive efficiency placed them would underachieve against expectations (since they didn't deserve their loftier seed). And the teams that were dropped below where their defensive efficiency would've seeded them could be expected to overachieve. Unlike offensive efficiency, this is exactly what's happened. Take a look at the results:
The 65 teams that were seeded above where their defensive efficiency would indicate did in fact underachieve as expected--by almost 12 games, for a PASE of -.181. The 30 teams that were under-seeded according to defensive efficiency demonstrated the value of defense by overachieving at a +.080 clip.
There are no two ways about it: since 2004, defensive efficiency has been a stronger and more consistent predictor of tourney overachievement than offensive efficiency has been. Of course, the last five tourneys have been among the six lowest-scoring in the history of the 64-team era. So it's unclear whether defensive efficiency has trumped offensive efficiency throughout the 24 years of the modern tourney. That said, with the recent trend of lower-scoring fields, my guess is that defensive efficiency will continue to be a key performance indicator.
Pythagorean Winning Percentage
If you pay attention to only one tempo-free stat, keep your eye on what Ken Pomeroy calls "Pythagorean winning percentage." You can read more about the formula Ken uses to generate the stat on kenpom.com, but suffice it to say that "Pythag," as they call it, is based on offensive and defensive efficiency with no consideration for actual winning records.
As powerful a predictor of tourney performance as defensive efficiency is, Pythag is even better. Among the top 24 seeded teams from the last five years, those above the Pythag median winning percentage of .9576 have an overachieving PASE of +.198, while those below underperform at a -.198 PASE rate. The top eight Pythag teams in each of the five dances studied have an eye-popping PASE of +.506, more than half a game per tourney better than expectations. In fact, of the 40 teams, 18 have gone to the Final Four and all five champions are among them.
Heck, the NCAA Selection Committee might do well to factor Pythag into their seeding deliberations. If Pythag rankings were a valid guide to seeding, you'd expect teams seeded above where their Pythag dictated to underperform and teams "demoted" below their Pythag ranking to overperform. That's precisely what happens...big time:
The teams that were seeded right where Pythag suggested they should be performed the closest to seed expectations, just as you'd expect them to if Pythag had validity as a measure of tourney performance. Meanwhile, those teams that were elevated above their Pythag ranking predictably tanked, posting a stunning -.345 PASE. The teams that were demoted below the level of their Pythag value? They massively overachieved, netting a strong +.335 PASE. Perhaps most tellingly, only one higher-seeded squad that was overseeded reached the Final Four; that was North Carolina last year--and they got pounded by Kansas. Every other semi-finalist and all the champions were either right-seeded or under-seeded according to Pythag.
Comparing Tempo-free Stats to Leading PASE Attributes
The evidence, at least for the last five years, is conclusive: tempo-free stats are a pretty big deal. So far, we've learned that:
- There is little inherent value in playing an up-tempo brand of ball; if anything, the more deliberate teams are stronger tourney overachievers.
- Offensive efficiency does have a positive correlation to tourney overperformance, but it isn't nearly as strong as the value of raw offensive firepower might suggest.
- Defensive efficiency, on the other hand, is a far more critical requirement for a deep tourney run than offensive efficiency.
- Pythag is the most reliable guide to tourney advancement of all tempo-free stats.
That invites one big question: how do tempo-free stats compare to other key attributes as predictors of overachievement? A thorough answer to this question would require a whole new study, but let's restrict our comparison to seven other leading factors. Here's how "above-median" Pythag and offensive and defensive efficiency stack up to "above-median" (or as close as I can get to it) points scored, points allowed, victory margin, coaching experience, team experience, pre-tourney momentum and winning percentage for 2004 through 2008.
Granted, five tourneys isn't a big sample size, but it's clear that tempo-free stats rank right up there with the top predictors of tourney performance. It's a little surprising that raw "points allowed" is a little better than Pythag in identifying tourney overachievers. But these last five tourneys have been unusually low-scoring compared with past dances. That said, I suspect that the trend toward slower-paced college ball will continue.
Some of the other surprising stats on this list involve pre-tourney momentum and winning percentage. For once, the pundits appear to be right: teams that are hotter coming to the dance do overachieve. I never would've imagined that raw winning percentage mattered, but the numbers don't lie. All five of the tourney champs--and 16 of 19 semi-finalists--have owned a record better than .796.
None of these performance indicators, considered individually, is nearly as powerful in predicting tourney success as they are when combined. Consider the squads that are on the better side of the top nine of these statistics (I dropped team experience since it's a slight negative indicator). There are just five of them (Oklahoma State and UConn in 2004, Louisville in 2005, and Kansas in 2007 and 2008)...but only the 2007 Jayhawk squad failed to reach the Final Four, and two champs (UConn in '04 and Kansas last year) are among the group. Perhaps more impressively, these five teams own a whopping +1.940 PASE.
Heck, if you just restricted your filtering to the attributes on the above list that all champs possess (above-median Pythag, winning percentage, scoring margin, points scored and coaching experience), you'd be left with 22 squads--about five per dance. These teams would account for 10 Final Fours, all the tourney crowns, and a hefty PASE of +.675. Wanna do better? Up the winning percentage to .800, points scored to 78 and scoring margin to 13 per game, then restrict your group to teams winning at least seven of their last 10. Only 15 squads in the last five years--three per tourney--have fulfilled these conditions. Nine of them (60 percent) have reached the Final Four, and every champ has been among them. Not only that, but the group overachieves by more than one game per tourney (+1.120 PASE).
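The stricter filter described above amounts to a handful of threshold checks. Here is a minimal sketch; the `Team` fields, the treatment of each threshold as inclusive and the two sample teams are all assumptions for illustration, not records from the database.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    win_pct: float              # pre-tourney winning percentage
    points_per_game: float
    scoring_margin: float       # average margin of victory per game
    last10_wins: int            # wins in the last 10 pre-tourney games
    pythag_above_median: bool
    coach_above_median: bool    # coaching experience above the median


def passes_filter(t: Team) -> bool:
    """True if the team meets every condition of the stricter filter."""
    return (
        t.pythag_above_median
        and t.coach_above_median
        and t.win_pct >= 0.800          # inclusive thresholds are an assumption
        and t.points_per_game >= 78
        and t.scoring_margin >= 13
        and t.last10_wins >= 7
    )


# Hypothetical teams, for illustration only.
field = [
    Team("Team A", 0.875, 80.1, 14.2, 9, True, True),
    Team("Team B", 0.790, 82.0, 15.0, 8, True, True),
]
print([t.name for t in field if passes_filter(t)])
```

Team B misses only on winning percentage, which shows how quickly the pool shrinks once every condition has to hold at the same time.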
Using Pythag to Pick the Last Five Brackets
The last and most basic analysis I did on tempo-free stats sprang out of the uncanny performance of Pomeroy's numbers in picking the 2008 bracket. If you filled out your bracket last year using Ken's "Pythag" rankings, you would've wound up in the 97th percentile of the ESPN Tourney Challenge and likely won your tourney pool.
Given such a sterling performance, I wondered how well Pythag had done in prior years, and how it might compare to more basic systems, like just picking the higher seeds. Even the most clueless bracket rookie can do that.
As I've already said, these last five years might not be representative of the entire 64-team era, and Pythag's supremacy last year might just be an anomaly. Overall, filling out your bracket by Pythag rankings would've netted you an average of 42.4 correct picks out of 63 games. While you would've gotten 48 picks right last year, you would've gotten just 37 right in 2006. Pythag would've steered you to the ultimate champion just twice--last year with Kansas and in 2004 with UConn.
On the other hand, filling out a no-brainer bracket of all higher seeds (and picking the top seed with the higher scoring margin from the Final Four on) would've gotten you an average of one more correct pick per tourney than Pythag. From 2004 to 2008, the "higher seed, higher margin" strategy would've resulted in an average of 43.4 correct picks, with a high of 49 in 2007 and a low of 38 in 2006. More importantly, it would've pointed you to three tourney champs: Kansas last year, Florida in 2007 and UConn in 2004. If you're keeping score, "no-brainer" bracket picking beat out Pythag picking in three of the last five dances.
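For anyone who wants to run this kind of comparison themselves, here is a minimal sketch of scoring a bracket filled out from a single rating, whether that rating is Pythag or something as simple as seed. The function names, the bracket-order list layout, the tie rule and the four-team example are assumptions for illustration; this is not the exact procedure behind the numbers above.

```python
def fill_bracket(field, rating):
    """Pick every game by taking the team with the higher rating.

    field: team names in bracket order (length must be a power of two).
    Returns one list of predicted winners per round.
    """
    rounds = []
    current = list(field)
    while len(current) > 1:
        # Pair adjacent teams; the first team advances on a tie (assumption).
        current = [a if rating[a] >= rating[b] else b
                   for a, b in zip(current[0::2], current[1::2])]
        rounds.append(current)
    return rounds


def correct_picks(predicted, actual):
    """Count game slots where the predicted winner matches the actual winner."""
    return sum(p == a
               for pred_round, act_round in zip(predicted, actual)
               for p, a in zip(pred_round, act_round))


# Hypothetical four-team field, ratings and results, for illustration only.
field = ["A", "B", "C", "D"]
rating = {"A": 0.95, "B": 0.80, "C": 0.70, "D": 0.90}
actual = [["A", "C"], ["A"]]
picks = fill_bracket(field, rating)
print(picks, correct_picks(picks, actual))
```

With a 64-team field the loop produces 32 + 16 + 8 + 4 + 2 + 1 = 63 picks, which matches the 63 games used as the denominator in the comparison above.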
Don't get me wrong: tempo-free stats are invaluable in assessing the relative performance of teams in the tourney. But this quick analysis confirmed for me what I'm sure Ken already knew: Pythag data, like any other bracket-picking strategy, isn't an infallible system for bringing sanity to March Madness.
That said, tempo-free stats, particularly Pythag winning percentage and defensive efficiency, are pivotal attributes to consider in identifying deep tourney advancers. They take on even greater importance when combined with other key performance indicators. After Selection Sunday, make sure you get hold of Ken's numbers and consult them as you fill out your bracket.
Pete Tiernan has been using stats to analyze March Madness for 19 years. His insights into the NCAA basketball tournament can help you build a better bracket. E-mail him here or visit bracketscience.com.