In the noble tradition of PECOTA and KUBIAK, Basketball Prospectus is proud to introduce SCHOENE. In the spirit of its predecessors, SCHOENE is named after Russ Schoene, a journeyman forward during the 1980s who most famously spent two seasons playing for the Seattle SuperSonics. (This, coming from a lifelong Sonics fan/ex-employee hoping to keep the team's tradition alive, is something less than a coincidence.) Like PECOTA, SCHOENE is technically a fancy-sounding acronym for Standardized Comparable Heuristic Optimizing Empirical NBA Evolution.
I began working on an NBA projection system not long after reading Nate Silver's eye-opening essay explaining PECOTA in Baseball Prospectus 2003. Naturally, NBA projections are more challenging for several reasons. Most notably, we have fewer years of history to work with (it would be nearly impossible to use any stats from before 1977-78, when the NBA began recording player turnovers; I actually start with 1979-80 so that all stats include games played with the three-point line). Also, while PECOTA can draw upon minor-league stats, it is nearly impossible to create a sizeable pool of comparable players for guys who enter the NBA at a very young age.
Still, many of PECOTA's most successful attributes can be recreated in basketball, a process that both John Hollinger and I have attempted. In 2004, I used my system to project records on the Sonics' Web site, an effort that can charitably be described as "subpar." In hindsight, 2004-05 was a difficult year to predict, in no small part because of the impact of the rules re-interpretations that played a role in the success of surprise teams like Phoenix and Seattle.
For nearly four years, I basically put that work away because I did not have time to make projections or improve the system. This year, I made that time and revamped the system. As will surely be clear as I roll out the projections over the course of this week and into next week, there is still room for improvement. Still, SCHOENE is unique amongst projection systems in its effort to project both players and teams and contextualize both sets of projections.
Naturally, the projection process begins with similarity scores. I determine player similarity using 13 factors, all standardized for league:
- "Shoot" rating (based on 3P%, 3PM/Min and FT%)
- Two-point percentage
- "Inside" rating ((FTA - 3PA)/possessions)
- Usage rate
- Rebound percentage
- Assist percentage
- Steal percentage
- Block percentage
- Turnover percentage
- Per-minute Win %
Like many similarity scores, mine are on a scale of 100, with 100 being the closest match. A score of 95 indicates two highly similar players, 90 indicates reasonable similarity, and anything below that starts to get dicey. The player with the best comparable was Antoine Wright, whose 2007-08 season was (apparently) virtually identical to Rod Higgins' 1983-84 season, with a similarity score of 99.2.
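The formula that turns the standardized factors into a 0-to-100 score is not spelled out here, so the following Python sketch shows one plausible version. It assumes each factor is converted to a z-score against that season's league mean and standard deviation, and that the score falls by a hypothetical `scale` of 10 points per unit of average z-score gap. The names `standardize`, `similarity` and `scale` are illustrative, not SCHOENE's actual code.

```python
from statistics import mean


def standardize(value: float, league_mean: float, league_sd: float) -> float:
    """Express a raw stat as a z-score relative to that season's league."""
    return (value - league_mean) / league_sd


def similarity(player_a: dict[str, float], player_b: dict[str, float],
               scale: float = 10.0) -> float:
    """Return a 0-100 similarity score; 100 means identical standardized profiles.

    Both dicts map the same factor names to league-standardized (z-score) values.
    `scale` is a hypothetical penalty per unit of average z-score difference.
    """
    gaps = [abs(player_a[k] - player_b[k]) for k in player_a]
    return max(0.0, 100.0 - scale * mean(gaps))
```

For example, two players whose profiles differ by 0.2 standard deviations on one of two factors would score 99.0 under these assumptions. Standardizing by season is what lets a 1983-84 season be compared with a 2007-08 one.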
The player with the lowest score for his best comp was Steve Novak, who just snuck over the 250-minute minimum and had the highest "Shoot" rating of anyone in NBA history. No player had a similarity score to Novak higher than 78.5. Novak was one of two players who "broke the system," so to speak, Chris Paul being the other. While Paul has one decent comp (Isiah Thomas in 1983-84), no one else scored better than 80 against him. In both cases, I decided to simply use the player's 2007-08 numbers.
In general, the goal is to use at least 50 comparable players. The easiest way to do this in Excel is to count comparables at different levels of similarity, starting at 95 and going down one point at a time to 90. As long as the group was over 20 players, I cut off at 90. If the group was smaller, I continued expanding the pool to 87.5, 85, 82.5 and finally 80, taking the group of whatever size at 80. Two players (Jamaal Magloire and Kenny Thomas) had a sample of just four players, illustrating the difficulty of finding comps for extremely poor players. Unique players (like Steve Nash) generally have fewer comps, and very young and very old players have smaller groups as well.
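The cutoff search can be sketched in Python. The stopping rule is an assumed reading of the paragraph: descend from 95 and stop as soon as the pool reaches 50 comps, accept 90 if more than 20 comps clear it, and otherwise widen through the fallback cutoffs (applying the same more-than-20 test) until 80. `pick_cutoff` and its parameter names are hypothetical.

```python
def pick_cutoff(scores: list[float], target: int = 50, minimum: int = 20) -> float:
    """Choose the similarity cutoff for one player's comparable group.

    `scores` are the player's similarity scores against every historical season.
    Assumed procedure: walk down from 95 in one-point steps until the pool
    reaches `target`; accept 90 if the pool there exceeds `minimum`; otherwise
    widen to 87.5, 85 and 82.5, and finally settle for 80.
    """
    def pool_size(cutoff: float) -> int:
        return sum(1 for s in scores if s >= cutoff)

    for cutoff in (95.0, 94.0, 93.0, 92.0, 91.0):
        if pool_size(cutoff) >= target:
            return cutoff
    for cutoff in (90.0, 87.5, 85.0, 82.5):
        if pool_size(cutoff) > minimum:
            return cutoff
    return 80.0  # last resort: keep whatever comps clear 80, however few
```

The comparable group is then `[s for s in scores if s >= pick_cutoff(scores)]`. A player like Magloire, with only four comps above 80, falls all the way through to the last line.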
Based on the development of similar players, I project 2008-09 stats in 14 categories that can be used to create a stat line. The initial projection is fairly straightforward, relying on projected rates, team pace and other factors. Team pace and these other factors are generally projected to be the same as in 2007-08; the only changes I made were to Dallas (faster), New York (much faster) and Phoenix (slower), because of coaching changes.
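As a rough illustration of the two steps in this paragraph, the sketch below first applies comparable players' development to a current rate and then turns a per-100-possession rate into a season total using team pace. Weighting each comp's year-over-year change by its similarity score, and expressing rates per 100 possessions, are assumptions rather than documented SCHOENE details; both function names are hypothetical.

```python
def project_rate(current_rate: float, comps: list[tuple[float, float]]) -> float:
    """Apply comparable players' average development to a player's current rate.

    `comps` holds (similarity_score, next_season_rate / this_season_rate) pairs.
    Weighting by similarity is an assumption, not a documented SCHOENE detail.
    """
    total_weight = sum(score for score, _ in comps)
    avg_change = sum(score * ratio for score, ratio in comps) / total_weight
    return current_rate * avg_change


def project_total(rate_per_100: float, minutes: float, team_pace: float) -> float:
    """Convert a per-100-possession rate into a season total.

    `team_pace` is possessions per 48 minutes; `minutes` is projected season minutes.
    """
    possessions = team_pace * minutes / 48
    return rate_per_100 * possessions / 100
```

For instance, a player projected for 20 points per 100 possessions over 2,400 minutes on a team playing 96 possessions per 48 minutes comes out to 960 points. Speeding up a team's pace, as with New York, raises every player's counting stats without touching their rates.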
Because I project entire teams as well as players, SCHOENE takes team context into account in other areas. Most notable among these is usage rate. Because of player movement and other factors, some teams will naturally have combined usage rates above or below 100 percent, which of course cannot happen in reality. So I scale the usage rates of everyone on the team to bring the total back to 100 percent, making a corresponding adjustment in the opposite direction to their two-point percentage, three-point percentage and turnover rate to account for the inverse relationship between usage and efficiency (a player who uses fewer possessions should shoot better and turn the ball over less). Another adjustment is made to the defensive rebounding percentage of teams that differ dramatically from league average (more in the team section).
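A minimal sketch of the usage adjustment, assuming projected minutes per game across the roster sum to 240, so that a minutes-weighted sum of usage (each player's usage times his share of a 48-minute game) equals 100 percent when balanced. The `tradeoff` coefficient of 0.25 points of shooting or turnover percentage per point of usage is a hypothetical placeholder, not SCHOENE's actual value, and `Line`, `combined_usage` and `normalize_usage` are illustrative names.

```python
from dataclasses import dataclass, replace


@dataclass
class Line:
    minutes: float  # projected minutes per game
    usage: float    # usage rate, percent of team possessions used while on court
    fg2: float      # two-point percentage
    fg3: float      # three-point percentage
    tov: float      # turnover percentage


def combined_usage(team: list[Line]) -> float:
    """Usage weighted by each player's share of a 48-minute game; 100 when balanced."""
    return sum(p.usage * p.minutes / 48 for p in team)


def normalize_usage(team: list[Line], tradeoff: float = 0.25) -> list[Line]:
    """Rescale usage so the team total is 100 and shift efficiency the other way.

    `tradeoff` (points of shooting/turnover % per point of usage) is hypothetical.
    """
    factor = 100 / combined_usage(team)
    adjusted = []
    for p in team:
        delta = p.usage * factor - p.usage  # negative when usage is cut
        adjusted.append(replace(
            p,
            usage=p.usage + delta,
            fg2=p.fg2 - tradeoff * delta,   # less usage -> better shooting
            fg3=p.fg3 - tradeoff * delta,
            tov=p.tov + tradeoff * delta,   # ...and fewer turnovers
        ))
    return adjusted
```

Scaling proportionally means high-usage players give up the most possessions in absolute terms; a flat cut applied equally to everyone would be an equally defensible alternative.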
While these team adjustments don't necessarily affect the players' ratings, I do believe they produce more realistic stat lines, which is potentially useful for fantasy purposes. You can download a spreadsheet with our projected stats for virtually every player (the only exceptions being those who did not play at least 250 minutes last season, European imports and a couple of rookies whose numbers I don't have).
Like PECOTA, SCHOENE attempts to acknowledge the uncertainty inherent in projecting player performance. While I'm not yet able to generate the kind of graphs and forecasts Baseball Prospectus does, I have recreated the familiar Improve/Breakout/Decline percentages. A breakout is defined as an improvement of at least 20 percent, and a decline (which I term "Coppage") as a drop-off of at least 20 percent.
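The percentages can be computed as simple shares of the comparable group. This sketch assumes each comp's following season is summarized as one fractional change in an overall rating (which rating is not stated here) and that every comp counts equally; `outlook` is a hypothetical name.

```python
def outlook(changes: list[float], threshold: float = 0.20) -> dict[str, float]:
    """Share of comparable players who improved, broke out, or suffered a Coppage.

    `changes` are each comp's fractional change in an overall rating the next
    season (0.25 means 25 percent better). The choice of rating and the equal
    weighting of comps are assumptions.
    """
    n = len(changes)
    return {
        "improve": sum(c > 0 for c in changes) / n,
        "breakout": sum(c >= threshold for c in changes) / n,
        "coppage": sum(c <= -threshold for c in changes) / n,
    }
```

With only four comps, as for Magloire, each comp moves a percentage by 25 points, which is why small samples produce extreme figures.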
Magloire, who has plenty of room for improvement after a dismal 2007-08 season, had the highest improvement percentage, albeit among the aforementioned four comparable players. Adam Morrison (86 percent) is second with a more legitimate sample, and he also ranks first with a 54 percent chance of breaking out. In both cases, we're seeing what Silver termed "The Ugueto Effect," after the light-hitting Mariners utilityman who, because he had previously been so bad, showed a high likelihood of improving and breaking out the season PECOTA was introduced. Brent Barry (10 percent) has the lowest chance of improving, while Bobby Jackson (26 percent) has the most comps who declined dramatically the following season. Magloire was also the player with the largest projected improvement based on his comparables (+24.4 percent), while Shaquille O'Neal (-11.9 percent) saw the biggest projected decline.
Naturally, rookies cannot be projected in the same way, so I used their projected stats based on the NCAA translation I introduced prior to the NBA Draft.
To the extent that player stats have been used to project team performance, it has almost always been through some value measure (PER in Hollinger's case, WARP in mine in 2004-05) aggregated at the team level. Unfortunately, this logic wouldn't hold even if our projections were perfect, because any stat without a full team defense adjustment will not, when added up at the team level, equate to the team's record in a given season. That is, PER and WARP tend to understate the records of teams with good defenses, and vice versa. So projections based on player value will tend to give defense second billing.
SCHOENE attempts to go further. On the offensive end, the results are generally the same as aggregating, in that team performance is based on the combined performance of individuals (which already takes their role in the projected offense into account). On defense, defensive rebounding, blocks, steals and personal fouls are projected from individual statistics. Two-point percentage on unblocked shots and non-steal turnovers (as well as other descriptive factors like ratio of three-point attempts to twos) are based on past team performance regressed to league average (by a factor of 25 percent for two-point shots and about 45 percent for non-steal turnovers, which also factor in projected steal rate).
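Reading "regressed to league average by a factor of 25 percent" as moving a team's past figure 25 percent of the way toward the league average, the step is a one-line weighted average. The function name `regress` and the team and league numbers in the usage note are made up for illustration.

```python
def regress(team_value: float, league_avg: float, factor: float) -> float:
    """Move a team's past rate `factor` of the way toward the league average.

    factor=0 keeps the team's own history; factor=1 projects pure league average.
    """
    return team_value + factor * (league_avg - team_value)
```

With hypothetical inputs, a team that held opponents to 52 percent on unblocked twos against a 49 percent league average would project at `regress(0.52, 0.49, 0.25)`, about 51.3 percent, while non-steal turnovers would use a factor of about 0.45, reflecting that they are less stable from year to year.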
Now, part of the reason nobody (to my knowledge) has projected team statistics the way I have is that it is hard. It is difficult to model the interaction effects we know exist in real life, which is the reason for the defensive rebound adjustment I mentioned above. When I initially ran the numbers, the L.A. Clippers projected to grab 85.7 percent of all available defensive rebounds. This would, um, be a record; the highest defensive rebound percentage in the league last year was 77.1 percent, by San Antonio.
No, the Clippers likely will not shatter that mark. (In fact, they don't even project as the best defensive rebounding team in the league; Miami, which we'll get to later, came in at an insane 89.1 percent.) The problem is that there are diminishing returns on the defensive glass. Some of the rebounds grabbed by newcomer Marcus Camby will be taken away from Chris Kaman, for example. This must happen: the likely Clippers starting five (Davis/Mobley/Thornton/Camby/Kaman) projects to a combined 95.2 percent of defensive rebounds, which is nearly physically impossible. The effect can also be shown empirically; studies have shown that a lineup's actual performance on the defensive glass is much closer to average than its players' individual numbers added together would predict. So I regressed defensive rebound percentage to league average by a factor of two-thirds.
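The two steps here, adding up individual defensive rebound percentages and then pulling the total back toward average, can be sketched as follows. Weighting each player by his share of a 48-minute game (so five players on the floor simply add, as in the 95.2 percent starting-five figure) is an assumption, as are the function names. The league average of 73.0 percent in the usage note is a hypothetical placeholder, not the actual 2007-08 figure.

```python
def team_drb_pct(players: list[tuple[float, float]]) -> float:
    """Combine individual defensive rebound percentages into a raw team figure.

    `players` holds (drb_pct, minutes_per_game) pairs. Assuming minutes sum to
    240, the minutes-weighted sum is the share of available rebounds grabbed.
    """
    return sum(drb * minutes / 48 for drb, minutes in players)


def regress_drb(raw_team_drb: float, league_avg: float, factor: float = 2 / 3) -> float:
    """Pull the raw team total two-thirds of the way back to league average."""
    return raw_team_drb + factor * (league_avg - raw_team_drb)
```

Feeding in the Clippers' raw 85.7 percent with that hypothetical 73.0 percent league average gives `regress_drb(85.7, 73.0)`, about 77.2 percent: still excellent, but no longer a record-shattering outlier.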
A more problematic part of the rebounding issue is position. For the most part, positions aren't a major factor, but there is some evidence to suggest that players rebound differently depending on their position. A good example is the Heat's Shawn Marion, who will likely play far more small forward than he did previously in Phoenix. It's unlikely that Marion will continue to rebound at the same rate as he did last season. However, I'm unsure how to adjust for this effect.
There may also be diminishing returns elsewhere. The Clippers again present an outlier in terms of shot-blocking by pairing Camby and Kaman. They project to block shots at a far greater rate than any NBA team did last season. Because only one will be in the position of serving as primary help defender at any given time, both will likely see their shot blocking diminish this season.
Another major factor that is not captured in this analysis is the value of assists and passing. I hope to find some way to account for passing in the future, but at this point an offense is simply the sum of the individual players' performances, adjusted for usage.
Despite these caveats, I feel good about the team projections and their usefulness. Naturally, there are places where my subjective assessment based upon the stats does not agree with the purely objective method. Still, I think there are some valuable and interesting insights to be gleaned from these numbers, as we'll see over the next week.
To create the team projections, I projected games played based on a study done by Ed Kupfer, who found that the expectation for a player starts at 76 games and goes down about one game for every six games missed the previous season and one for every 20 missed two years ago. There is a subjective element, and I've adjusted up for players who missed the entire 2007-08 campaign as well as taking new injuries into account. Minutes are strictly a subjective assessment. To fill in the gaps where not everyone has been projected, I've used generic replacement players (estimated at 80 percent of league average in every category).
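The objective part of the games-played estimate and the replacement-player fill-in are simple enough to write down directly. This sketch encodes Kupfer's rule of thumb as described (it does not attempt the subjective adjustments for full-season absences or new injuries) and applies the 80 percent factor literally to every category; both function names are hypothetical.

```python
def expected_games(missed_last_year: int, missed_two_years_ago: int) -> float:
    """Kupfer's rule of thumb as described: start at 76 games, then subtract
    about one game per six missed last season and one per 20 missed two
    seasons ago."""
    return 76 - missed_last_year / 6 - missed_two_years_ago / 20


def replacement_line(league_avg: dict[str, float], level: float = 0.80) -> dict[str, float]:
    """Generic replacement player: 80 percent of league average in every category."""
    return {stat: value * level for stat, value in league_avg.items()}
```

A player who missed 12 games last season and 20 the year before projects to `expected_games(12, 20)`, or 73 games, before any subjective adjustment.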
That's probably more than you wanted to know about the process. Now we can get to the results and analysis, starting Tuesday with the Central Division.
Kevin Pelton is an author of Basketball Prospectus.